Thomas Henson

Deep Learning Python vs. Java

October 8, 2019 by Thomas Henson

What About Java in Deep Learning?

Years ago when I left Java in the rear view of my career, I never imagined someone would ask me if they could use Java over Python. Just kidding, Java. You know it’s only a joke, and you will always have a special place in my heart. A place in my heart that probably won’t run because I have the wrong version of the JDK installed.

Python is king of the Machine Learning (ML) and Deep Learning (DL) workflow. Most of the popular ML libraries are in Python, but are there Java offerings? How about Deep Learning: can you use Java? The answer is yes, you can! Find out the differences between Machine Learning and Deep Learning libraries in Java and Python in the video.

Transcript

Hi, folks. Thomas Henson here, with thomashenson.com, and today is another episode of Big Data Big Questions. Today’s question comes in around deep learning frameworks in Java, not Python. So, find out about how you can use Java instead of Python for deep learning frameworks. We’ve talked about it here on this channel, around using neural networks and being able to train models, but let’s find out what we can do with Java in deep learning.

Today’s episode comes in and we’re talking about deep learning frameworks that use Java, not Python. So, today the question is, “Are there specific deep learning frameworks that use Java, not Python?” First off, let’s talk a little bit about deep learning, do a recap. Deep learning, if you remember, is the use of neural networks whenever we’re trying to solve a problem. We see it a lot in multimedia, right, like, we see image detection. Does this image contain a cat or not contain a cat?

The deep learning approach is to take those images [Inaudible 00:01:10] you know, if we’re talking about supervised, so take those labeled images, so of a cat, not of a cat, feed those into your neural network, and let it decide what those features are. At the end you get a model that’s going to tell you, is this a cat or is this not a cat? Within some confidence. Hopefully not 50%, maybe closer to 99 or 97. But, that’s the deep learning approach versus the machine learning approach that we’ve seen a good bit.
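To make that labeled-data loop a bit more concrete, here is a minimal Python sketch. It is a toy under heavy assumptions: each "image" is reduced to two made-up numbers, and a single logistic neuron stands in for a real neural network, which would have many layers and learn its features from raw pixels. The names train and confidence are hypothetical and do not come from any framework.

```python
import math

def sigmoid(z):
    """Squash any real number into the 0..1 range so it reads as a confidence."""
    return 1.0 / (1.0 + math.exp(-z))

def confidence(weights, bias, features):
    """Return the model's confidence (0..1) that the input is a cat."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

def train(examples, labels, epochs=2000, learning_rate=0.5):
    """Fit one logistic neuron with plain stochastic gradient descent."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            error = confidence(weights, bias, features) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

# Made-up stand-ins for labeled images: 1 = cat, 0 = not a cat.
examples = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.9], [0.1, 0.2], [0.2, 0.1], [0.3, 0.2]]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train(examples, labels)
cat_score = confidence(weights, bias, [0.85, 0.85])
not_cat_score = confidence(weights, bias, [0.15, 0.15])
print(f"cat-like input: {cat_score:.2f}")
print(f"non-cat input:  {not_cat_score:.2f}")
```

The point is only the shape of the workflow: labeled examples go in, training adjusts the weights, and the finished model answers with a confidence rather than a hard yes or no.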

We talk about Hadoop and traditional analytics from that perspective: in machine learning we’re probably going to use some kind of algorithm like singular value decomposition, or PCA, and we’re going to take these images and we’re going to look at each one and we’re going to define each feature, from the cat’s ears to the cat’s nose, and we’re going to feed that through the model and it’s going to give us some kind of confidence. With the deep learning approach, we get to use a neural network that defines some of those features for us, which helps us out a lot. It’s not magic, but it is a little bit, so really, really innovative approach.
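As a rough illustration of that classical route, the sketch below takes a small table of hand-measured features and finds its first principal component, which is the core step of PCA. The feature values and the function name first_principal_component are invented for the example, and power iteration is used only because it fits in plain Python; a real project would more likely call a library's SVD or PCA routine.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return the unit vector along which the rows vary the most."""
    n = len(rows)
    dims = len(rows[0])
    # Center each feature column on its mean.
    means = [sum(row[j] for row in rows) / n for j in range(dims)]
    centered = [[row[j] - means[j] for j in range(dims)] for row in rows]
    # Sample covariance matrix of the centered features.
    covariance = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]
    # Power iteration: repeatedly multiply by the covariance and renormalize.
    # Assumes the data is not constant (otherwise the norm would be zero).
    vector = [1.0] * dims
    for _ in range(iterations):
        product = [sum(covariance[i][j] * vector[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    return vector

# Hypothetical hand-defined features per image: (ear height, nose width).
features = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
component = first_principal_component(features)
print([round(x, 3) for x in component])
```

Notice that a person still had to decide that ear height and nose width were the features worth measuring. That manual step is what the neural network takes over in the deep learning approach.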

So, the popular languages, and what we’ve talked most about on this channel and probably other channels and most of the examples you’ve seen are all around Python, right? I did do a video before where I was wrong on C++. There was more C++ in deep learning than I really originally thought. You can check that video out, where we kind of go through and talk about that and I come in and say, “Hey, sorry. I missed the boat on that one.” But, the most popular language, one… I mean, I did a Pluralsight video on it, Take CTRL of Your Career, around TensorFlow and using TFLearn. TensorFlow is probably far and away the most popular one. You’ve seen it with stats that are out there. Also PyTorch, Caffe2, MXNet, and then some other, higher-level languages where Keras is able to use some of TensorFlow and be a higher-level abstraction, but most of those are going to use Python and then some of them have C++. Most examples that you’re going to see out there, just from my experience and just working in the community, is Python. Most people are looking for those Python examples.

But, on this channel, we’ve talked a lot about options and Hadoop for non-Java developers, but this is an opportunity where all you Java developers out there, you’re looking for, “Hey, we want to get into the deep learning framework. We don’t want to have to code everything ourselves. Are there some things that we can attach onto?” And the answer is yes, there are. It’s not as popular as Python right now, or R and C++ in the deep learning frameworks, but there is a framework called Deeplearning4j that is a Java-based framework. The Java-based framework is going to allow for you to use Java. You could still use Python, though. Even with the framework, you can abstract away and do Python, but if you’re specifically a Java developer and looking to… I mean, maybe you want to get in and contribute to the Deeplearning4j community and be able to take it from that perspective, or you’re just wanting to be able to implement it in some projects. Maybe you’re like, “Hey, you know what? I’m a Java developer. I want to continue doing Java.” Java’s been around since ’95, right? So, you want to jump into that? Then Deeplearning4j is the one for you.

So, really, maybe think about why would you want to use a Java-based deep learning framework, for people that maybe aren’t familiar with Java or don’t have it. One of the things is it claims to be a little bit more efficient, so it’s going to be more efficient than using an abstraction layer from that perspective in Python. But also, there’s a ton of Java developers out there, you know, there’s a community. Talked about how it’s been around since ’95, so there’s an opportunity out there to tap into a lot of developers that have the skills to be able to use it and so, there’s a growing need, right? There’s communities all around the globe and different little subsets and little subareas. Java’s one of those.

I mean, if you look at what we did from a Hadoop perspective, so many people that were Java developers moved to that community, also a lot of people that didn’t really do Java. It’s a lot like, like I said, at the point I was at in my career, I was more of a .NET C# developer. Fast forward to getting into the Hadoop community, I went back to my roots as a Java developer, so I’d done some Java in the past, and went through that phase. And so, for somebody like me, maybe I would want to go back out. I don’t know. I’ve kind of gone through more Python, but there are a lot of different options out there. It’s just being able to give Java developers a platform to get involved in deep learning, because deep learning is very popular.

So, those are some of the reasons that you might want to go, but the question is, when you think about it, so if I’m not a Java developer, or what would you recommend? Would you recommend maybe not learn TensorFlow and go into Deeplearning4j? You know, I think that one’s going to depend… I mean, we say it a lot in here. It’s going to depend on what you’re using in your organization and what your skill set is. If you’re mostly a Python person, my recommendation would be continue on or jump into the TensorFlow area. But if you’re working on a project that is using Deeplearning4j then by all means go down that path and learn more about it. If you’re a Java developer and you want to get into it, you don’t want to transition skills or you’re just looking to be able to test something out and play with it, and you don’t want to have to write it in Python, you want to be able to do it in Java, yeah, use that.

These are all just tools. We’re not going to get transfixed on any tool. We’re not going to go all in and say, “You know what? I’m only going to be a Java developer,” or, “I’m only going to be this.” We’re going to be able to transition our skills and there’s always going to be options out there to do it. And in these frameworks too, right? Deeplearning4j is awesome, but maybe there’s another one that’s coming up that people would want to jump into, so like I said, don’t get so transfixed with certain frameworks. Like, Hadoop was awesome. We broke it apart. A lot of people navigated to Spark and still use HDFS as a base. There’s always kind of skills that you can go to, but if you go in and say, “Hey, I’m only going to ever do MapReduce and it’s always going to be in Java,” then you’re going to have some challenges throughout your career. That’s not just in data engineering, that’s throughout all IT. Heck, probably throughout all careers. Just be able to be flexible for it.

So, if you’re a Java developer, if you’re looking to test some things out, definitely jump into it. If you don’t have any Java skills and it’s not something that you’re particularly wanting to do, then I don’t recommend you running in and trying to learn Java just for this. If you’re doing Python, steady on with TensorFlow, or PyTorch, or Caffe, whatever you’re using.

So, until next time. See you again on Big Data Big Questions. Make sure you subscribe and ring that bell so you never miss an episode. If you have any questions, put them in the comment section here below. Thanks again.

Want More Data Engineering Tips?

Sign up for my newsletter so you never miss a post or YouTube episode of Big Data Big Questions, where I answer data engineering questions from the community.

Filed Under: Deep Learning Tagged With: Deep Learning, Java, Python, Tensorflow

5 Types of Buckets in Splunk

October 7, 2019 by Thomas Henson

Types of Buckets in Splunk

Where does data go once ingested into Splunk? 

Does Splunk use files and folders?

How Splunk Stores Data

In Splunk, data is stored in buckets. Not real buckets filled with water, but buckets filled with data. A bucket in Splunk is basically a directory for data and index files. In a Splunk deployment there are going to be many buckets, arranged by time. In this video, learn the 5 types of buckets in Splunk every administrator should understand.

Transcript – 5 Types of Buckets in Splunk

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today we’re going to be talking about the five different kinds of buckets in Splunk. We’re going to go through, we’re going to talk about how Splunk uses buckets, and how they’re used to store your data, and how to know which bucket your data is in. Find out more about the different buckets in Splunk right after this.

Today, we’re going to be going through the five different buckets in Splunk, and we’re going to be talking about that. If you have a question, remember, throw it in the comment section here below. Find me on Twitter, on Instagram, and I’ll do my best to answer those here on the show. Today, I wanted to go through the different buckets and how those are used in Splunk. Before we jump in and talk about those five different buckets, let’s just get a quick definition of how Splunk works with storing our data. Think about our data coming in to our Splunk environment. The first thing that’s going to happen is, whether we’re uploading it, or whether it’s live streaming data, it’s going to be indexed. That index is going to help us, one, be able to search it a little bit better. Splunk’s going to put a timestamp on it, and it’s going to do some other things to give us some metadata, so that we can search through that data a lot quicker in our Splunk environment.

The other thing it’s going to do is, it’s going to store that data so we can find it. It’s going to store those in different buckets. Think of buckets just as the Splunk file system. Just like you have a file system, think of it in a Windows environment. I’ve got directories and subdirectories. Think of it in the Splunk environment as you have different buckets. I have buckets for different portions of my data. Those are all going to be with a timestamp. Right? Right. As it indexes, that’s how Splunk decides where they’re going to be in the bucket, and also, there’s some other things you can do to decide how long data’s going to sit, and sit in each one of your buckets, but before we jump in and talk about that, let’s make sure we understand what those buckets are.

The first bucket, or really the first two buckets, are going to be your hot and your warm bucket. Your hot and your warm bucket, this is where your most recent data is going to be. Your hot and your warm bucket, they’re both going to sit in the same specific area. They’re going to be put in there so that you have your most current data, right? This is where Splunk really puts a lot of performance characteristics around where this data should live in these hot and warm buckets, and specifically really, if you think about it, your warm bucket’s going to contain some of your most recent data, but your hot bucket is going to be the one that’s writing the new data. As you set up your policies for how long your data is going to exist in your Splunk buckets, your hot and your warm bucket, let’s say that arbitrarily you can get 10 events. It’s a little bit more complicated than that, but let’s just make it simple. Say that you can get 10 events in your buckets. Every time you get to 10 events, your hot bucket is going to become a warm bucket, and you have a brand-new hot bucket. Think of your hot bucket as where the newest data goes, and your warm bucket as where your recent data is. It gives you a life cycle policy.
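The 10-event rollover from the transcript can be sketched in a few lines of Python. This is only a model of the simplified story above, not how Splunk is implemented: real Splunk rolls buckets based on size and time settings rather than a fixed event count, and the class name Index and its fields are made up for the illustration.

```python
class Index:
    """Toy index: one open hot bucket plus a list of closed warm buckets."""

    def __init__(self, max_events_per_bucket=10):
        self.max_events_per_bucket = max_events_per_bucket
        self.hot = []    # the only bucket that accepts new writes
        self.warm = []   # closed buckets, oldest first

    def write(self, event):
        """Append an event to the hot bucket and roll it when it is full."""
        self.hot.append(event)
        if len(self.hot) >= self.max_events_per_bucket:
            self.warm.append(self.hot)   # the full hot bucket rolls to warm
            self.hot = []                # a brand-new hot bucket opens

index = Index()
for n in range(25):
    index.write(f"event-{n}")

print(len(index.warm), "warm buckets,", len(index.hot), "events in the hot bucket")
```

After 25 writes there are two full warm buckets and a hot bucket holding the five newest events, which is the life cycle the transcript describes.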

The third kind of bucket is a cold bucket. This is where our older data is, where our data is kind of aging off. This data doesn’t have to live with the hot and warm buckets. It can go to maybe a NAS device or some kind of object store where you can actually search on it. It still needs to have some requirements for how it’s being searched, because, if we’re saying that we have 10 events in each one of our warm buckets, let’s say that after a week, those 10 events age off to our cold bucket. Our cold bucket could hold from a week to maybe three months on our policy. That’s where our data is going to exist, in that cold bucket. No new data’s being written to it, and there aren’t as many performance requirements, just because we’re probably not searching on it as frequently as we are on the newer events, the ones we’re pulling for our dashboards, which are stored in our hot and our warm buckets.

Then, we have our fourth bucket, which is going to be our frozen bucket. Think of this as really old, frozen data, hence the frozen bucket. Data that we’re holding onto for compliance reasons or we just want to be able to go back and search on it at some point in the past, but this data is actually going off to some kind of long-term retention. The thing about it is, we want to search on it, I’ll talk about it in just a second, but this data is not searchable in its current form of a frozen bucket. There’s another process to be able to do that, and that’ll include another bucket, but think of this as where you’re aging off your data. This gives you an opportunity to get a better cost per terabyte for how you’re storing the data, and get it out of your Splunk search, so better performance on your Splunk search as well, but still being able to hold on to that data, but think of this as, this is, if we’re saying three months is what we’re going to hold in our cold bucket. Think of it being more than three months that’s going to exist in that frozen bucket.
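The age-based policy in this example (about a week in hot and warm, up to roughly three months in cold, frozen after that) can be written as a small lookup. The thresholds are the transcript's illustrative numbers, not Splunk defaults, and stage_for_age is a hypothetical helper; an actual deployment sets retention in its index configuration.

```python
from datetime import timedelta

# Illustrative thresholds taken from the transcript's example policy.
WARM_LIMIT = timedelta(days=7)    # "after a week" data ages off to cold
COLD_LIMIT = timedelta(days=90)   # roughly "three months" before freezing

def stage_for_age(age):
    """Map the age of a bucket's data to the stage it would sit in."""
    if age < WARM_LIMIT:
        return "hot/warm"
    if age < COLD_LIMIT:
        return "cold"
    return "frozen"

for days in (1, 30, 120):
    print(days, "days old ->", stage_for_age(timedelta(days=days)))
```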

Our last bucket, number five, which I just talked about, is the thawed bucket. Our thawed bucket is how we get that frozen data back into a searchable state. You can go through and thaw that bucket out. Think of it as taking some of the compression out of it, but also putting it in a better place to be able to store it. We talked about performance, and some of the other characteristics that you need to be able to search your data. Those thawed buckets are where you can start that process. It’s a full life cycle process. Go from hot, to warm, to cold, to frozen, and then when we want to see our data again, put it in that thawed bucket. That’s all we have today. I hope you enjoyed this episode, where we talked about the five different kinds of buckets in Splunk. If you have any questions, make sure you put them in the comments section here below. Reach out to me on Big Data Big Questions, and I’ll do my best to answer your questions right here.
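Putting the five buckets together, the full life cycle reads like a small state machine. The sketch below is one way to picture it in Python; the Stage enum, the advance function, and the searchable set are assumptions made for illustration and only encode what the transcript says: frozen data is not searchable until it has been thawed.

```python
from enum import Enum

class Stage(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"
    FROZEN = "frozen"
    THAWED = "thawed"

# The order described in the transcript: hot, warm, cold, frozen, then thawed.
NEXT_STAGE = {
    Stage.HOT: Stage.WARM,
    Stage.WARM: Stage.COLD,
    Stage.COLD: Stage.FROZEN,
    Stage.FROZEN: Stage.THAWED,   # thawing is a deliberate, manual step
}

# Every stage except frozen can be searched.
SEARCHABLE = {Stage.HOT, Stage.WARM, Stage.COLD, Stage.THAWED}

def advance(stage):
    """Return the stage that follows the given one in this sketch."""
    if stage not in NEXT_STAGE:
        raise ValueError(f"{stage.value} buckets have no next stage in this sketch")
    return NEXT_STAGE[stage]

def is_searchable(stage):
    return stage in SEARCHABLE

stage = Stage.HOT
path = [stage]
while stage in NEXT_STAGE:
    stage = advance(stage)
    path.append(stage)

print(" -> ".join(s.value for s in path))
print("frozen searchable?", is_searchable(Stage.FROZEN))
```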

Filed Under: Splunk Tagged With: Splunk, Splunk Buckets

Book Review Living With A Seal

October 6, 2019 by Thomas Henson

Book Review Living With A Seal

31 Days Training with the Toughest Man on the Planet

Yet another great book review! Living with a Seal was a fun read about a 31-day period where Jesse Itzler hires a Navy SEAL to live with him. During this 31-day period Jesse is put to the test both mentally and physically. Jesse wanted to take on this challenge to whip himself back into shape. The events throughout this book make for great entertainment and inspiration. Reading this book definitely gave me some ideas of how to push myself in different areas of my life. Watch the video to learn my thoughts on Living with a Seal.

Transcript – Living With A Seal Book Review

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of… Book Club? I don’t know. Still don’t have a name for this, but today I’m going to be reviewing Living with a Seal. Awesome book. Let’s find out all about that right after this.

Today, in this episode where I’m reviewing a book, I am on a mission. If you’ve watched my goals for 2019, you heard me talking about how many books I want to try to read. Missed my goal last year. We’re chugging along this year, trying to get to the halfway point. The book that I’m reviewing today, really good, was actually referred to me by Erin Banks. You’ve seen me do some videos with her with the Big Data Beard team, where we went through a machine learning course. She and I have talked about some certifications.

The book that she recommended was Living with a Seal: 31 Days of Training with the Toughest Man on the Planet. I will say, throughout the whole book, you never know who that person is, but I know who it is. If you Google around you can find out. The real premise of the story was, it was not written as a book. I think it started off as a blog. Jesse Itzler, he’s a really famous entrepreneur. He’s been involved with Zico water and NetJets [Phonetic 00:01:23]. When he wrote this book was when he was really going through the big push for Zico water. You get to see a little bit of behind-the-scenes of what was going on from a business perspective. It’s not a business book per se. He’s an ultra-runner, he’s an adventurer. He was actually on MTV, so I think he was a musician at some point. I guess when I was young, or maybe a little bit older. Either way, the premise of the book is, he’s in a little bit of a funk. He’s an ultra-runner, and he really wants to push the limits. He’s doing his business, and running his day job there. He’s also got a son and a wife. He’s in a rut, like we all seem to get, as we go through phases in our career. He just wants to jump-start himself, to push himself really hard. He meets the SEAL at an ultra-running event, and he’s talked to him a couple times, and he convinces him to come live with him for 30 days. The caveat is, he must do everything the SEAL says, when the SEAL says it. You’re still able to work, still able to do everything, but the SEAL follows him around for 30 days, and they come up with some crazy workouts.

Going through the book, you hear about the crazy workouts that they do, and it’s really awesome. Actually, I’m trying to take some of those into my own day-to-day activities, and really trying to push the limit. It was really cool just to see how Jesse, who was probably already more fit than I was at that time, and even now, felt like he was in a rut. It gives you that perspective that we all feel at times we’re not doing as much as we could, whether it be from the gym, whether it be from learning, anything like that. You’ve got to try some crazy things to really get you out of your rut. Also, I followed SEAL after the fact, but one of the things you really follow through this is you have to do something that sucks every day. [Laughs] If you do something that sucks every day, you’re volunteering to do some kind of crazy workout. Maybe you’re going to go do 20 miles, 10 miles, whatever your craziness is for that day. If you volunteer to do those things, it makes it a lot easier to do the things that you have to do in life, whether it be around the house, for family, or in your career.

That’s the really cool portion that I took away from the book. Let’s go through a couple of things I really like. They go through the background for both Jesse and SEAL. Jesse’s background was pretty cool, where he talks about some of the things that he kind of hacked his way through. Think of hiring a SEAL to come live with you for 30 days. That’s like a life hack, right? It’s different ways of experimenting, getting things done. He brought that into his business life as well, so he talks about how he took chances on getting contacts, and meeting people throughout his career, and then also SEAL goes into some of his background, and all the things that he’s gone through, and it’s pretty cool. He’s living with a Navy SEAL. They’re following each other around, and doing workouts, so it’s really cool. It’s the workout and the mindset thing that really challenges Jesse and shifts his focus. When he goes into it, they start off with SEAL wanting to see how many pull-ups you can do, how many push-ups you can do, and he kind of gauges it with a test. They also do some running things, because running was a big focus of it. When he goes in, Jesse can’t do 100 pull-ups, but they test out, they do however many pull-ups you can. Then, SEAL is like, “Hey, do 100. We’re not leaving this gym until you do 100.”

It really refocuses the mind, where it’s like, hey, you have something that you have to do? You’re going to do it, right? No matter how long it takes. Jesse was kind of able to break through, and do that. Back to the reader, if you’re reading it, it gives you that mindset, too. To really push, and so, I’ve done a lot of really cool things since reading the book, maybe even setting a timer and trying to just do some workouts in a hotel, or while I’m on the road, or even here in the Big Data Big Questions office, too. It really gives you an opportunity to really push, and do those things, and you feel good afterwards, too. Here are some of the coolest workouts that I wanted to pick out. The burpee test. The goal was to get 10 to 12 minutes. You do a hundred burpees. Jesse did that between meetings. SEAL made him do it between meetings. I think he was in his full get-up. I don’t know if you wear a suit to work, but he had some kind of button-down it seems like, when he was talking about it.

Then there’s four miles every four hours for 48 hours. They scaled up to this. I think they started off with two miles every four hours for 24 hours, and at the end what they did was four miles every four hours for 48 hours. I think that essentially turns into a marathon or two marathons. I’m really bad at math. Anyways, that was a really cool workout. Also, it was working their way up to some of the push-ups, too. It was pretty cool. Jesse, I think, at the end, he got 200 push-ups in a day. Just being able to test yourself and do those things. A lot of this stuff you can hear on Joe Rogan’s podcast, too. Jesse actually appeared on there, and told some behind-the-scenes stories around some of the things that he and SEAL did. Towards the end of the book, you still don’t get to know who SEAL is, but if you watch Joe Rogan, or if you subscribe to this channel here, we’re going to review a book, and I’ll tell you who the SEAL was, if you haven’t already Googled it and found out as well.

Would I recommend this book? Hell yeah! It was pretty awesome. You get to go through, and see what normal life for all of us is, as far as work, and family, and doing things like that, and then see what happens when you insert a SEAL who’s going to really kick you. Kick you in the rear, and get you rolling through doing things that suck, and see what it does to you. Maybe that’s why Zico water was so big. I don’t know, probably not. I think Jesse would have been successful either way, but it was really cool to see it all go down around that same time. Second off, it’s really going to push you to do things outside your comfort zone. We’ve talked about it a lot here. One of the things that I’ve really been pushing and working on the last couple years is speaking. I’ll tell you, it’s a challenge to get on stage in front of 100, 200, 300, whatever your limits are, and keep pushing those limits. But I really can say, had I not been gaining confidence by doing things that get me uncomfortable in the gym, or running, and doing those other things, I think that really translates into what I’m trying to do from a, hey, you don’t want to get up, and do 20 minutes of learning? Too bad. Just do it. It gives you that mindset, where you’re like, I’m already doing these other things from a health perspective in my life, so what can I do to feed my mind? Or, what can I do to challenge myself within my career? It doesn’t have to be speaking, just for me that is. Definitely check out this book. Then, the book that I read right after this one, I’m going to follow it up here, called Can’t Hurt Me. Find out more about that one next, but I definitely recommend this book and recommend pushing yourself outside your limits. Until next time, see you again on Big Data Book Club. We still don’t have a name.

Filed Under: Book Review Tagged With: Book Review

My Journey Why I Chose MBA Over Masters in Science

October 5, 2019 by Thomas Henson

MBA Over Masters in Science

My Master Degree Journey

Once again here at Big Data Big Questions we tackle a college-related question. Today is a little different: I discuss MY JOURNEY in choosing an MBA over a Master of Science. Less than 6 months into my first Software Engineering role, I decided to pursue a Master’s degree. One of the biggest reasons I acted so fast was advice from peers. The advice was simple: knock out the Master’s before you get too busy with life in general.

Wow was that good advice for me!

Once the decision to go back to school was made, I had to select a Master’s program. In reality I’m sure I made the decision a lot harder than it should have been. Looking back after all these years I’m confident I made the right decision. Watch this episode of Big Data Big Questions to find out my process for choosing an MBA over a Master of Science.

 

Transcript – MBA Over Masters in Science

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today’s question is another in the “how do I choose a degree” series I guess, that I’ve been getting in, and this one is more just around my personal journey. How did I decide to go with an MBA versus a Master of Science? It was a pretty big decision for me, so I thought I’d share my journey, because I know a lot of folks are looking at, even from an undergrad perspective, am I going to go more information systems, or am I going to go more computer science, or engineering from that perspective? How does it all go through, and what’s the thought process? I’m just going to provide my thought process to how I went through it, and maybe that can help you. Maybe you can give me some advice, tell me if you think I made the right decision.

I want to talk a little bit about my journey into choosing my MBA over choosing a Master of Science, just to give some thought process around that. I was not a traditional student in the fact that I graduated a little bit older. I had a career outside of tech a little bit before I really focused down, and buckled down, and went back and got my information systems degree, or CIS, Computer Information Systems. It’s different at other places. Whenever I went through that, I made it, really, a focus that as soon as I graduated, I was really going to try to get back in. I think I only took six months off. During that six months, and even before then, I knew I wanted to go and get my Master’s, and so I was really struggling with the fact that, hey, what do I need to do? Should I go for a computer science or some kind of science master’s degree, since I had what was essentially a business degree in information systems, or should I continue on the path and go down the road of getting my MBA?

Sought out a lot of information from mentors, people I worked with. I was very fortunate in my first job, where they would pay for my college tuition for my Master’s degree. I was really excited about that, and also another one of the reasons that it probably really drove me within six months of graduating, getting a job, going back in and being like, “I want to work full-time and get my Master’s degree.” For me, sought out some mentors. My manager at the time, he had an MBA, and that was one of the things I asked him. I was like, “How did you decide?” He was like, didn’t really matter as much in his eyes from what he’s seen, and with him having an MBA, plus, a little bit biased. For him, he was like, “I liked it.” The thought process around having an MBA, being able to say MBA in your title, is a little bit different and has more of a pop, I guess, from his perspective. Another one of the mentors I talked to, actually he had a Master’s of Science, and he was upper-level director at the organization I worked for, and just talking with him, and he was along the same path of not necessarily saying that an MBA was going to matter. He was more that it matters that you finish. From that perspective, so really after I had some advice from that perspective, it really let me go in, and the way that I actually chose is like, all right, it’s not going to hurt my career path either way.

I really went through, and I looked at the programs. I compared the programs, compared how long it would take. It would take me a little bit longer to go through the Master’s of Science and really, it was more about some of the classes and some of the cool things. I’ve always had a knack for business. I really liked accounting when I had accounting classes previously in my undergrad. It really gave me an opportunity to dial in and look at some of the things across business from an economics perspective. Some of the computer information systems classes, because they’re still focused. You have an MBA. You have a focus. I still focused on that. Being able to do some things with more Java classes, because at the time, I liked Java. [Laughs] Some of the things with databases and information systems. Really, there were some cool things that were going on around healthcare that piqued my interest as well, too. Chose that path, so I know people have sought out advice, and looked around and asked. Should we do this? Should I go for a data engineering degree or a data science degree? It really depends on what you want to do, and I don’t think, just like with my journey, picking one or the other is going to hurt you down the road. Going back to the blunt advice I got from a senior director was, it matters more about if you finished it.

When you start out on that journey, make sure that you capture it and go on. That doesn’t say that anybody that’s watching this don’t think it’s a thing where you’re like, hey, you have to get a Master’s degree to be able to succeed within your role. We’ve proven that, especially in tech, so much in tech. There’s folks without Bachelor’s degrees, without Master’s degrees, and even without high school diplomas. It’s more about how creative you can be and how much you can focus, and just really pull yourself into your craft, whether it be development, whether it be analytics, or wherever you want to go. Just for me, for that journey, it was having me being a later student is kind of like, for me, I really wanted to go back and prove to myself that I could finish and stick it out. Being one of the first in my generation, between my family, to go and to have that Master’s degree, also was really awesome. Personal decision, but I’m sharing it with everybody here. Everybody’s situation’s different. Happy to give advice, happy to talk through it all, but that’s my story, my journey. If you have any questions, put them in the comments section here below or reach out to me on bigdatabigquestions.com. I’ll do my best to answer those, and I’ll see you again next time on Big Data Big Questions.

Want More Data Engineering Tips?

Sign up for my newsletter so you never miss a post or YouTube episode of Big Data Big Questions, where I answer questions from the community about Data Engineering.

Filed Under: Career Tagged With: College, Data Engineer Degree, Masters of Science, MBA

Ultimate Battle Tensorflow vs. Hadoop

October 4, 2019 by Thomas Henson 1 Comment

Tensorflow vs. Hadoop

The Battle for #BigData 

This post has been a long time coming!

Today I talk about the difference between TensorFlow and Hadoop. While Hadoop was built for processing data in a distributed fashion, there are some points of comparison with TensorFlow. One is that both trace their roots to Google: TensorFlow was developed there, and Hadoop grew out of the designs Google published for its file system and MapReduce. Another is that both were created to bring insight to data, although they take different approaches to that mission.

Who is now the king of #BigData? To be fair, the comparison is not like for like, but the two are often lumped together as if it has to be one or the other. Find my thoughts on TensorFlow vs. Hadoop in the latest episode of Big Data Big Questions.

Transcript – Ultimate Battle Tensorflow vs. Hadoop

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today’s question is really a conversation that I heard from, actually, my little brother when he was talking about something that he heard at a conference. He brought it to my attention. “Hey, Thomas, you’re involved in big data. I was talking to some folks at a GIS conference around Hadoop and TensorFlow.” He’s like, “One person came up to me and said, ‘Ah! Hadoop’s dead. It’s all TensorFlow now.” I really wanted to take today to really talk about the differences between Hadoop and TensorFlow, and just do a level set for all data engineers out there, all big data developers, or people that are just interested in finding out. “Okay, what’s happening in the marketplace?” Today’s question is going to come in around TensorFlow versus Hadoop and find out all the things that we need to know from a data engineering perspective. Even in the end, we’ll talk about which one’s going to be around in five years. Find out more right after this.

Welcome back. Today, as promised, we’re going to tackle the question of which is better, what the differences are between TensorFlow and Hadoop, and where each fits in data analytics, the marketplace, and solving the world’s problems. If you’re watching this channel, and you’re invested in the data analytics community, you know how we feel about it: we’re passionate about being able to solve problems using data. First thing we’re going to do is break them down, and then at the end, we’re going to talk about some of the differences, where we see the market going, and which one is going to make it in five years. Or, will both? Who knows. First, what is TensorFlow? We’ve talked about it a good bit on this channel, but TensorFlow is a framework to do deep learning. Deep learning is a subset, a branch of machine learning, but it’s still just about processing data. The really cool thing about TensorFlow, and the reason TensorFlow and similar frameworks in the deep learning realm are so awesome, is that it gives you the portability to run and analyze your data on your local machine or even spread it out in a distributed environment. It comes with a lot of different algorithms and neural networks that you can use and incorporate into solving problems. One of the cool things about deep learning is the ability to analyze richer data like video or voice, right? If you’re going on Instagram or you’re going on YouTube, and you’re looking for examples on deep learning, chances are somebody’s going to build some kind of video or some kind of photo identification that will help you identify a cat. That’s the classic example that you’ll see: “Hey, can we detect a cat by feeding in data, and looking, and analyzing this?” TensorFlow doesn’t use Hadoop, but TensorFlow uses big data. You use these large data sets to train your models that can be used on edge devices. 
If you’ve ever used a drone, or if you’ve ever used a remote control that uses natural language processing to change the channel, then you’ve used some portion of deep learning or natural language processing. Not saying it’s TensorFlow, but that’s the kind of thing TensorFlow really does. It’s very popular, developed and open sourced by Google. There are a lot of free resources out there, and for data scientists and machine learning engineers, it’s a very, very exciting product to be able to build with and start analyzing your data quicker. Couple together the excitement for deep learning with the ease of use of TensorFlow, and that’s why the market has just been hot for TensorFlow and those other frameworks.

What is Hadoop? Hadoop, it’s all about elephants, right? Hadoop has really been around since, I don’t know, we’re probably 12 to 13 years into it being open source. If we think back to analyzing data that was coming in from the web, think about being able to index the entire web. That’s kind of what Google helped develop, and Yahoo, and a lot of the other teams from Cloudera and Hortonworks, really helped to push Hadoop into the open source arena. Hadoop is synonymous with big data. You can’t say big data without thinking about Hadoop. Hadoop’s been around for a long time. There’s a lot of different components to Hadoop, and even on this channel, whenever we talk about Hadoop, we’re really talking about the ecosystem. There’s the ability to process data, but also the ability to store large amounts of data with HDFS, the Hadoop Distributed File System. There’s a lot of components in there. There are APIs, and there are other tools that help you do it, but one of the things that I really like to think about when we talk about Hadoop, and why it was so record-breaking and really opened the market for big data, was the ability to set up distributed systems and analyze large amounts of data. These large amounts of data would be more unstructured data, so think of it not being in a database, though a lot of it would still be text-based. A very popular example is setting up an API to pull in Twitter data and doing sentiment analysis over that. Not so much the deep learning. They’re trying to get into the deep learning area right now, but more of machine learning, using algorithms like singular value decomposition or [Inaudible 00:05:25] neighbor, but being able to do that over large sets of data. Large sets of data with multiple machines. Hadoop, been around for a while, more seen as replacing the enterprise data warehouse. 
With TensorFlow now on the scene, where does Hadoop fit in, and what’s going on, and what are some of the differences?
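
As a rough illustration of the Twitter example above, here is a minimal Python sketch of word-based sentiment counting split across partitions, with each partition standing in for data on a separate machine. The word lists, function names, and sample tweets are all hypothetical, and this is plain Python rather than Hadoop or Spark code; it only shows the idea of processing slices of data separately and then pulling the results back in.

```python
from collections import Counter

# Hypothetical word lists, chosen only for illustration.
HAPPY_WORDS = {"love", "great", "awesome", "happy"}
UNHAPPY_WORDS = {"hate", "terrible", "awful", "angry"}


def label_tweet(text):
    """Label one tweet by counting happy and unhappy words."""
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    score = sum(w in HAPPY_WORDS for w in words) - sum(w in UNHAPPY_WORDS for w in words)
    if score > 0:
        return "happy"
    if score < 0:
        return "unhappy"
    return "neutral"


def map_partition(tweets):
    """'Map' step: each worker counts labels for its own slice of the data."""
    return Counter(label_tweet(t) for t in tweets)


def reduce_counts(partials):
    """'Reduce' step: pull the partial counts back in and merge them."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Two partitions stand in for data spread across two machines.
partitions = [
    ["I love this phone!", "Worst update ever, I hate it"],
    ["Great service today", "Just landed in Atlanta"],
]
totals = reduce_counts(map_partition(p) for p in partitions)
print(dict(totals))
```

In a real cluster the map step would run on the machines that hold each slice of data, and a framework would handle the shuffling and merging that `reduce_counts` does by hand here.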

Hadoop was written in Java. TensorFlow was written in C++. Both of them have APIs. Whenever we’re talking about the processing of data, you can do it in Java, you can do it in Python, you can do it in Scala. There’s a lot of different options there from a Hadoop perspective. TensorFlow, too. You can see C++. You can also see it in Python. Python’s one of the more popular ones; I actually did a course using TFLearn and TensorFlow to show that. When we think about the tools, it’s a little bit different. When we think about Hadoop, we’re actually building out a distributed system. Then, we’re using things like maybe Spark. Think of using Spark to be able to analyze that data. We’re going to pull insight from that data, back to our sentiment analysis that’s going to say, “Hey, these specific words in here, when we see them, this tweet is unhappy,” or, “This tweet is happy.” Versus TensorFlow, same thing. More of a processing engine, a framework to be able to pull in, analyze the data, and give you insights on whether that image contained a cat or not a cat. You’re starting to see some of the differences. We talked about Python versus Java. For both of them, there are different APIs that you can start to use. I’ll say right now that I haven’t seen a lot of Java with TensorFlow, but I’m sure somebody has an API or some kind of framework out there that works with it. Another big difference, too, is the way that the processing is done. The Hadoop ecosystem’s really trying to get into it right now, but from a TensorFlow perspective, we’re really seeing it on GPUs, right? Think of being able to use GPUs to process data 10 to 20 times, a lot faster than what we see on a CPU. Hadoop is more CPU-based: the way that we’re solving problems with Hadoop is we’re throwing a lot of CPUs in a distributed model to process the data and then pull it back in. TensorFlow, same thing, distributed networks. 
As you start to scale out your data, you really need to distribute those systems, but we’re doing it with GPUs. That’s speeding up the process. Little bit of a difference there, just in the approach, but that’s one of the big key differences. If we’re a data engineer, and we’re evaluating these, where do they come in? Ease of use: with Hadoop, you’re building out your distributed system. It’s really Java-based, so if you have a Java background, it’s really good, but you can get by without it in some areas. It’s really not so much of a comparison on ease of use, but if we’re talking about just being able to stand something up and start messing around with it, it’s going to be a little bit more complicated and harder to do from a Hadoop perspective. With TensorFlow, you can actually point at an NFS file system. You can feed in data from different file systems, where with Hadoop, you’re building that system out, and also building out a file system. You’re building out distributed systems, and you’re building out disaster recovery and some of the other components. It’s harder to do from a Hadoop perspective, and there’s more expertise involved, because you’re actually building out a whole solution set, versus TensorFlow being the processing system that you’re using. If somebody tries to talk to you about that comparison, kind of explain that these are two different systems, right? When we’re talking about which one we’re using, that’s really what it comes down to. If you’re looking at a project, and somebody says, “Hey! Should we use TensorFlow here, or Hadoop?” it’s going to be pretty easy to spot, I think, because when you’re starting to look at them, if you think of Hadoop, think of something that’s replacing or falling in line with the enterprise data warehouse. What are we doing? Do we have massive amounts of data? It could be structured, semi-structured, but you’re wanting to offload, and you’re wanting to run huge analytics over that data. 
Then, that’s probably going to be a Hadoop project. We’re probably building out that system when we think of the traditional enterprise data warehouse. That’s the bucket that we’re going to fall in. If we’re talking about doing some sort of artificial intelligence or doing some things with deep learning, maybe not so much in the machine learning area, you’re going to want to look at TensorFlow. Especially, listen for keywords like, hey, what are we doing from the perspective of images, or video, or voice? For any of those media-rich types of data, you’re probably going to use TensorFlow. If you have machine learning engineers or data scientists, and you’re trying to do rich media, TensorFlow’s going to be your really popular one. If you have more data analysts, and even data scientists, but from the perspective of, we’re looking at large amounts of data and wanting to marry it together, and we have it in some kind of structure and some kind of standardized system, then Hadoop may be your bucket.

Which one of these is going to be around in five years? I think they’ll both be around, but I will say that the popularity for Hadoop will continue in some degrees, but it’s more continuing to replace that enterprise data warehouse. Think of what you do from a traditional perspective in holding all your company’s information, from that perspective, where we’re seeing more product development, more media-rich things that are being done from an artificial intelligence. We’ll see more TensorFlow there. Will TensorFlow still be the number one deep learning framework in five years? Will deep learning, I can’t answer that here. Would I learn it if I were just starting out as a data engineer? Yeah, definitely. Definitely from the perspective of, I want to learn how to implement it and how to use it. You don’t have to become an expert. We’re not trying to become a data scientist from that perspective, but start looking at some of the frameworks, and building out, going through some of the simple examples that they have, and then heavy use on docker, container, and that whole world of being able to build those out. That’ll help you if you’re really trying to look into, hey, what could be next for data engineers? Or, what’s going on now? What’s cutting edge from that perspective? I hope you enjoyed this video, please, if you have any comments on it, if I missed something, put it in the comments section here below. I’m always happy to carry on the discussion. Until next time, see you again on Big Data Big Questions.


Filed Under: Tensorflow Tagged With: Data Engineering, Hadoop, Tensorflow

What I’m Learning Report #1 (Docker Deep Dive, K8s, & More)

October 3, 2019 by Thomas Henson Leave a Comment

One question I get a lot on Big Data Big Questions is “Thomas, what are you learning?” Honestly, not as much as I should. Still, I believe the key to being successful in any part of life is to continually learn.

Looking to change careers from Web Developer to DevOps?

Do you want to be a better partner or spouse?

Trouble with public speaking?

All the answers to the above questions start with learning and end with consistency. If you make it a habit to learn and are consistent with it, there isn’t anything you can’t accomplish. Alright, enough with selling you on learning! I wanted to share with you what I’m learning and to help motivate myself TO KEEP LEARNING. The way I plan to accomplish this is with monthly learning reports.

30 Minutes of Learning Everydayish…

For a long time I’ve advocated for the idea of taking 30 minutes every day to learn something new. I’ll go through periods where I’m hitting that every day, then periods where I fall behind. While it’s only a target and nothing to beat myself up about, I find it a useful technique when learning any concept. The way I do my 30 minutes of learning is to set a timer on my phone for 30 minutes and focus only on that topic until it goes off. Recently, in order to track this habit, I’ve been using the Super Habit App. Below you can see how I’ve done over the last month. Honestly not my best work, but let’s see how it improves over time.

What I'm Learning 30 Minutes

Pluralsight

My 30 Minutes of Learning mainly comes from Pluralsight courses. Not only am I an Author, but I’m also a student. Pluralsight was a part of my personal learning path long before I was an Author. Back when I was a fresh new Web developer, I used Pluralsight to learn C# and the ASP.NET frameworks. Of course, I also dove into the world of JavaScript, jQuery, and other JS frameworks. Nowadays I still learn with Pluralsight, but the content is more Data Engineering and IT Ops focused.

Docker Big Picture Course

Docker seems to be taking over the world. In fact, contributions to and adoption of Docker and Kubernetes have outpaced Hadoop by a wide margin. So many new applications and services offer a containerized version. In the Hadoop 3.0 release, multiple features were aimed at adding support for containers. All this container talk has pushed me to learn this amazing Platform-as-a-Service for OS-level virtualization. My guide on this journey is the great Nigel Poulton. Check out my notes from the Docker Big Picture Course:

  • Kubernetes originated out of Google (shocking I know)
  • Kubernetes is Greek for helmsman or Captain
  • Web Playground for K8s
  • Web Playground for Docker
  • Docker Engine – daemon –> containerd –> OCI
  • Docker has both community and enterprise versions
  • Kubernetes = K8s

Docker Deep Dive Course

After working my way through the Docker Big Picture Course I decided to stay in the Docker world by watching the Docker Deep Dive Course. I loved this course where I was able to get hands on with Docker and learn a good bit beyond the basics. Here are a few notes I jotted down throughout this course:

  • Docker Commands like
    • List containers – docker ps
    • Pull Image – docker image pull
    • Build Image – docker image build
    • Run Image – docker container run (docker exec runs a command inside an already running container)
  • Building Docker images is as easy as writing a Dockerfile; Compose and stack files are written in YAML
  • YAML = originally “Yet Another Markup Language”, now officially “YAML Ain’t Markup Language”
  • Docker networking with bridge driver on Linux or NAT driver on Windows
  • Stacks and Services – code –> container –> services
  • Docker Universal Control Plane (UCP) is installed on top of Docker Engine
  • Docker Trusted Registry can be set up for storing all Docker Images.
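
To keep the commands from these notes in one place, here is a small Python sketch that assembles the four Docker CLI invocations as argument lists and prints them. The image name and tag are hypothetical placeholders, and the script deliberately does not run anything, so it works on a machine without Docker installed.

```python
import shlex

# Hypothetical image name and tag, used only for illustration.
IMAGE = "nginx:latest"
TAG = "my-app:0.1"

# Each entry is the argument list you would hand to the docker CLI.
DOCKER_COMMANDS = {
    "list running containers": ["docker", "container", "ls"],  # same as `docker ps`
    "pull an image": ["docker", "image", "pull", IMAGE],
    "build an image": ["docker", "image", "build", "-t", TAG, "."],  # reads ./Dockerfile
    "run an image": ["docker", "container", "run", "--rm", "-d", IMAGE],
}

for task, argv in DOCKER_COMMANDS.items():
    # shlex.join turns the list back into a copy-pasteable shell command.
    print(f"{task:<26} {shlex.join(argv)}")
```

If Docker is installed, any of these lists can be passed to `subprocess.run(argv, check=True)` to execute it for real.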


Data Related Podcasts & Blogs

Data Engineer learning doesn’t only take place in courses. I also wanted to track some of the podcasts and articles I consumed throughout the month. The great thing about podcasts is you can listen to them while commuting, working out, doing house chores, or just about anywhere. Here are a few podcasts and articles I’ve consumed over the past month.

  • Conversational AI Best Practices with Cathy Pearl and Jessica Dene Earley-Cha – GCP podcast that digs into the aspects of conversational AI. I loved this podcast to explore where conversational AI is going and where to get started with NLP in GCP. Actually gave me some ideas for my October learning goals.
  • Microservices.io – Uh I can’t even begin to summarize how much content is on this site. If you are looking to learn more around Microservices (which you should!!) then bookmark this site and read this content over time.
  • Doctor AI – Dell Tech (full disclosure: #iworkforDell) podcast diving into different topics around AI. In this episode host Jessica explores the possibilities of AI augmentation in the medical field. This is one of the areas I’ve spent a good bit of time researching and speaking about. Earlier this year I spoke with a group of Medical Doctors and Researchers at NYU around advances in AI.
  • Exploring AWS Lake Formation – AWS podcast with guests from around the AWS world. There is a lot of great content on this podcast. I listened to this particular episode while walking my son, so my attention wasn’t what it should have been. Mostly I remember that AWS Lake Formation is an AWS service that helps with cataloging and labeling data to support multiple services (MySQL, Redshift, S3).

On To Next Month

Thanks for supporting this new series and I’m excited to see how it matures over time. Also would love if I got more consistent with my learning as well.  If you have ideas for things I should be learning or would like to share what you are learning put it in the comments below. Right now my thoughts are to wrap up the Kubernetes Deep Dive course then move on to Natural Language Processing (NLP). I’ve got ideas for some really cool projects in NLP so it should be fun.

Filed Under: Article Tagged With: AI, AWS, Docker, GCP, K8s, Kubernetes, Learning

Data Engineers: Data Science vs. Computer Science Degree

October 2, 2019 by Thomas Henson Leave a Comment

Data Science vs. Computer Science Degree

How Do you Choose the Right Degree?

College is such a tough time when it comes to choosing education paths. For most folks, college marks the first time they are making a huge decision about their futures. So it’s easy to get analysis paralysis because the decision means so much. Or does it? At the end of the day, it feels bigger than the decision really is over the long term.

The difference between a Data Science degree and a Computer Science degree might impact career outlook in the short term. The long-term impacts of which degree you choose are minimal. Look around at the number of positions where degrees aren’t even a requirement. When I was working on my first Big Data project, our Data Scientist didn’t have a degree in Data Science, but he was great in that role. Now I will say that Data Science degrees haven’t been around that long, so it kind of makes sense.

Find out my thoughts on the differences between a Data Science degree and a Computer Science degree in the video below.

Video Data Science vs. Computer Science Degree

Transcript – Data Science vs. Computer Science Degree

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today’s question comes in from a user, and it’s all about, what specific Master’s degree should I get? Find out how I answer this question and what Master’s degree you should get or should not get if you’re going into data engineering.

Today’s question. If you have a question, make sure you put it in the comments section here below. Reach out to me on thomashenson.com/bigquestions. Find me on Twitter, whatever you want to do, and I’ll do my best to answer your questions right here.

Today’s question. I’m looking for a career as a data engineer, but I’ve got a Bachelor’s in IT, and I’m looking to get into a Master’s degree. Awesome! Congratulations. It’s a pretty cool thing to go through. I went through a Master’s program as well. Which is better for data engineering career? Thinking about that specifically. A Master’s degree in data science or a Master’s degree in computer science.

This question, for me, really resonates. I remember what it was like going through this, when I was trying to figure out which kind of Master’s I wanted to go for. I had a similar situation. I specifically wasn’t in data engineering from that perspective, but I was looking to see, what do I want to do to take the next step in my engineering career? I looked at an MBA with an emphasis in information systems versus a Master of Science in Computer Science. I ended up choosing to continue on down the business path and getting my MBA in information systems. Pretty excited to have gone through that, and really happy with my decision. I feel like I’ve been fortunate with my career. I understand where you’re coming from. I’m not telling you to get an MBA. That’s not what I’m saying. I understand how much you look, going back and forth, and you’re like, man! What do I do here? I appreciate you asking for my opinion, as well. Which one should you get if you’re going into data engineering? It’s an easy call for me, here, just to say, “I think computer science and the skills that are involved in computer science are going to help.” If I were in your shoes, I would pivot more towards the computer science. I would look into, though, the universities and other programs that are starting to emerge that actually have a data engineering track. As for what you were asking about, should I do the data science? In my opinion, if you’re not trying to go down the data science path, maybe don’t go into that. Data science is a newer program that a lot of universities and colleges around the globe are adding, so if they do have a track specific to data engineers, I’d look into that. Otherwise, I’d probably stay with the computer science track. 
However, like I said, there are some universities that are putting out a specific, “This is not data science,” data engineering path, where you’re going to go through more systems administration material, where you’re going to be building out programs that analyze data, and really focusing on distributed systems, whether it be Kubernetes and containers or different clouds. Knowing how to do it in AWS. Building out good data pipelines and really understanding what you’re doing from that perspective. I think I’d look into that, and also make sure you’re looking at some of those degrees.

One more bonus tip as you’re going through that. I would definitely, at the university that you’re looking at, have a conversation with some advisors, and even some of the professors in the data science world or in the computer engineering world, and see if you can cross over. Maybe there’s an opportunity there to do something interdisciplinary. Maybe you can take a couple of the data science courses, because they would be really good for you to get exposure to it, not to become a data scientist, but for exposure to what goes on, on the data science side, and have those packaged together, and go through some of those courses while you’re going through the computer science program. I’m not asking you to take a double load. Hopefully there’s a crossover there, where it’s like, “Hey, I can pick and choose some of these.” With data engineering and the boom that’s going on with careers, if you look at it globally, we need more data engineers. The universities will be pretty excited, especially about somebody standing out to do that. Worst case scenario, what are they going to do? Your professors may tell you no, but they see that you’re engaged, and that you’re interested in data engineering, so they’re going to be able to look out for you, maybe when there are new classes coming up. What about internships, right? Some of these universities have really good relationships with corporations. Your name is already at the top of the list, and you’ve shown initiative, that hey, I’m excited about the data engineering world, so any opportunities to learn more or any opportunities for future career growth might be a good thing. Something as simple as taking an hour to reach out and talk to a professor may be an investment in yourself and in your career further on down the road. Definitely try that out. Should you get a Master’s degree to become a data engineer? 
You don’t have to, but like I said, I’ve got a Master’s degree, and I went through that for my own purposes. If you’re watching this video and you’ve made it all the way to the end, which I hope everybody that starts watching does, this was a specific question where we were talking about different degree options for your career. We’re not saying that you have to get the Master’s in Computer Science to become a data engineer. Heck, you can even do the Master’s in Data Science and become a data engineer. This is just my advice for what we’re trying to do. There are other data engineers that don’t have degrees. We’ve covered that quite a bit on this channel, and so I just want to be specific about that. I don’t want people watching this video, especially if you’re in college, or if you’re in high school and you’re starting to think about your data engineering path, to think, “Aw, man! I’ve got to go get a Master’s degree to do this. I’ve got to be in it for the long haul.” That’s not what we’re talking about here. We’re just talking about options. Let me know if you have any questions about degrees, certifications, anything data engineering or technology-specific, and I will answer it on the next episode of Big Data Big Questions.


Filed Under: Career Tagged With: Computer Science, Data Engineer, Data Science

Explaining Splunk Architecture Basics

July 22, 2019 by Thomas Henson Leave a Comment

Explaining Splunk Architecture

Splunk Architecture

In this episode of Big Data Big Questions we explain the basics of the Splunk Architecture. Splunk is a hot solution in the world of Big Data, and many Data Engineers are eager to learn how to use Splunk to analyze machine data. One of the first things you want to understand is the three basic architecture components in Splunk:

  • Forwarder – helps move data or log files from devices, edge, IoT, or anything into other Splunk instances.
  • Indexer – Adds searchable order to data coming into Splunk instances.
  • Search Head – Allows data to be searched in Splunk by Data Engineers, Splunk Users, and Splunk Architects.
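
To make the three roles concrete, here is a toy Python model of the flow from forwarder to indexer to search head. It is only a sketch of the concept: the class names, methods, and sample log lines are invented for illustration and are not Splunk’s actual API or storage format.

```python
from collections import defaultdict
from datetime import datetime, timezone


class Indexer:
    """Stores events with a timestamp and keeps a word -> event-id index."""

    def __init__(self):
        self.events = []                  # list of (timestamp, source, raw line)
        self.by_word = defaultdict(set)   # inverted index for fast lookup

    def ingest(self, source, line):
        event_id = len(self.events)
        self.events.append((datetime.now(timezone.utc), source, line))
        for word in line.lower().split():
            self.by_word[word].add(event_id)


class Forwarder:
    """Small agent on a device that ships log lines to an indexer."""

    def __init__(self, source, indexer):
        self.source = source
        self.indexer = indexer

    def forward(self, lines):
        for line in lines:
            self.indexer.ingest(self.source, line)


class SearchHead:
    """Answers user queries by consulting the indexer's index."""

    def __init__(self, indexer):
        self.indexer = indexer

    def search(self, word):
        ids = sorted(self.indexer.by_word.get(word.lower(), set()))
        # Return (source, raw line) for each matching event.
        return [self.indexer.events[i][1:] for i in ids]


indexer = Indexer()
Forwarder("phone-app", indexer).forward(["login failed for user bob", "level loaded"])
Forwarder("web-01", indexer).forward(["login ok for user alice"])

results = SearchHead(indexer).search("login")
print(results)
```

The point of the split is that forwarders stay tiny and live on the devices, the indexer does the heavy work of timestamping and organizing, and users only ever talk to the search head.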

Learn more about Splunk Architecture by watching the video below.

Transcript – Explaining Splunk Architecture Basics

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today, we’ve got a good topic coming in. Something we’ve talked about a little bit before. We’re going to talk a little bit about Splunk. Today’s question, just remember, if you want your question answered here on Big Data Big Questions, put it in the comment section below. Find me on YouTube. Wait, we’re already on YouTube. Find me on Twitter. Find me on Instagram. Just put it in the comments section here below. Reach out, and I will do my best to answer those questions. Today’s question comes in, and we’re talking around Splunk.

What are the basics of Splunk architecture? I really just wanted to key off of that, and talk a little bit. We’re going to break it down into three different pieces, but the first thing we need to know is what Splunk is. Splunk, if you’ve been watching this channel, is one of those tools that allows you to take machine-generated data and analyze it. My joke is, if you can create tables and pivot tables in Excel, then you can easily start ingesting, looking at, and visualizing your data in Splunk. Think about it: it started off in IT operations. Being able to take in data, whether it be log files, whether it be system files, whether it be people trying to break into your network. Anything that’s going on from a network traffic perspective, or logins.

All those different log files from all these different machines, being able to put them in one place, index them, and view them. Splunk has been an amazing tool for that. Like I said, Easy Button. They coined the phrase Easy Button For Machine Data. Pretty cool. Anything machine generated, they’ve been into, but they’re also into IT security. Really, if you think about big data, you’re talking Splunk. IoT is one of their other big key features and focal points, too.

Let’s talk about those three basic architecture features. We’re going to break it down. The first thing you need to know, if you’re looking to be able to talk Splunk and know what the Splunk architecture is made up of, the first thing is forwarders. What forwarders are is, think of this as a way to, you’ve got a machine running on the edge. You’ve got a machine running in your data center. You’ve got one running in the cloud. Anywhere you have a machine or have any kind of device that you want to get data back from, there’s something called a Splunk forwarder. The forwarder is that first key. What that’s going to do is, that’s a very, very small file that’s running or very small application that’s running on that device, that machine, whatever it is, and it’s just forwarding whatever the information is. You’re looking to forward log files. You’re forwarding log files. Say that you have a phone. You’re forwarding log files from a game or from an application on your phone. You’re going to use a forwarder to send that data off. First thing is, learn what a forwarder is. We’re going to be able to run a small application and send data to our Splunk environment.

Number two, the next piece, building block for Splunk architecture, is going to be our indexer. What the indexer is, it’s going to take that data. We’re forwarding those files, it’s forwarding that data to the indexer. What the indexer’s going to do is, it’s going to put a timestamp on it, put some other information, but it’s basically the indexer’s going to say, hey, this is how we’re going to look for this file. We’re probably talking about millions and millions of files. Think about it as being able to index it, if you’re familiar with databases. You definitely understand. If you’re a data engineer in the big data world, on Hadoop, you understand how indexes work and how you can use indexers to be able to search your data a lot quicker. The second portion, just to recap, is our indexer.
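To make the forwarder-to-indexer hand-off a little more concrete, here is a minimal sketch in plain Python. It is a toy, not Splunk's actual API: `ToyIndexer`, `forward`, and the sample log lines are all made-up names for illustration. It only shows the idea that stamping each event with a time and indexing its words lets you look events up without scanning every line.

```python
from collections import defaultdict
from datetime import datetime, timezone


class ToyIndexer:
    """Illustrative stand-in for an indexer; not Splunk's real API."""

    def __init__(self):
        self.events = []                    # stored events, in arrival order
        self.term_index = defaultdict(set)  # word -> positions in self.events

    def receive(self, source, raw_line, timestamp=None):
        """Stamp an incoming line with a time and index each word in it."""
        if timestamp is None:
            timestamp = datetime.now(timezone.utc)
        position = len(self.events)
        self.events.append({"time": timestamp, "source": source, "raw": raw_line})
        for word in raw_line.lower().split():
            self.term_index[word].add(position)
        return position

    def lookup(self, word):
        """Return matching events via the index instead of scanning every line."""
        positions = sorted(self.term_index.get(word.lower(), ()))
        return [self.events[p] for p in positions]


def forward(indexer, source, lines):
    """Hypothetical forwarder: ship each non-empty log line to the indexer."""
    for line in lines:
        if line.strip():
            indexer.receive(source, line.strip())


indexer = ToyIndexer()
forward(indexer, "phone-app", ["login failed user=bob", "", "login ok user=amy"])
forward(indexer, "web-01", ["disk usage high"])
failed = indexer.lookup("failed")
print(len(failed), failed[0]["source"])  # prints: 1 phone-app
```

A real deployment does far more (parsing, compression, clustering), so treat this purely as a way to picture why an index makes the search side quicker.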

Now that we’ve got our data indexed, it’s time to move on to the next phase. In the next phase, we’re talking about number three. That’s going to be our search head. Our search head is how we can visualize and how we can start looking, and querying our data. Think about it. We’ve got our data that’s been forwarded from our phone. We’ve got our application file that’s coming off of a mobile device, being pushed into our indexer. Our indexer says, “Hey, you know, here’s a timestamp for it. Here’s some other information that we’re pulling into it.” Now, me, the user, comes in and says, “Hey, I want to index that data,” or, “I want to search that data,” and so, I’m interacting with a search head that’s going to go out, and going to find that data, and going to be able to help with our queries. But, also help whenever we’re using our queries to build out dashboards, or some amazing tables that are going to help us visualize our data.

Those are the three basic building blocks when we’re talking about Splunk architecture. You have your forwarder, you have your indexer, and you have your search head, and there’s a lot of different ways that you can configure those, and there’s a lot of different ways that you can architect those. Those are the basic building blocks that you’re going to use if you’re talking about the Splunk architecture.

If you’d like to learn more, I’ve got a couple Pluralsight courses out there. One called Analyzing Machine Data With Splunk, and then also another one that’s building on the Splunk learning path for Pluralsight. That’s [Inaudible 00:05:07] configuring Splunk, with other courses coming and showing you how to visualize that data, how to search that data, how to set up alerts. A lot of different information, so if you’re curious about that, there are some resources for it, but there’s a ton out there as well. Splunk has great documentation. There’s other courses and other things out on YouTube that you can find, that will help you learn more about Splunk. If you’re interested in Splunk, and interested in being able to use a tool like Splunk to visualize whether it be machine-generated data or IoT. Especially if you’re trying to get into the more security path. Then, Splunk is a great tool for that. A lot of information out there. Hope you found this video very informative. If you have any questions or have any ideas for the show, put them in the comments section here below, but also make sure that you’re subscribed and you ring that bell, so that you never miss an episode of Big Data Big Questions.

Should get a sponsorship about water. Does anybody know who the agent is for water? Eh. Maybe get some kind of sponsorship. Hey man, you know? There’s those milk ads, right? Who knows?

Filed Under: Splunk Tagged With: Splunk

What Is A Generative Adversarial Network?

July 18, 2019 by Thomas Henson Leave a Comment

 

 

Generative Adversarial Networks

What are deep fakes? How are they generated? On today’s episode of Big Data Big Questions we tackle how Generative Adversarial Networks work. Generative Adversarial Networks, or GANs, work with two neural networks: one a generator and the other a discriminator. Learn about my experience with GANs and how you can build one as well.

Transcript What Is A Generative Adversarial Network?

This is going to be a cool episode, Mr. Editor. We’re going to talk about a painting that was built by AI or designed by AI that went for over $400,000. Crazy.

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today, we’re going to talk about Generative Adversarial Neural Networks. We’re going to talk about a painting, so you’ve all probably heard about a painting that was sold for, like, $400,000. It was built, actually, by a Generative Adversarial Network. We’re going to talk about that, explain what that is, and maybe even look at a little bit of code, and tell you how you can learn more about it.

Before we jump in, I definitely want to say, if you have any questions about data engineering, data science, IT, anything, put them in the comment section here below. Reach out to me at thomashenson.com/big-questions. I’ll try my best to answer them and help you out. It’s all about the community and all about having fun. Today, we’re going to have a lot of fun. I’m excited. This is something that I’ve been researching and looking into since, maybe, at least since the first part of 2019, but for sure it’s been a theme for me for a while.

I want to talk about Generative Adversarial Networks, what that is. We think about that from a deep learning perspective. We’ve done some videos. We talk about deep learning, but this is a specific kind, so kind of like [Inaudible 00:01:33] neural networks, this is a little bit different. It still uses the premise of, you have your input layer, you have your hidden layers, and you have your output layer, but there’s a little more complexity to it. It’s been around since 2014. Ian Goodfellow is credited as the creator of that. If you follow Andrew Ng on Twitter, I just saw where he took a role at Facebook. I think it was a competitive thing, and I think Andrew was saying, “Hey, great pickup for Facebook for picking him up,” but you might want to fact check that.

Like I said, that was breaking news here. Generative Adversarial Network. The way that I like to think about that and describe that is, think of it as having two different neural networks that are working. You have your discriminator and you have your generator. What’s going on is your generator is taking data. Think of, we’ve got, let’s say, a whole bunch of images of people. What’s going on is, our generator is going to take that data set and look at it, and it’s going to try to create fake data that looks like real data. Your discriminator is the one that’s sitting there saying, “Hey, wait a minute. That’s real data. This is fake data.” This is real data, that’s fake data. Just continuing on. You keep going through that iteration, until the generator gets so good, he’s able to pass fake data onto the discriminator. For our example, we’re looking at images of people. What you’re trying to do is, you’re trying to generate data of fake people and pass it through as real people. You’re probably like, “Man. How really good is that?”
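The back-and-forth between the generator and the discriminator can be sketched in a few lines of plain Python. This is a deliberately tiny toy under stated assumptions, not the TF Learn example from the video: the "real data" is just numbers drawn near 4.0, the generator is a single line `a * noise + b`, the discriminator is a single logistic unit, and the name `train_toy_gan` and all the starting values are made up for illustration.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator: fake = a * noise + b
    w, c = 0.1, 0.0   # discriminator: P(real) = sigmoid(w * x + c)

    for _ in range(steps):
        real = rng.gauss(4.0, 0.5)   # assumed "real" data: numbers near 4
        noise = rng.gauss(0.0, 1.0)
        fake = a * noise + b

        # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        grad_w = (d_real - 1.0) * real + d_fake * fake
        grad_c = (d_real - 1.0) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: adjust a and b so the discriminator scores fakes as real.
        d_fake = sigmoid(w * fake + c)
        grad_fake = (d_fake - 1.0) * w   # gradient of -log D(fake) w.r.t. fake
        a -= lr * grad_fake * noise
        b -= lr * grad_fake

    return a, b, w, c


a, b, w, c = train_toy_gan()
print(f"generator offset b = {b:.2f} (real data is centred on 4.0)")
```

Real GANs swap these two single-parameter models for deep networks and images, and training them is known to be finicky, so don't read anything into how close this toy gets. The point is only the alternating loop: one step for the discriminator, one step for the generator, repeat.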

Check out this website here. These are fake people. These are not real people. These are really good images, and a little bit creepy. I found this, actually, in the last week, and kind of looked at it. Been sharing it internally with some friends and some colleagues, but man. It’s really interesting when you think about it. These people do not exist. There’s no, these people don’t exist on the planet. These were all built by AI or deep learning. It’s pretty cool. Pretty creepy, too.

You’re probably wondering, “That’s pretty cool.” Been around since 2014. I’m researching it. Should I be researching it? I definitely think it’s something that’s going to be out there. There’s a lot of information around it, and a lot of use cases, kind of don’t know where it’s going to go. I can think of it being used for game development. Being able to create worlds. For somebody that’s creating a game that’s going to have multiple, multiple different levels, or even in GIS, you have to create all these landscapes and everything like that. If you can build AI to automate that, if you use a deep learning algorithm that’s going to automate, and build out those worlds, and make them lifelike, how much busy work is that going to save you? Same thing with GIS and in architecture, but also go back to the website we were just looking at, with the fake people. Oh, my gosh! You can use that in media and entertainment. Think about movies. Maybe we don’t even need actors anymore. That’s a little bit scary. For the actors, I don’t know. You still need Thomas Henson and thomashenson.com on YouTube, right?

Really cool. Something I just wanted to share with everybody, and back to what we were talking about in the first part of the show. The first AI art that really sold as a big ticket item went for over $400,000, and it was a generated image, too. I talk a little bit about it in my Implementing TF Learn course, but here’s a code sample, really just showing what’s going on. If you’re looking at it, and all this is done in TensorFlow, here, using the abstraction layer of TF Learn. Look here, how we’re creating that generator, and how you’re creating a discriminator. It’s a good bit of code here, but really, this is an example from TF Learn examples, where you’re actually starting to generate data in here. It’s pretty cool. Pretty awesome to be able to play with if you have Tensorflow installed in your environment. You can actually import TF Learn and start running this code from the examples here, and start tweaking it. Really cool.

If you want to learn more, I’d definitely love for you to check it out and tell me all about it. Go through my TF Learn course. Tell me all about it if you like it. You don’t have to, but I just thought sharing Generative Adversarial Networks, I thought that was pretty cool. I think it’s something that everybody should learn. At least know a little bit about it. Now, you know. Hey, important thing. I’ve got my generator. I’ve got my discriminator. My generator is making the data that’s trying to pass as real data to my discriminator.

Boom! You understand a lot. Thanks for tuning in. If you have any questions, put them in the comment section here below, and make sure you subscribe just so you never miss an episode, and get some great education around Big Data Big Questions.

Nobody can! Nobody can generate a fake image of me!

Challenge accepted?

Filed Under: Tensorflow Tagged With: Deep Learning, Neural Networks, Tensorflow

Review Coursera’s Neural Networking & Deep Learning Course

July 17, 2019 by Thomas Henson Leave a Comment

 

Coursera's Neural Networking & Deep Learning Course

Another Machine Learning Course?

Yet another machine learning course has caught my attention here lately. Andrew Ng has a new course available on Coursera focused on Neural Networks and Deep Learning. How did I like the course and should you take the course? Find out my thoughts on Coursera’s Neural Network and Deep Learning course.

Transcript- Review Coursera’s Neural Networking & Deep Learning Course

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today’s question comes in around a new course that I am taking, myself. It’s not a course that I’m writing. I’ve talked about some of my Pluralsight courses. This is actually a deep learning course that I’m taking with Coursera. It’s the second course that I’ve taken with Coursera. I did another one from Andrew Ng called, I think, learning machine learning, and just went through that portion, and swore I’d never take another one, and here I am again. Find out my review on that course and how I’m doing on it here in just a second.

[Sound effects]

Today’s question is around what I’m doing from a course perspective. I’m taking a course called neural networks and deep learning. This is actually part one in a large certification series. If you go out to deeplearning.ai, it’s an Andrew Ng specific course. I did his machine learning course before, and went through it, and did some reviews with it on another channel with a group, the Big Data Beer Team. You can always check that out and find that.

I swore I’d never do another course, and here I am doing another one, because the math portion for me is a little more into the weeds than I like to be, and I really think, from a data engineering perspective, it probably is. Either way, my thing is to do this review and give you all the insights. You can decide if you want to take that course and find out where you are. I’m through part one. The neural networks and deep learning is part one in that course. It’s an Andrew Ng course, so he’s, like, probably trained more people around machine learning and deep learning than anybody else on the planet. Worked at Baidu, at Google, Stanford, and has his own company, own startup where he’s working on driverless cars. Huge authoritative figure who’s teaching this course. It’s amazing from that aspect of it.

Little bit overwhelming, I’ll tell you. We’ll get into it a little bit, but each part of these courses are broken into, I think, four weeks. This first one was four weeks. We’re going to go through how I felt through each of the four weeks, and give you my thoughts on that.

In the first week, week one was intro to deep learning, and really it was about the why for deep learning. Why deep learning? What’s the history of it? Is this anything new? Is this going to solve all our problems in the future? Eh, maybe.

Maybe we don’t get into that as much, but this was a pretty good one, and with each one of these courses, there is a heroes in AI interview session. If you like watching YouTube videos like you do now, this is similar to that, but it’s behind the paywall, or behind the course wall there in Coursera. I did not get through all of them, but I did go through this one. It was pretty good. Can’t really remember who it was. Maybe shame on me for that. Should’ve put that in my notes.

Week one was pretty easy to step through and everything like that. There might’ve been a quiz or something, but no programming aspects from that perspective. Week number two, logistic regression in neural networks. Probably my least favorite portion of the course so far. A lot of math-based and somewhat of a review. Actually, when I got to this portion, I was like, “Man, this is…” I was going through the course material and watching the videos. I was like, “This is kind of a review from what I did in the machine learning course.”

I’m going to ace everything here, and I did ace the quiz. It wasn’t too hard, but when we stepped into the programming, it was a little more complicated than I thought, and I have some reasons why I think that is, and I’m going to talk about those here at the end. For the most part, week two was really just a level set. Hey, remember, this is the cost function. This is how we use linear regression, and just walking through some of those portions, to be able to say hey, this is what’s going on behind the scenes.

If you’ve gone through like I have, and implemented networks, and played around with Tensorflow or TF Learn, you already know some of the things that are going on, though maybe you don’t understand them fully. This was a good review to start off from that perspective. If you haven’t taken the machine learning course, no problem. You can jump right into it. Like I said, he takes it from a high level here and gets you going.

Week three. My favorite week. We talked about shallow neural networks. This is the basics of how to build a neural network. What I like the most about this was, we deep dived into why non-linear functions and why we use different activation functions. It was really cool, because I actually taught a portion of this in my course, and just it was cool to see how Andrew was able to explain it. Maybe not a whole lot better than me. I don’t want to undersell myself, but it was definitely awesome to see his background, and his thought process, and just him saying, “Hey, this is why we use [Inaudible 00:05:19], these are some of the things that you’re going to see with it.” Don’t worry about it, because of these reasons. Really, my favorite portion of this course was week three, so around the shallow neural networks. Still went through and took a little bit longer to do the programming exercise than I thought would take me.
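One way to see the "why non-linear functions" point for yourself is a quick sketch like the one below. The weights and biases are arbitrary numbers picked for illustration, and ReLU is just one common activation, not necessarily the one the course leans on. Two linear layers stacked with nothing in between collapse into a single linear layer, while putting an activation between them gives you something a single line can't do.

```python
def linear(weight, bias, x):
    """One 'layer' with a single input and output: weight * x + bias."""
    return weight * x + bias


def relu(x):
    """Rectified linear unit, a common non-linear activation."""
    return max(0.0, x)


def two_linear_layers(x):
    # No activation in between: still just a straight line in x.
    return linear(3.0, 1.0, linear(2.0, -4.0, x))


def collapsed_layer(x):
    # The same function as a single layer: 3 * (2x - 4) + 1 = 6x - 11.
    return linear(6.0, -11.0, x)


def two_layers_with_relu(x):
    # ReLU between the layers bends the line at x = 2.
    return linear(3.0, 1.0, relu(linear(2.0, -4.0, x)))


for x in (0.0, 1.0, 2.0, 3.0):
    print(x, two_linear_layers(x), collapsed_layer(x), two_layers_with_relu(x))
```

The first two columns of output always match, which is the whole argument: without a non-linear activation, adding layers doesn't buy you anything.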

Little bit of stress there, but quizzes were good. It was easy if you follow along, and just take good notes, and you’ll be able to pass the quizzes. There’s a new thing that they’re trying out, too, called notes. I’ve started playing around with that. I’ll probably, in my next video, talk a little bit about that as I’m using it more and more, and maybe that’ll be a quick tip that you guys can use whenever you’re going through a course on Coursera.

Week four, not my favorite week. It was pretty good. We started getting into deep learning and deep neural networks and how those are working. Some of the things that we really did was talked about the matrix dimensions and how some of that works. Didn’t get into it as much as they will in future courses. It’s easy for me to look at it now and say that, because I jumped ahead a little bit. From the perspective of this course, neural networks and deep learning part one, really talks through some of the matrix portions and then starts building out your deep networks. Also, talks about parameters and hyperparameters. I was familiar with hyperparameters and parameters before, just with having been hands-on before, but it was really helpful to do those.

The quiz in this one, once again, if you paid attention, you went through it. You have to work through some math and do some other portions of it, but the quizzes are pretty simple. Make sure you’re using your own notes and everything for that. When it came to the programming exercises, I think there was two in this week, and they were somewhat difficult. I think the second one was pretty long as far as building out. You get to get hands-on with Tensorflow. Still a little bit more challenging, I guess, I think, and there’s some ways that we can make it a little bit better. Let me talk about that here just next.

Overall, I thought the course was all right. It was good for me, just some of it was a little bit of a review. Some of it went a lot deeper than I’ve dove in before, so I thought that portion was good for me. I will say, on all the programming exercises, they’re all graded. One of the things that I find challenging, and maybe it’s just the way that I learn, but I feel like they’re a little harder just because you go through, and it’s like you’re being tested day one. Whenever you’re going through the videos and everything, you’re doing everything from a math perspective on paper, or if you’re taking digital notes, but you’re not really doing any of the programming functions. If you don’t have a solid basics in programming, or it’s not something that you do every day from that perspective, I think it’s going to be a little more challenging. One of the things that could help out, I think, and broaden for the students that are coming in would be to have more coding examples that aren’t graded. It doesn’t have to be verbatim. Hey, this is really, really close to what the examples are. I get that you want to test, and you want to make it so that you’re applying what you’re learning.

Also, I think a few more coding examples where you can go through and see, “These are some of the steps.” If you understand the math portion of it, doesn’t necessarily mean that you’re going to be able to go in and be able to program it right there, and when we talk about it from a real-world perspective, whenever I look at it, yeah, you need to understand those things, and know how to implement those at a base level, but there’s so many. There’s so many other things it can do from a high level. For example, one of the biggest challenges I had going through this was, I built a whole course around TF Learn, and being able to use that abstraction layer over Tensorflow. For me, having to go through step by step, and showing how you can do this, where you can write it in TF Learn or use one of those functions, I think that would’ve been… That would be a different approach to take it, and I think that would broaden the audience, and make it a little more enjoyable, too.

If you’re having to go through, and you know that writing these 60 lines of code is something that you can write in 4, it makes it a little bit harder, especially since I already just did all the math portion, and kind of went through how all those activations and everything work, versus having to go through some of the minutia on the programming. That’s just my two cents. If you’ve taken this course, please tell me. Tell me your opinion. You’re listening to mine. Let’s make this a conversation. I’d love to hear what some of your thoughts are, where you think I’m wrong, or if you think I should be better at math. You’re probably right. I think I’m getting the math. We’ll see.

Fair enough, my programming skills in Python, like I said, they’re all right. They’re not to the level here. I think that’s another gap that I found going through this course. All in all, I guess I would recommend it if you’re looking into using deep learning, but I don’t think that, if you’re a data engineer, that you have to go through anything like this. Like I said, it’s a good aspect of it, but there’s some other things and other skills that you probably want to get. If you’re looking more to the data science, or deep learning, or machine learning engineer path, then going through one of these, this course would probably be pretty good. In the next video, check out how I jumped way too far ahead in the next course. You might see. I jumped to, I think, the fifth portion or fourth portion when I was supposed to go to the second portion. I’ll talk about that in the next video. If you have any questions, make sure you put them in the comments section here below, or reach out to me on thomashenson.com/big-questions. Find me on Twitter or Instagram. Ask any questions. I’ll try my best to answer them. Make sure you subscribe so that you never miss an episode, and ring that bell. Thanks again.

Filed Under: Data Science Tagged With: Data Science, Deep Learning, Neural Networks

Why Data Engineers Should Blog

July 16, 2019 by Thomas Henson Leave a Comment

Data Engineers Should have a blog

Blogging For Data Engineers?

How important is it for Data Engineers to have a blog? In this episode of Big Data Big Questions I talk about the importance of building a blog in your career in Data Engineering, Data Analysis, or Data Science. Learn my thoughts on why every Data Engineer should have a blog in the video below.

Transcript – Why Data Engineers Should Blog

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of…

Big Data Big Questions. Today’s question, I thought I would take a topic that I’ve seen and keeps coming up in some of my videos, and really dig down into it. Maybe this is going to be a multi-part series, but we’re going to talk about starting a blog to build your brand as a data engineer, data scientist, or if you’re watching this and you’re just a technologist or somebody that just wants to do book reviews, trust me, there’s going to be some topics in here that are generalized for everybody, but it really shows you how to key in on your field.

Before we jump into that, though, I want to say, if you have any questions, put them in the comment section here below. This is where I find content to make sure I’m interacting with the community and answering the questions that you want. It also gives me an idea. Hey, there’s enough people that ask a question or interested in a certain topic, and I haven’t done any research on it, gives me an opportunity to study and see what’s going on. This is all about being a community here. Reach out to me on thomashenson.com/big-questions if you don’t want to put it in the comment section here below. I’ll do my best to answer those quick as I can.

Today, I want to talk about why you should start a blog as a data engineer, or data scientist, or if you’re a web developer, and you’re watching this, or anything. I think it’s very important. In 2019, should you start a blog? I think so. I don’t think it’s something that is going away. Just because I say start a blog, you don’t have to start a blog and just write. You can start a vlog. I think you definitely should have your own domain. I bought thomashenson.com. It cost me, I think, $12 a month. No, $12 a year, but it’s like, hosting and everything like that can be really, really cheap. I wouldn’t worry about that. It’s really important. I’m going to talk a little bit first about my journey and why I started a blog.

When I got my first job, like I said, I’ve talked about it before, I was a web developer. One of the things where I was working at, we weren’t really embracing. We were using open source, but we weren’t really contributing, and it was frowned upon for us to actually have any code to be able to show or anything like that. One of the things, I didn’t really think about it at the time, but you get a couple years into your role, and you might get opportunities to interview at other places, to do other things, and one of the things that came up that was a real hole when I was going through the interview process was, I didn’t have any example code or anything like that I could show. I wasn’t involved in the open source community outside of work, and I didn’t have my code. It was my company’s property, and there were some other pretty big reasons I didn’t have anything I could point to and show. That got me thinking. I don’t have anything that really captures the work and some of the things that I do.

Then, at this time, too, I’d already embraced trying to do at least 30 minutes a day, or maybe even four times a week getting 30 minutes in of learning new things. I had all these ideas and all these things that I was going through and learning in the process, but I could only talk about them, on a whiteboard or from a resume perspective, but I couldn’t really show. Couldn’t let it stand on its own. That’s where I started really looking into blogging. I was like, “Man, maybe I should start a blog.” Start a blog, didn’t really know what I was going to do with it. If you go back and look at some of my early posts, it was like, “You know, I’m doing this, and I’m starting a business!” It really wasn’t a business, it was just me writing. As I started writing, I started talking about some of the things I’ve learned. I would go through and look, and be able to create articles around something I’ve learned, maybe even create some test projects.

A lot of that, they weren’t very good when I started. It can be an opinion thing if they’re good now, but I definitely know that I’ve improved, and I feel like that, but I think it’s something that really helped me and really focused me, too. Like I said, I was a web developer. You’ve all heard my story before, about when I became a data engineer, and jumped into the Hadoop area. I had that platform, and I had already been practicing doing some of the blogging and stuff like that. It was really easy for me, as I was going through, and learning, and learning things that other people wanted to see, to be able to start writing Pig Latin tutorials. Hive, and what I’m doing with HBase and HDFS, and just general tips of things that I learned. It was like, strengthening that muscle, and it really helped me just accelerate just in being a part of the community as well, too. That’s my journey. That’s one of the main reasons that I’m so big on it, is because I came from that area, where I didn’t have anything that I could point to and say, “Hey, look. These are all the cool things that I’m doing.”

That’s why I started a blog, but why do I think that you should? What should your story be? Your story, you’re still writing it. You should write it on a blog. I really think it’s something that helps you build out your brand, and I think it’s always something good that shows, one, you’re interactive in the community. It keeps you honest and keeps you motivated, too. It’s late at night. I didn’t really want to have to record any videos. I wanted to put it off. I have an audience. I have a schedule, and I try to keep content coming out. This made me come out to the office, and make sure that I got on camera, and was able to create content here, too. The same thing with your blog. If you create a blog, say you create a schedule, and you’re like, hey. I mean, I’ve done this before. I’m going to publish once every month. When I was first starting out, you feel horrible when you don’t. I missed quite a few months. It took me a long time before I published every month. I just really wasn’t consistent. It’ll keep you honest about learning. It’ll keep you honest about creating content and being a part of that community, too. I really think that it’s good at any stage in your career, but especially if you’re watching this channel, and you’re trying to figure out, “Where do I get started? What are some things that I should be doing?” You’ve probably heard me say it a ton of times. Start creating something to be a part of the community. I’m not saying go out and… We’ll have a longer session about how to start blogging and how to find, how to create your own content. I’m not saying go out and borrow people’s content or anything like that and put it as your own. There’s a definite way that you can do a lot of different things. I’m going to end this video this time, but maybe this is, we’ll just call this part one. 
I definitely think we should dig into how to start that blog, some content ideas, but I think today just kick around the idea, just think about it, start churning, start kicking those around in your head, and then we’ll talk, and follow up later on with some content ideas. I’ll show you how to set up on, I think, I used DreamHost, but there’s a ton of other places out there. It’s something simple that you can set up in 10 minutes, and if you’re using [Inaudible 00:07:18] you can start publishing some of your own content, having your own audience, heck, you can put it in the comment section here below, to build, and we can use our audience to help everybody push their content out there. We can all support each other as well, too.

That’s all I have for today. Like I said, I’m going to follow up. I really like this idea, here. If you have some comments, or you think it’s a bad idea to start a blog in 2019, which I don’t think it is, but I’d love to hear your opinion. All opinions are welcome, so, thanks again, and I will see you next time on Big Data Big Questions.

[Music]

Filed Under: Data Engineers Tagged With: Careers, Data Engineers

Speaking Skills For Data Engineers

July 15, 2019 by Thomas Henson

How Important Is Public Speaking For Data Engineers?

A brand new question on Big Data Big Questions is around public speaking in Data Engineering. I’ve often heard that public speaking is the universal number 1 fear for most people. So many people choose to avoid it for various reasons. While nowhere will you see public speaking called out in Data Engineering job descriptions, I believe it’s a skill that’s worth investing in. Find out my thoughts on Speaking Skills for Data Engineers in the video below.

Transcript – Speaking Skills For Data Engineers

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today’s question comes in from a user. If you have a question, find me on Twitter or put it in the comment section here below. Send me an email. There’s a ton of different ways to get in touch and have your question answered on the next episode of Big Data Big Questions.

Today’s comes in from Bobby, and it says, “Can you let me know which career path is better between data scientist or data engineers, which we’ve talked about, but this one is for a person suffering with anxiety or difficulty giving presentations?” So, thank you for your question, Bobby, and I totally understand where you’re coming from as far as having challenges that you’re trying to deal with. Trying to pick out a career path, like, we want to play to things that we’re going to be successful at and things that we’re going to be able to excel in. You’re looking for that career path. I’ll say just right off the bat, a couple things stuck out to me about it. I’m going to get to those as we talk about why I think presentations and stepping outside your comfort level are some options for you. Let’s answer your question first, before we dive into Thomas’s thoughts on some of that.

Depending on which way you want to go, it’s not going to matter. It’s going to be more about if you’re more technical as far as wanting to code, and be hands-on, and building out clusters. Maybe starting to play with Kubernetes, Linux, those types of systems. Then, being on the data engineer side, it’s going to be a good way to go, or if you’re more math-based and want to get into the specifics of, hey, some of these features or some of these pieces of data may be able to give us better insight into what we’re trying to solve, then the data science path is going to be there. Don’t let your anxiety or your difficulty giving presentations say that, “I must go data engineer,” or must go data science, because I think they’re both equal to give you the opportunity to not have to present and not have to have as much interaction as you would maybe in a different role where it’s more customer-facing and job-driven.

My thought process about how much you’re going to have to deal with in that situation is, I’ve worked with people who never had to present. When we were in that role, that just wasn’t their thing. They may be at the meetings. They’ll be at the meetings, but they’re not the point person. Maybe get a one-off question or something like that, but most of that’s in the confines of their team. You’re still going to have team interaction, but there’s still a ton of downtime where it’s like, “Hey, headphones on,” just banging out your own code, or doing your deployments, and stuff like that. There’s not a ton of interaction there. You may have some user interaction that you’re working through, depending on where you are in the stage of your project, but for the most part, I don’t think even outside of the questions here, most of your customer interactions, a lot of times, maybe not so much on the data science side, but it’s going to be nothing like you would think from a web development perspective or front end developer. Still engaging with the users, but more on the team atmosphere. Feel free to choose any of those paths to be able to deal with your anxiety and difficulty giving those presentations. I think you’ll be totally fine, and I think you can get away with never having to give a presentation, if that’s in your vote.

But, I think you should. I think you should try to work towards conquering those difficulties and those presentations, and I’m not saying that you start off going out, and being like, “You know what? I’m going to try to go to a conference and give a keynote. I’m going to try to go to a conference and give a breakout session.” That’s not what I’m saying. I think you should start a little bit smaller, and just on your team, and then if you find a new feature or new software tool, or just a new process that you like doing, present that to your team. I know it’s tough, and I know it’s hard, because they even did a study a while back about the number one fear, and most people said that they fear public speaking more than they fear death.

Let me say that again. They feared death less than they feared public speaking. Most people would rather die than do public speaking. Definitely it’s something that I’ve been working on for quite a few years, and I’ll be honest, I get nervous each time. I get nervous, start talking to people. I’m like, “Oh, I’m about to go on.” It doesn’t matter. It doesn’t matter to the fact that, maybe I’ve given a certain presentation 25 times.

Heck, every time I turn the camera on, and there’s nobody in this room, here, on Big Data Big Questions, I still get nervous, too. There’s going to be some amount of nervousness, and I understand that there’s varying levels, too. I’m not looking over and saying that, “Hey, you know, everybody, you know, everybody can be able to do that.” I do think that you can work towards it, and so maybe everybody’s not going to be able to do it on the same level is maybe what I mean to say. I think it’s something you should try to, because presenting is going to open up doors for your career. It’s going to make you feel good, too. Each time I talked about how nervous I was, I just spoke in front of over 1,000 people for the first time in my life. That was huge, but I didn’t start out that way all in one day. I’ll tell you, I was super nervous, and it was just for a short amount of time, but I was nervous the whole time leading up to it, and then afterwards, after you get it, it’s like, yes! You get that amazing feeling that you’ve done something. I don’t know if you’re into sports or something like that, but you feel like you’ve won. Even though, who knows, it’s the first time speaking to that many people. I’ll probably hopefully have that opportunity again, and I’ll be better at it next time. It probably wasn’t my best time, if you’re looking at it.

It’s something that you start to work towards. It’ll be interesting, how much networking, and how many doors are open by doing that, and it’s all about giving back to the community as well. To recap, I don’t think that you have to choose data science or choose data engineer to be able to not have to present and do some of the other things. However, I think most people, and if you’re watching this channel, and you’re really curious about career development, I do think that everybody should have some kind of presentation skills, and this is something they should practice towards, and I totally understand. There’s a lot of anxiety whenever you’re doing something like that. If it’s something that you can work towards, and you can conquer, then I think it’s going to be something that’s going to be amazing. One, for the community, because we need more voices. And then two, it’s going to be something that you’re going to be proud of, and you’re going to be able to work on, and it’s just another challenge, too.

That’s all I have today for Big Data Big Questions. Make sure that you hit the subscribe and ring that bell, so you never miss another episode of [Whispers] Big Data Big Questions.

Filed Under: Career Tagged With: career, Data Engineers
