Archives for July 2019

Explaining Splunk Architecture Basics

July 22, 2019 by Thomas Henson

Splunk Architecture

In this episode of Big Data Big Questions we explain the basics of the Splunk architecture. Splunk is a hot solution in the world of Big Data, and many Data Engineers are eager to learn how to use Splunk to analyze machine data. One of the first things you want to understand is the 3 basic architecture components in Splunk:

Forwarder – helps move data or log files from devices, edge, IoT, or anything into other Splunk instances.
Indexer – Adds searchable order to data coming into Splunk instances.
Search Head – Allows data to be searched in Splunk by Data Engineers, Splunk Users, and Splunk Architects.
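
The three roles above can be sketched as a toy pipeline. This is only an illustration under loud assumptions: the `Forwarder`, `Indexer`, and `SearchHead` classes and their methods are hypothetical names invented here, not Splunk's API, and a real deployment does far more (parsing, compression, distribution across many indexers).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Indexer:
    """Toy indexer: stores each event with its timestamp and source."""
    events: List[Dict[str, object]] = field(default_factory=list)

    def ingest(self, source: str, timestamp: float, raw: str) -> None:
        self.events.append({"source": source, "time": timestamp, "raw": raw})


@dataclass
class Forwarder:
    """Toy forwarder: a small agent that ships raw lines to an indexer."""
    source: str
    indexer: Indexer

    def forward(self, timestamp: float, line: str) -> None:
        self.indexer.ingest(self.source, timestamp, line.rstrip("\n"))


@dataclass
class SearchHead:
    """Toy search head: runs a keyword query over the indexed events."""
    indexer: Indexer

    def search(self, keyword: str) -> List[str]:
        needle = keyword.lower()
        hits = [e for e in self.indexer.events if needle in str(e["raw"]).lower()]
        # Return matches in time order, oldest first.
        return [str(e["raw"]) for e in sorted(hits, key=lambda e: e["time"])]


indexer = Indexer()
phone = Forwarder("phone-app", indexer)      # hypothetical source names
server = Forwarder("web-server", indexer)
phone.forward(2.0, "ERROR login failed for user bob\n")
server.forward(1.0, "INFO request served in 12ms\n")
server.forward(3.0, "error disk usage at 91%\n")

results = SearchHead(indexer).search("error")
print(results)
```

Keeping the forwarder this thin mirrors the idea that it only ships data, while the indexer owns storage and the search head owns queries.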

Learn more about Splunk Architecture by watching the video below.

Transcript – Explaining Splunk Architecture Basics

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today, we’ve got a good topic coming in. Something we’ve talked about a little bit before. We’re going to talk a little bit about Splunk. Today’s question, just remember, if you want your question answered here on Big Data Big Questions, put it in the comment section below. Find me on YouTube. Wait, we’re already on YouTube. Find me on Twitter. Find me on Instagram. Just put it in the comments section here below. Reach out, and I will do my best to answer those questions. Today’s question comes in, and we’re talking around Splunk.

What are the basics of Splunk architecture? Really, just wanted to key off of that, and talk a little bit. We’re going to break it down by three different pieces, but the first thing we need to know is what Splunk is. Splunk, if you’ve been watching this, is one of those tools that’s out there, that allows for you to take machine-generated data and be able to analyze it. My joke is, if you can create tables and pivot tables in Excel, then you can easily start ingesting, and start looking at and visualizing, your data in Splunk. Think about it, it started off in IT operations. Being able to take in, whether it be log files, whether it be system files, whether it be people trying to break into your network. Anything that’s going on from a network traffic perspective, or logins.

All those different log files from all these different machines, being able to put them in one place, be able to index them, and be able to view them. Splunk has been an amazing tool for that. Like I said, Easy Button. They coined the phrase Easy Button For Machine Data. Pretty cool. Anything machine generated, they’ve been into, but they’re also into IT security. Really, if you think about big data, you’re talking Splunk. IoT is one of their other big key features and focal points, too.

Let’s talk about those three basic architecture features. We’re going to break it down. The first thing you need to know, if you’re looking to be able to talk Splunk and know what the Splunk architecture is made up of, the first thing is forwarders. What forwarders are is, think of this as a way to, you’ve got a machine running on the edge. You’ve got a machine running in your data center. You’ve got one running in the cloud. Anywhere you have a machine or have any kind of device that you want to get data back from, there’s something called a Splunk forwarder. The forwarder is that first key. What that is, is a very, very small file that’s running, or a very small application that’s running, on that device, that machine, whatever it is, and it’s just forwarding whatever the information is. You’re looking to forward log files. You’re forwarding log files. Say that you have a phone. You’re forwarding log files from a game or from an application on your phone. You’re going to use a forwarder to send that data off. First thing is, learn what a forwarder is. We’re going to be able to run a small application and send data to our Splunk environment.

Number two, the next piece, building block for Splunk architecture, is going to be our indexer. What the indexer is, it’s going to take that data. We’re forwarding those files, and the forwarder is sending that data to the indexer. What the indexer’s going to do is, it’s going to put a timestamp on it, put some other information, but basically the indexer’s going to say, hey, this is how we’re going to look for this file. We’re probably talking about millions and millions of files. Think about it as being able to index it, if you’re familiar with databases. You definitely understand. If you’re a data engineer in the big data world, on Hadoop, you understand how indexes work and how you can use indexes to be able to search your data a lot quicker. The second portion, just to recap, is our indexer.
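
To make the "index it so you can search it quicker" idea concrete, here is a minimal sketch of an inverted index over timestamped log lines. It is an assumption-laden illustration, not how Splunk stores data internally: the sample events and the helper names `build_index` and `lookup` are made up for this example.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

# (timestamp, raw text) pairs standing in for forwarded log lines.
events: List[Tuple[float, str]] = [
    (1.0, "INFO request served in 12ms"),
    (2.0, "ERROR login failed for user bob"),
    (3.0, "ERROR disk usage at 91%"),
]


def build_index(evts: List[Tuple[float, str]]) -> DefaultDict[str, Set[int]]:
    """Map each lowercase token to the positions of the events containing it."""
    index: DefaultDict[str, Set[int]] = defaultdict(set)
    for pos, (_, raw) in enumerate(evts):
        for token in raw.lower().split():
            index[token].add(pos)
    return index


def lookup(index: DefaultDict[str, Set[int]],
           evts: List[Tuple[float, str]], term: str) -> List[str]:
    """Return matching raw lines in time order without scanning every event."""
    positions = sorted(index.get(term.lower(), set()), key=lambda p: evts[p][0])
    return [evts[p][1] for p in positions]


index = build_index(events)
error_lines = lookup(index, events, "error")
print(error_lines)
```

The work is done once at ingest time, so a later query touches only the events listed under the search term instead of reading every line.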

Now that we’ve got our data indexed, it’s time to move on to the next phase. In the next phase, we’re talking about number three. That’s going to be our search head. Our search head is how we can visualize and how we can start looking at, and querying, our data. Think about it. We’ve got our data that’s been forwarded from our phone. We’ve got our application file that’s coming off of a mobile device, being pushed into our indexer. Our indexer says, “Hey, you know, here’s a timestamp for it. Here’s some other information that we’re pulling into it.” Now, me, the user, comes in and says, “Hey, I want to index that data,” or, “I want to search that data,” and so, I’m interacting with a search head that’s going to go out, and going to find that data, and going to be able to help with our queries. But, also help whenever we’re using our queries to build out dashboards, or some amazing tables that are going to help us visualize our data. Those are the three basic building blocks when we’re talking about Splunk architecture. You have your forwarder, you have your indexer, and you have your search head, and there’s a lot of different ways that you can configure those, and there’s a lot of different ways that you can architect those. Those are the basic building blocks that you’re going to use if you’re talking about the Splunk architecture.

If you’d like to learn more, I’ve got a couple Pluralsight courses out there. One called Analyzing Machine Data With Splunk, and then also another one that’s building on the Splunk learning path for Pluralsight. That’s [Inaudible 00:05:07] configuring Splunk, with other courses coming and showing you how to visualize that data, how to search that data, how to set up alerts. A lot of different information, so if you’re curious about that, there are some resources for it, but there’s a ton out there as well. Splunk has great documentation.
There’s other courses and other things out on YouTube that you can find, that will help you learn more about Splunk. If you’re interested in Splunk, and interested in being able to use a tool like Splunk to visualize whether it be machine-generated data or IoT, especially if you’re trying to get into the more security path, then Splunk is a great tool for that. A lot of information out there. Hope you found this video very informative. If you have any questions or have any ideas for the show, put them in the comments section here below, but also make sure that you’re subscribed and you ring that bell, so that you never miss an episode of Big Data Big Questions.

Should get a sponsorship about water. Does anybody know who the agent is for water? Eh. Maybe get some kind of sponsorship. Hey man, you know? There’s those milk ads, right? Who knows?

What Is A Generative Adversarial Network?

July 18, 2019 by Thomas Henson

Generative Adversarial Networks

What are deep fakes? How are they generated? On today’s episode of Big Data Big Questions we tackle how Generative Adversarial Networks work. Generative Adversarial Networks, or GANs, work with 2 neural networks: one a generator and the other a discriminator. Learn about my experience with GANs and how you can build one as well.

Transcript – What Is A Generative Adversarial Network?

This is going to be a cool episode, Mr. Editor. We’re going to talk about a painting that was built by AI or designed by AI that went for over $400,000. Crazy.

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today, we’re going to talk about Generative Adversarial Neural Networks. We’re going to talk about a painting, so you’ve all probably heard about a painting that was sold for, like, $400,000. It was built, actually, by a Generative Adversarial Network. We’re going to talk about that, explain what that is, and maybe even look at a little bit of code, and tell you how you can learn more about it.

Before we jump in, I definitely want to say, if you have any questions about data engineering, data science, IT, anything, put them in the comment section here below. Reach out to me at thomashenson.com/big-questions. I’ll try my best to answer them and help you out. It’s all about the community and all about having fun. Today, we’re going to have a lot of fun. I’m excited. This is something that I’ve been researching and looking into since, maybe, at least since the first part of 2019, but for sure it’s been a theme for me for a while.

I want to talk about Generative Adversarial Networks, what that is. We think about that from a deep learning perspective. We’ve done some videos. We talk about deep learning, but this is a specific kind, so kind of like [Inaudible 00:01:33] neural networks, this is a little bit different. It still uses the premise of, you have your input layer, you have your hidden layers, and you have your output layer, but there’s a little more complexity to it. It’s been around since 2014. Ian Goodfellow is credited as the creator of that. If you follow Andrew Ng on Twitter, I just saw where he took a role at Facebook. I think it was a competitive thing, and I think Andrew was saying, “Hey, great pickup for Facebook for picking him up,” but you might want to fact check that.

Like I said, that was breaking news here. Generative Adversarial Network. The way that I like to think about that and describe that is, think of it as having two different neural networks that are working. You have your discriminator and you have your generator. What’s going on is your generator is taking data. Think of, we’ve got, let’s say, a whole bunch of images of people. What’s going on is, our generator is going to take that data set and look at it, and it’s going to try to create fake data that looks like real data. Your discriminator is the one that’s sitting there saying, “Hey, wait a minute. That’s real data. This is fake data.” This is real data, that’s fake data. Just continuing on. You keep going through that iteration, until the generator gets so good, he’s able to pass fake data onto the discriminator. For our example, we’re looking at images of people. What you’re trying to do is, you’re trying to generate data of fake people and pass it through as real people. You’re probably like, “Man. How really good is that?”
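
The generator-versus-discriminator loop described above can be sketched with a deliberately tiny, pure-Python toy. Everything here is an assumption made for illustration: the "real data" is just numbers drawn around 4.0, the generator is the line `a * z + b`, the discriminator is a one-feature logistic model, and the gradients are written by hand. It is not the TF Learn example mentioned in the video, and toy GANs like this often oscillate rather than settle, so treat it as a picture of the alternation, not a recipe.

```python
import math
import random
from typing import Tuple


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def discriminator_step(w: float, c: float, real: float, fake: float,
                       lr: float) -> Tuple[float, float]:
    """One gradient-ascent step on log D(real) + log(1 - D(fake))."""
    d_real = sigmoid(w * real + c)
    d_fake = sigmoid(w * fake + c)
    w_new = w + lr * ((1.0 - d_real) * real - d_fake * fake)
    c_new = c + lr * ((1.0 - d_real) - d_fake)
    return w_new, c_new


def generator_step(a: float, b: float, w: float, c: float, z: float,
                   lr: float) -> Tuple[float, float]:
    """One gradient-ascent step on log D(fake), where fake = a * z + b."""
    fake = a * z + b
    grad_fake = (1.0 - sigmoid(w * fake + c)) * w  # d log D(fake) / d fake
    return a + lr * grad_fake * z, b + lr * grad_fake


def train_toy_gan(steps: int = 500, lr: float = 0.01,
                  seed: int = 0) -> Tuple[float, float, float, float]:
    """Alternate discriminator and generator updates on 1-D data."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator parameters
    w, c = 0.1, 0.0   # discriminator parameters
    for _ in range(steps):
        real = rng.gauss(4.0, 0.5)   # assumed "real" data distribution
        z = rng.gauss(0.0, 1.0)      # noise fed to the generator
        w, c = discriminator_step(w, c, real, a * z + b, lr)
        a, b = generator_step(a, b, w, c, z, lr)
    return a, b, w, c


print(train_toy_gan())
```

Each discriminator step pushes its score up on real samples and down on fakes; each generator step nudges the fake samples toward whatever the discriminator currently rates as real.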

Check out this website here. These are fake people. These are not real people. These are really good images, and a little bit creepy. I found this, actually, in the last week, and kind of looked at it. Been sharing it internally with some friends and some colleagues, but man. It’s really interesting when you think about it. These people do not exist. There’s no, these people don’t exist on the planet. These were all built by AI or deep learning. It’s pretty cool. Pretty creepy, too.

You’re probably wondering, “That’s pretty cool.” Been around since 2014. I’m researching it. Should I be researching it? I definitely think it’s something that’s going to be out there. There’s a lot of information around it, and a lot of use cases, kind of don’t know where it’s going to go. I can think of it being used for game development. Being able to create worlds. For somebody that’s creating a game that’s going to have multiple, multiple different levels, or even in GIS, you have to create all these landscapes and everything like that. If you can build AI to automate that, if you use a deep learning algorithm that’s going to automate, and build out those worlds, and make them lifelike, how much busy work is that going to save you? Same thing with GIS and in architecture, but also go back to the website we were just looking at, with the fake people. Oh, my gosh! You can use that in media and entertainment. Think about movies. Maybe we don’t even need actors anymore. That’s a little bit scary. For the actors, I don’t know. You still need Thomas Henson and thomashenson.com on YouTube, right?

Really cool. Something I just wanted to share with everybody, and back to what we were talking about in the first part of the show. The first art around AI that really sold as a big-ticket item, over $400,000, and it was a generated image, too. I talk a little bit about it in my implementing TF Learn course, but here’s a code sample, really just showing what’s going on. If you’re looking at it, all this is done in TensorFlow, here, using the abstraction layer of TF Learn. Look here, how we’re creating that generator, and how you’re creating a discriminator. It’s a good bit of code here, but really, this is an example from the TF Learn examples, where you’re actually starting to generate data in here. It’s pretty cool. Pretty awesome to be able to play with if you have TensorFlow installed in your environment. You can actually do an import tflearn and start running this code from the examples here, and start tweaking it. Really cool.

If you want to learn more, I’d definitely love for you to check it out and tell me all about it. Go through my TF Learn course. Tell me all about it if you like it. You don’t have to, but I just thought sharing Generative Adversarial Networks, I thought that was pretty cool. I think it’s something that everybody should learn. At least know a little bit about it. Now, you know. Hey, important thing. I’ve got my generator. I’ve got my discriminator. My generator is making the data that’s trying to pass as real data to my discriminator.

Boom! You understand a lot. Thanks for tuning in. If you have any questions, put them in the comment section here below, and make sure you subscribe just so you never miss an episode, and get some great education around Big Data Big Questions.

Nobody can! Nobody can generate a fake image of me!

Challenge accepted?

Review Coursera’s Neural Networking & Deep Learning Course

July 17, 2019 by Thomas Henson

Another Machine Learning Course?

Yet another machine learning course has caught my attention here lately. Andrew Ng has a new course available on Coursera focused on Neural Networks and Deep Learning. How did I like the course and should you take the course? Find out my thoughts on Coursera’s Neural Network and Deep Learning course.

Transcript – Review Coursera’s Neural Networking & Deep Learning Course

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today’s question comes in around a new course that I am taking, myself. It’s not a course that I’m writing. I’ve talked about some of my Pluralsight courses. This is actually a deep learning course that I’m taking with Coursera. It’s the second course that I’ve taken with Coursera. I did another one from Andrew Ng called, I think, learning machine learning, and just went through that portion, and swore I’d never take another one, and here I am again. Find out my review on that course and how I’m doing on it here in just a second.

[Sound effects]

Today’s question is around what I’m doing from a course perspective. I’m taking a course called neural networks and deep learning. This is actually part one in a large certification series. If you go out to deeplearning.ai, it’s an Andrew Ng specific course. I did his machine learning course before, and went through it, and did some reviews with it on another channel with a group, the Big Data Beer Team. You can always check that out and find that.

I swore I’d never do another course, and here I am doing another one, because the math portion for me is a little more into the weeds than I like to be, and I really think, from a data engineering perspective, it probably is. Either way, my thing is to do this review and give you all the insights. You can decide if you want to take that course and find out where you are. I’m through part one. The neural networks and deep learning is part one in that course. It’s an Andrew Ng course, so he’s, like, probably trained more people around machine learning and deep learning than anybody else on the planet. Worked at Baidu, at Google, Stanford, and has his own company, own startup where he’s walking through driverless cars. Huge authoritative figure who’s teaching this course. It’s amazing from that aspect of it.

Little bit overwhelming, I’ll tell you. We’ll get into it a little bit, but each part of these courses is broken into, I think, four weeks. This first one was four weeks. We’re going to go through how I felt through each of the four weeks, and give you my thoughts on that.

In the first week, week one was intro to deep learning, and really it was about the why for deep learning. Why is deep learning? What’s the history of it? Is this anything new? Is this going to solve all our problems in the future? Eh, maybe.

Maybe we don’t get into that as much, but this was a pretty good one, and with each one of these courses, there is a heroes in AI interview session. If you like watching YouTube videos like you do now, this is similar to that, but it’s behind the paywall, or behind the course wall there in Coursera. I actually went through that. I did not get through all of them, but I did go through this one. It was pretty good. Can’t really remember who it was. Maybe shame on me for that. Should’ve put that in my notes.

Week one was pretty easy to step through and everything like that. There might’ve been a quiz or something, but no programming aspects from that perspective. Week number two, logistic regression in neural networks. Probably my least favorite portion of the course so far. A lot of math-based and somewhat of a review. Actually, when I got to this portion, I was like, “Man, this is…” I was going through the course material and watching the videos. I was like, “This is kind of a review from what I did in the machine learning course.”

I’m going to ace everything here, and I did ace the quiz. It wasn’t too hard, but when we stepped into the programming, it was a little more complicated than I thought, and I have some reasons why I think that is, and I’m going to talk about those here at the end. For the most part, week two was really just a level set. Hey, remember, this is the cost function. This is how we use linear regression, and just walking through some of those portions, to be able to say hey, this is what’s going on behind the scenes.
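
For readers who want to see what a week-two style exercise boils down to, here is a minimal sketch of logistic regression with its cross-entropy cost and a plain gradient-descent update. It is not taken from the course materials: the four-point data set, the learning rate, and the function names are assumptions chosen to keep the example small.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Probability that x belongs to class 1."""
    return sigmoid(sum(w_j * x_j for w_j, x_j in zip(weights, x)) + bias)


def cost(weights: Sequence[float], bias: float,
         xs: Sequence[Sequence[float]], ys: Sequence[int]) -> float:
    """Average cross-entropy (log loss) over the data set."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(predict(weights, bias, x), eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(xs)


def gradient_step(weights: Sequence[float], bias: float,
                  xs: Sequence[Sequence[float]], ys: Sequence[int],
                  lr: float) -> Tuple[List[float], float]:
    """One batch gradient-descent update of the weights and bias."""
    m = len(xs)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, y in zip(xs, ys):
        err = predict(weights, bias, x) - y
        for j, x_j in enumerate(x):
            grad_w[j] += err * x_j / m
        grad_b += err / m
    new_weights = [w_j - lr * g_j for w_j, g_j in zip(weights, grad_w)]
    return new_weights, bias - lr * grad_b


# Made-up, linearly separable toy data: small x -> class 0, large x -> class 1.
xs = [[0.0], [1.0], [2.0], [3.0]]
ys = [0, 0, 1, 1]
weights, bias = [0.0], 0.0
initial_cost = cost(weights, bias, xs, ys)  # equals log(2) with all-zero parameters
for _ in range(500):
    weights, bias = gradient_step(weights, bias, xs, ys, lr=0.5)
final_cost = cost(weights, bias, xs, ys)
print(initial_cost, final_cost)
```

Writing the loops out by hand like this is the long way round; a library would hide most of it, which is the trade-off discussed later in this review.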

If you’ve gone through like I have, and implemented networks, and played around with Tensorflow or TF Learn, you already know some of the things that are going on, but maybe you don’t understand it fully. This was a good review to start off from that perspective. If you haven’t taken the machine learning course, no problem. You can jump right into it. Like I said, he takes it from a high level here and gets you going.

Week three. My favorite week. We talked about shallow neural networks. This is the basics of how to build a neural network. What I liked the most about this was, we deep dived into why we use non-linear functions and why we use different activation functions. It was really cool, because I actually taught a portion of this in my course, and it was just cool to see how Andrew was able to explain it. Maybe not a whole lot better than me. I don’t want to undersell myself, but it was definitely awesome to see his background, and his thought process, and just him saying, “Hey, this is why we use [Inaudible 00:05:19], these are some of the things that you’re going to see with it.” Don’t worry about it, because of these reasons. Really, my favorite portion of this course was week three, so around the shallow neural networks. Still went through and took a little bit longer to do the programming exercise than I thought it would take me.
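
One way to see why non-linear activation functions matter: stacking layers that are purely linear collapses into a single linear layer, so the extra depth buys nothing. The sketch below uses one-input, one-output "layers" with made-up weights (all names and numbers are assumptions for illustration) and compares that with the same stack using ReLU in between.

```python
def linear(w: float, b: float, x: float) -> float:
    """A one-input, one-output linear layer."""
    return w * x + b


def relu(x: float) -> float:
    """Rectified linear unit, a common non-linear activation."""
    return max(0.0, x)


# Hypothetical weights and biases for two stacked layers.
W1, B1 = 2.0, 1.0
W2, B2 = -3.0, 0.5


def two_linear_layers(x: float) -> float:
    return linear(W2, B2, linear(W1, B1, x))


def collapsed_single_layer(x: float) -> float:
    # Algebraically identical: W2*(W1*x + B1) + B2 = (W2*W1)*x + (W2*B1 + B2)
    return linear(W2 * W1, W2 * B1 + B2, x)


def two_layers_with_relu(x: float) -> float:
    return linear(W2, B2, relu(linear(W1, B1, x)))


for x in (-1.0, 0.0, 1.0):
    print(x, two_linear_layers(x), collapsed_single_layer(x), two_layers_with_relu(x))
```

With ReLU in the middle the output bends (its slope changes between the sample points), which no single linear layer can reproduce.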

Little bit of stress there, but quizzes were good. It was easy if you follow along, and just take good notes, and you’ll be able to pass the quizzes. There’s a new thing that they’re trying out, too, called notes. I’ve started playing around with that. I’ll probably, in my next video, talk a little bit about that as I’m using it more and more, and maybe that’ll be a quick tip that you guys can use whenever you’re going through a course on Coursera.

Week four, not my favorite week. It was pretty good. We started getting into deep learning and deep neural networks and how those are working. One of the things that we really did was talk about the matrix dimensions and how some of that works. Didn’t get into it as much as they will in future courses. It’s easy for me to look at it now and say that, because I jumped ahead a little bit. From the perspective of this course, neural networks and deep learning part one, it really talks through some of the matrix portions and then starts building out your deep networks. Also, talks about parameters and hyperparameters. I was familiar with hyperparameters and parameters before, just with having been hands-on before, but it was really helpful to do those.
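
The matrix-dimension bookkeeping and the parameter versus hyperparameter split can be sketched in a few lines. This assumes one common convention, where layer l has a weight matrix of shape (units in layer l, units in layer l-1) and a bias of shape (units in layer l, 1); the layer sizes and the other settings below are made-up values.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, int]


def parameter_shapes(layer_sizes: List[int]) -> List[Tuple[Shape, Shape]]:
    """Return (weight shape, bias shape) for each layer after the input layer."""
    shapes = []
    for l in range(1, len(layer_sizes)):
        weight_shape = (layer_sizes[l], layer_sizes[l - 1])
        bias_shape = (layer_sizes[l], 1)
        shapes.append((weight_shape, bias_shape))
    return shapes


def count_parameters(layer_sizes: List[int]) -> int:
    """Total number of learned values (weights plus biases)."""
    return sum(rows * cols + bias_rows
               for (rows, cols), (bias_rows, _) in parameter_shapes(layer_sizes))


# Hyperparameters: chosen by you before training (values here are assumptions).
hyperparameters: Dict[str, object] = {
    "layer_sizes": [4, 3, 2, 1],   # 4 inputs, two hidden layers, 1 output
    "learning_rate": 0.01,
    "iterations": 1000,
}

sizes = [4, 3, 2, 1]
shapes = parameter_shapes(sizes)
total = count_parameters(sizes)
print(shapes, total)
```

The weights and biases whose shapes are listed here are the parameters the network learns; the layer sizes, learning rate, and iteration count are hyperparameters you pick before training starts.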

The quiz in this one, once again, if you paid attention, you went through it. You have to work through some math and do some other portions of it, but the quizzes are pretty simple. Make sure you’re using your own notes and everything for that. When it came to the programming exercises, I think there were two in this week, and they were somewhat difficult. I think the second one was pretty long as far as building out. You get to get hands-on with Tensorflow. Still a little bit more challenging, I guess, I think, and there’s some ways that we can make it a little bit better. Let me talk about that here just next.

Overall, I thought the course was all right. It was good for me, just some of it was a little bit of a review. Some of it went a lot deeper than I’ve dove in before, so I thought that portion was good for me. I will say, on all the programming exercises, they’re all graded. One of the things that I find challenging, and maybe it’s just the way that I learn, but I feel like they’re a little harder just because you go through, and it’s like you’re being tested day one. Whenever you’re going through the videos and everything, you’re doing everything from a math perspective on paper, or if you’re taking digital notes, but you’re not really doing any of the programming functions. If you don’t have a solid basis in programming, or it’s not something that you do every day from that perspective, I think it’s going to be a little more challenging. One of the things that could help out, I think, and broaden it for the students that are coming in would be to have more coding examples that aren’t graded. It doesn’t have to be verbatim. Hey, this is really, really close to what the examples are. I get that you want to test, and you want to make it so that you’re applying what you’re learning.

Also, I think a few more coding examples where you can go through and see, “These are some of the steps.” If you understand the math portion of it, doesn’t necessarily mean that you’re going to be able to go in and be able to program it right there, and when we talk about it from a real-world perspective, whenever I look at it, yeah, you need to understand those things, and know how to implement those at a base level, but there’s so many. There’s so many other things it can do from a high level. For example, one of the biggest challenges I had going through this was, I built a whole course around TF Learn, and being able to use that abstraction layer over Tensorflow. For me, having to go through step by step, and showing how you can do this, where you can write it in TF Learn or use one of those functions, I think that would’ve been… That would be a different approach to take it, and I think that would broaden the audience, and make it a little more enjoyable, too.

If you’re having to go through, and you know that writing these 60 lines of code is something that you can write in 4, it makes it a little bit harder, especially since I already just did all the math portion, and kind of went through how all those activations and everything work, versus having to go through some of the minutiae on the programming. That’s just my two cents. If you’ve taken this course, please tell me. Tell me your opinion. You’re listening to mine. Let’s make this a conversation. I’d love to hear what some of your thoughts are, where you think I’m wrong, if you think I should be better at math. You’re probably right. I think I’m getting the math. We’ll see.

Fair enough, my programming skills in Python, like I said, they’re all right. They’re not to the level here. I think that’s another gap that I found going through this course. All in all, I guess I would recommend it if you’re looking into using deep learning, but I don’t think that, if you’re a data engineer, that you have to go through anything like this. Like I said, it’s a good aspect of it, but there’s some other things and other skills that you probably want to get. If you’re more looking to the data science, or deep learning, or machine learning engineer, then going through something, one of these, this course would probably be pretty good. In the next video, check out, I jumped way too ahead in the next course. You might see. I jumped to, I think, the fifth portion or fourth portion when I was supposed to go to the second portion. I’ll talk about that in the next video. If you have any questions, make sure you put them in the comments section here below, or reach out to me on thomashenson.com/big-questions. Find me on Twitter or Instagram. Ask any questions. I’ll try my best to answer them. Make sure you subscribe so that you never miss an episode, and ring that bell. Thanks again.

Why Data Engineers Should Blog

July 16, 2019 by Thomas Henson

Blogging For Data Engineers?

How important is it for Data Engineers to have a blog? In this episode of Big Data Big Questions I talk about the importance of building a blog in your career in Data Engineering, Data Analysis, or Data Science. Learn my thoughts on Why Every Data Engineer Should Have A Blog in the video below.

Transcript – Why Data Engineers Should Blog

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of…

Big Data Big Questions. Today’s question, I thought I would take a topic that I’ve seen and keeps coming up in some of my videos, and really dig down into it. Maybe this is going to be a multi-part series, but we’re going to talk about starting a blog to build your brand as a data engineer, data scientist, or if you’re watching this and you’re just a technologist or somebody that just wants to do book reviews, trust me, there’s going to be some topics in here that are generalized for everybody, but it really shows you how to key in on your field.

Before we jump into that, though, I want to say, if you have any questions, put them in the comment section here below. This is where I find content to make sure I’m interacting with the community and answering the questions that you want. It also gives me an idea. Hey, there’s enough people that ask a question or interested in a certain topic, and I haven’t done any research on it, gives me an opportunity to study and see what’s going on. This is all about being a community here. Reach out to me on thomashenson.com/big-questions if you don’t want to put it in the comment section here below. I’ll do my best to answer those quick as I can.

Today, I want to talk about why you should start a blog as a data engineer, or data scientist, or if you’re a web developer, and you’re watching this, or anything. I think it’s very important. In 2019, should you start a blog? I think so. I don’t think it’s something that is going away. Just because I say start a blog, you don’t have to start a blog and just write. You can start a vlog. I think you definitely should have your own domain. I bought thomashenson.com. It cost me, I think, $12 a month. No, $12 a year, but it’s like, hosting and everything like that can be really, really cheap. I wouldn’t worry about that. It’s really important. I’m going to talk a little bit first about my journey and why I started a blog.

When I got my first job, like I said, I’ve talked about it before, I was a web developer. One of the things where I was working at, we weren’t really embracing. We were using open source, but we weren’t really contributing, and it was frowned upon for us to actually have any code to be able to show or anything like that. One of the things, I didn’t really think about it at the time, but you get a couple years into your role, and you might get opportunities to interview at other places, to do other things, and one of the things that came up that was a real hole when I was going through the interview process was, I didn’t have any example code or anything like that I could show. I wasn’t involved in the open source community outside of work, and I didn’t have my code. It was my company’s property, and there were some other pretty big reasons I couldn’t, I didn’t have anything I could point to and show.

That got me thinking. I don’t have anything that really captures the work and some of the things that I do. Then, at this time, too, I’d already embraced trying to do at least 30 minutes a day, or maybe even four times a week getting 30 minutes in of learning new things. I had all these ideas and all these things that I was going through and learning in the process, but I could only talk about them, on a whiteboard or from a resume perspective, but I didn’t really, couldn’t really show. Couldn’t let it stand on its own. That’s where I started really looking into blogging. I was like, “Man, maybe I should start a blog.” Start a blog, didn’t really know what I was going to do with it. If you go back and look at some of my early posts, it was like, “You know, I’m doing this, and I’m starting a business!” It really wasn’t a business, it was just me writing. As I started writing, I started talking about some of the things I’ve learned.
I would go through and look, and be able to create articles around something I’ve learned, maybe even create some test projects.

A lot of that, they weren’t very good when I started. It can be an opinion thing if they’re good now, but I definitely know that I’ve improved, and I feel like that, but I think it’s something that really helped me and really focused me, too. Like I said, I was a web developer. You’ve all heard my story before, about when I became a data engineer, and jumped into the Hadoop area. I had that platform, and I had already been practicing doing some of the blogging and stuff like that. It was really easy for me, as I was going through, and learning, and learning things that other people wanted to see, to be able to start writing Pig Latin tutorials. Hive, and what I’m doing with HBase and HDFS, and just general tips of things that I learned. It was like strengthening that muscle, and it really helped me accelerate just in being a part of the community as well, too. That’s my journey. That’s one of the main reasons that I’m so big on it, is because I came from that area, where I didn’t have anything that I could point to and say, “Hey, look.” These are all the cool things that I’m doing.

That’s why I started a blog, but why do I think that you should? What should your story be? Your story, you’re still writing it. You should write it on a blog. I really think it’s something that’ll help you build out your brand, and I think it’s always something good that shows, one, you’re interactive in the community. It keeps you honest and keeps you motivated, too. It’s late at night. I didn’t really want to have to record any videos. I wanted to put it off. I have an audience. I have a schedule, and I try to keep content coming out. This made me come out to the office, and make sure that I got on camera, and was able to create content here, too. The same thing with your blog. If you create a blog, say you create a schedule, and you’re like, hey. I mean, I’ve done this before. I’m going to publish once every month. When I was first starting out, you feel horrible when you don’t. I missed quite a few months. It took me a long time before I published every month. I just really wasn’t consistent. It’ll keep you honest about learning. It’ll keep you honest about creating content and being a part of that community, too. I really think that it’s good at any stage in your career, but especially if you’re watching this channel, and you’re trying to figure out, “Where do I get started? What are some things that I should be doing?” You’ve probably heard me say it a ton of times. Start creating something to be a part of the community. I’m not saying go out and… We’ll have a longer session about how to start blogging and how to find, how to create your own content. I’m not saying go out and borrow people’s content or anything like that and put it as your own. There’s a definite way that you can do a lot of different things. I’m going to end this video this time, but maybe this is, we’ll just call this part one.
I definitely think we should dig into how to start that blog, some content ideas, but I think today just kick around the idea, just think about it, start churning, start kicking those around in your head, and then we’ll talk, and follow up later on with some content ideas. I’ll show you how to set up on, I think, I used DreamHost, but there’s a ton of other places out there. It’s something simple that you can set up in 10 minutes, and if you’re using [Inaudible 00:07:18] you can start publishing some of your own content, having your own audience, heck, you can put it in the comment section here below, to build, and we can use our audience to help everybody push their content out there. We can all support each other as well, too.

That’s all I have for today. Like I said, I’m going to follow up. I really like this idea, here. If you have some comments, or you think it’s a bad idea to start a blog in 2019, which I don’t think it is, but I’d love to hear your opinion. All opinions are welcome, so, thanks again, and I will see you next time on Big Data Big Questions.

[Music]

Speaking Skills For Data Engineers

July 15, 2019 by Thomas Henson

How Important Is Public Speaking For Data Engineers?

Brand new question on Big Data Big Questions is around public speaking in Data Engineering. I’ve often heard that public speaking is the universal number 1 fear for most people. So many people choose to avoid it for various reasons. While nowhere will you see public speaking called out in Data Engineering descriptions, I believe it’s a skill that’s worth investing in. Find out my thoughts on Speaking Skills for Data Engineers in the video below.

Transcript – Speaking Skills For Data Engineers

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today’s question comes in from a user. If you have a question, find me on Twitter or put it in the comment section here below. Send me an email. There’s a ton of different ways to get in touch and have your question answered on the next episode of Big Data Big Questions.

Today’s comes in from Bobby, and it says, “Can you let me know which career path is better between data scientist or data engineers, which we’ve talked about, but this one is for a person suffering with anxiety or difficulty giving presentations?” So, thank you for your question, Bobby, and I totally understand where you’re coming from as far as having challenges that you’re trying to deal with. Trying to pick out a career path, like, we want to play to things that we’re going to be successful at and things that we’re going to be able to excel in. You’re looking for that career path. I’ll say just right off the bat, a couple things stuck out to me about it. I’m going to get to those as we talk about why I think presentations and stepping outside your comfort level are some options for you. Let’s answer your question first, before we dive into Thomas’s thoughts on some of that.

Depending on which way you want to go, it’s not going to matter. It’s going to be more about if you’re more technical as far as wanting to code, and be hands-on, and build out clusters. Maybe starting to play with Kubernetes, Linux, those types of systems. Then, being on the data engineer side, it’s going to be a good way to go, or if you’re more math-based and want to get into the specifics of, hey, some of these features or some of these pieces of data may be able to give us better insight into what we’re trying to solve, then the data science path is going to be there. Don’t let your anxiety or your difficulty giving presentations say that, “I must go data engineer,” or, “I must go data science,” because I think they’re both equal to give you the opportunity to not have to present and not have to have as much interaction as you would maybe in a different role where it’s more customer-facing and job-driven.

My thought process about how much you’re going to have to deal with in that situation is, I’ve worked with people who never had to present. When we were in that role, that just wasn’t their thing. They may be at the meetings. They’ll be at the meetings, but they’re not the point person. Maybe get a one-off question or something like that, but most of that’s in the confines of their team. You’re still going to have team interaction, but there’s still a ton of downtime where it’s like, “Hey, headphones on,” just banging out your own code, or doing your deployments, and stuff like that. There’s not a ton of interaction there. You may have some user interaction that you’re working through, depending on where you are in the stage of your project, but for the most part, I don’t think even outside of the questions here, most of your customer interactions, a lot of times, maybe not so much on the data science side, but it’s going to be nothing like you would think from a web development perspective or front end developer. Still engaging with the users, but more on the team atmosphere. Feel free to choose any of those paths to be able to deal with your anxiety and difficulty giving those presentations. I think you’ll be totally fine, and I think you can get away with never having to give a presentation, if that’s in your vote.

But, I think you should. I think you should try to work towards conquering those difficulties and those presentations, and I’m not saying that you start off going out, and being like, “You know what? I’m going to try to go to a conference and give a keynote. I’m going to try to go to a conference and give a breakout session.” That’s not what I’m saying. I think you should start a little bit smaller, and just on your team, and then if you find a new feature or new software tool, or just a new process that you like doing, present that to your team. I know it’s tough, and I know it’s hard, because they even did a study a while back about the number one fear, and most people said that they fear public speaking more than they fear death.

Let me say that again. They feared death less than they feared public speaking. Most people would rather die than do public speaking. Definitely it’s something that I’ve been working on for quite a few years, and I’ll be honest, I get nervous each time. I get nervous, start talking to people. I’m like, “Oh, I’m about to go on.” It doesn’t matter. It doesn’t matter to the fact that, maybe I’ve given a certain presentation 25 times.

Heck, every time I turn the camera on, and there’s nobody in this room, here, on Big Data Big Questions, I still get nervous, too. There’s going to be some amount of nervousness, and I understand that, there’s varying levels, too. I’m not looking over and saying that, “Hey, you know, everybody, you know, everybody can be able to do that.” I do think that you can work towards it, and so maybe everybody’s not going to be able to do it on the same level is maybe what I mean to say. I think it’s something you should try to, because presenting is going to open up doors for your career. It’s going to make you feel good, too. Each time I talked about how nervous I was, I just spoke in front of over 1,000 people for the first time in my life. That was huge, but I didn’t start out that way all in one day. I’ll tell you, I was super nervous, and it was just for a short amount of time, but I was nervous the whole time leading up to it, and then afterwards, after you get it, it’s like, yes! You get that amazing feeling that you’ve done something. I don’t know if you’re into sports or something like that, but you feel like you’ve won. Even though, who knows, it’s the first time speaking to that many people. I’ll probably hopefully have that opportunity again, and I’ll be better at it next time. It probably wasn’t my best time, if you’re looking at it.

It’s something that you start to work towards. It’ll be interesting, how much networking, and how many doors are open by doing that, and it’s all about giving back to the community as well. To recap, I don’t think that you have to choose data science or choose data engineer to be able to not have to present and do some of the other things. However, I think most people, and if you’re watching this channel, and you’re really curious about career development, I do think that everybody should have some kind of presentation skills, and this is something they should practice towards, and I totally understand. There’s a lot of anxiety whenever you’re doing something like that. If it’s something that you can work towards, and you can conquer, then I think it’s going to be something that’s going to be amazing. One, for the community, because we need more voices. And then two, it’s going to be something that you’re going to be proud of, and you’re going to be able to work on, and it’s just another challenge, too.

That’s all I have today for Big Data Big Questions. Make sure that you hit the subscribe and ring that bell, so you never miss another episode of [Whispers] Big Data Big Questions.

Will AI Replace Data Scientist?

July 12, 2019 by Thomas Henson

Will AI Take My Job?

Artificial Intelligence is disrupting many different industries from transportation to healthcare. With any disruption, fear begins to pop up around how it will impact me! One question posed on Big Data Big Questions was if “AI Will Replace Data Scientist”. We are truly in the early days of AI and Deep Learning but let’s look forward to see if AI will be able to replace Data Scientist. Find out my thoughts on AI Replacing Data Scientist by watching the video below.

Transcript – Will AI Replace Data Scientist?

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today’s question comes in from a viewer. If you have a question, put it in the comment section here below, or you can reach out to me on thomashenson.com/big-questions. I’ll do my best to answer your question. This one came in around, “Hey, you know, is software going to replace data science?”

Whenever I think about software, specifically we’re probably talking about artificial intelligence. Artificial intelligence, or machine learning, or deep learning, or any of those models, are we going to be able to build models that can replace the data scientist?

This is a common theme, if you go out and Google anything right now, you can see, “Will AI replace lawyers? Will AI replace doctors?” All kinds of different things. Unequivocally, I think the short answer is no, but I’m going to talk about what I think are some of the reasons that I don’t think that AI is going to replace data scientists. Also, at the end, I’m going to give you some industry experts on what they think and what they’ve said about that whole concept.

Let’s jump in. Let’s talk a little bit about what a data scientist is, and then, talk about how we would even begin to look at how AI would replace that. Remember before, when we talked about data scientists in the past. These are the types of people that are trying to work on finding data that can build a model that might be able to predict an outcome. If we can predict the outcome, then maybe we can do something prescriptive. Hey, this is what’s going to happen, so let’s do this portion here after something happens. Think of if you’re creating, building a model to detect insider threats. You want to be able to decide, “Okay, does this user, maybe they’re potentially an insider threat.” Once you’ve identified that, maybe you can drop their access. Be prescriptive [Inaudible 00:02:04] it. Drop access that they have to certain directories, certain folders, and then also alert security.
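
As a rough illustration of that predict-then-prescribe idea, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the UserActivity fields, the point weights, and the threshold are hypothetical, and the hand-written scoring function only stands in for a model a data scientist would actually train on real data.

```python
from dataclasses import dataclass


@dataclass
class UserActivity:
    """Hypothetical per-user features; a real project would choose its own."""
    user: str
    failed_logins: int
    off_hours_downloads: int
    restricted_dirs_touched: int


def threat_score(activity: UserActivity) -> int:
    """Predictive step: a toy point system standing in for a trained model."""
    return (
        activity.failed_logins
        + activity.off_hours_downloads
        + 3 * activity.restricted_dirs_touched  # assumed to matter most
    )


def respond(activity: UserActivity, threshold: int = 8):
    """Prescriptive step: decide what to do once a user is flagged."""
    score = threat_score(activity)
    actions = []
    if score >= threshold:
        # These strings are placeholders for calls into real access
        # control and alerting systems.
        actions.append(f"drop directory access for {activity.user}")
        actions.append(f"alert security about {activity.user}")
    return score, actions


if __name__ == "__main__":
    for activity in (
        UserActivity("alice", 0, 1, 0),
        UserActivity("bob", 3, 4, 2),
    ):
        print(activity.user, respond(activity))
```

The scoring rule is the part a data scientist still has to design, train, and question; the automation only covers what happens after a user is flagged.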

We’re wanting to be able to build applications or models like that, that can be able to help. Can artificial intelligence do all that, kind of take the data scientist out? I don’t believe that’s the case. That’s very, very hard. If we really look at AI, and what’s going on right now, any time you hear the word AI, replace that with automation, and you’re like, “Okay, now I understand what’s going on.” Really, we’re not at the point where we’re actually building these super intelligent systems, kind of like what you see in Hollywood. I’m going to give you three different reasons around why I think that AI is not going to replace or software is not going to replace data scientists.

The first thing is, when we think about it, artificial intelligence has been around for quite some time. The term has, and we’re getting better with our models. If you listen and read some of the books that I’ve read, we’re in that implementation phase where we’re putting these things out there. If you really look at it, even in the past, when we talk about the world’s best chess player versus artificial intelligence, we got to a point in the late ’90s where the world’s best chess player could win, or I’m sorry, the machine would beat the world’s best chess player. However, if you took a medium machine or artificial intelligence that was pretty good at chess, you paired it with a pretty good or an advanced human chess player, they could beat the world’s best machine learning model, or deep learning, or AI chess player. Same thing. What we’re doing, I think, the tools and the skills that you’re seeing being implemented for data scientists are about how we can help, right? What are the types of tools that can help us identify quickly maybe some complex algorithms that would work. Should I use a Generative Adversarial Network here? Should I use a convolutional neural network, or different types of things there?

Same thing that we’re seeing in the medical industry. Doctors aren’t going to be taken out of the loop, but doctors are going to be given maybe a voice assistant that you can prescribe and give the different, these are some of the symptoms that we’re seeing. What are some of the latest journal articles, and giving a summary to that, versus your data scientist or your medical, somebody in the medical field, they’re having to go out, and there’s always research, and research papers that they could be reading, and could be intaking, same thing here. You’re going to have assistants as a data scientist, to be able to say, “Look, what are…?” Run some stats on this, and let’s see what models might be good indicators here. I’m still in the loop. I’m still deciding what we’re going to do from that model, but it’s going to help me streamline and get faster, what we’re doing.

Number two, really simple, just go out there and look at the talent gap. We’re still looking for data scientists. That’s, go do a Google search, and you’re finding that there’s a ton of different open job postings. If you go to any kind of symposium. There was a symposium over at Georgia Tech. One of the people from Google there was talking, and they were like, “Hey, man, I will take every PhD or even Master’s level candidate you have around data science and statistics,” and everything like that. There’s still a huge, huge talent gap there, and I don’t think it’s going to be cured by AI. Like I said, I think it’s going to be about automating, and then maybe AI can help us to train better humans that can fill those roles, but I think that’s another indication that, man, I don’t even know that we’re at our peak in data science. Just from a hype cycle perspective, either.

Number three, the industry experts. If you look at Andrew Ng, you look at Kai [Inaudible 00:05:41], you look at what their predictions are, data science is in one of those quadrants where it’s like, “Hey. It’s not a simple task that can be repetitive.” You’ve all seen the videos where it’s like, hey, robots, and AI can help on assembly lines. It’s a controlled environment. Data science is not controlled. It’s out there. It’s in the wild, and you’re having to, “This model,” or even ETL. We can’t even fix ETL. We’re still having to rely on human beings to help and automate, and make sure that we’re curating the right data sets, too. We’re still not at that point, and even if we do get to that point from an ETL perspective, still going to have to have data scientists. No, AI will not replace data scientists in the near future. All that’s subject to change. There could be advances in technology in 10 years that I don’t foresee. I’m not a futurist yet. Maybe, I don’t know. I don’t have enough education, I guess, or understanding to be that. If you have any questions, put them in the comment section below. Make sure you subscribe, so that you never miss an episode of Big Data Big Questions. Ring that bell. Until next time, see you again. Big Data Big Questions.

Generating OneFS Software Keys

July 11, 2019 by Thomas Henson