Thomas Henson


O’Reilly AI Conference London 2019

October 9, 2019 by Thomas Henson Leave a Comment

The Big Data Big Questions show is heading to London for the O’Reilly AI Conference, October 15–17, 2019. I’m excited to be a part of the O’Reilly AI Conference series. In fact, this will be my third O’Reilly AI Conference in the past year. Let’s look back at those events and forward to London.

San Jose & New York

 


Late night packing my conference gear for my trip to O’Reilly AI Conference this week. Most important items: 1️⃣ Stickers 2️⃣ 🎧 3️⃣ 💻 4️⃣ Bandages? (I’ll explain later) 5️⃣ 📚 (this week it’s my Neural Networking) What’s your list of must-have gear for tech conferences? #programming #coding #AI #conference #techconference

A post shared by Thomas Henson (@thomas_henson) on Sep 5, 2018 at 5:09am PDT


First, in 2018 I attended the San Jose conference, where I spent a good portion of the time in the Dell EMC booth talking with Data Engineers and Data Scientists. One of the major themes I heard from data professionals was that they were attending to learn how to incorporate TensorFlow into their workflows. In my opinion, TensorFlow was talked about in every aspect of the conference. We had a blast learning from attendees and discussing how to scale Deep Learning workloads. Also, this was my first time attending a conference with 14 stitches in my left hand (trouble on the pull-up bar)!

O’Reilly AI Conference

Next was O’Reilly AI New York. This conference will forever be known in my head as the Sophia the Robot trip. During this conference I worked with Sophia the Robot not only at the conference but also at a Dell EMC event at Times Square Studios (part of the Dell Technologies Magic of AI Series). Before the Magic of AI event, Sophia and I spent the day recording with O’Reilly TV about the current state of AI and what’s driving its widespread adoption. After a day of recording, I had a keynote on day two of the O’Reilly AI Conference, where I discussed how AI is already impacting future generations. Then there was a whirlwind of activity as Sophia the Robot took questions at the Dell Technologies booth. The last thing of the day was the Magic of AI event at Times Square Studios, where we had 100 people taking part in a question-and-answer session with Sophia the Robot.

Keynote O’Reilly AI Conference New York

Coffee with Sophia the Robot

On To London

Next up is O’Reilly AI London. To say I’m excited is an understatement. This trip will include many firsts for me.

To begin with, it’s my first international conference and my first time in London. So many things to see and so little time to see them. Feel free to suggest places to visit in the comment section below.

Second, at O’Reilly AI London I will give my first breakout session at an O’Reilly conference. While I’ve been on O’Reilly TV and given a keynote, I’ve yet to have a breakout session. My session is titled AI Growing Pains: Platform Considerations for Moving from POC to Large-Scale Deployments. The world is changing to innovate and incorporate Artificial Intelligence in many applications and services. However, with all this excitement, many Data Engineers are still struggling with how to get projects past the Proof-of-Concept (POC) phase and into production. Production environments present a list of challenges. The three biggest challenges I see when moving from POC to production are the following:

  • The gravity of data is just as real as gravity in the physical world. As Deep Learning workloads continue to grow, so does the amount of data stored to train these models. That data has gravity that attracts services and applications to it. The trouble here is making sure you have the correct data pipeline strategy in place.
  • I once had dinner with one of the co-founders of Hortonworks, during which he said, “Everything at scale is exponentially harder.” Have you ever moved photos around on your desktop? For the most part this is an easy task, except when you accidentally move a large set of photos. Instantly after moving these large folders you are endlessly waiting for the hourglass to finish. Imagine doing this with 10 PB of data. I think you get the picture here.
  • The talent pool today is much larger than in the early days of “Big Data.” However, the demand for skills in Deep Learning, Machine Learning, and Data Engineering is stressing the system, which still leaves a skills gap for experienced engineers with Deep Learning and Machine Learning skills. The skills gap is one huge factor in why many projects get stuck in the POC phase instead of moving into production.
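To put the 10 PB example in perspective, here is a rough back-of-the-envelope sketch in Python. The link speeds are illustrative assumptions, not figures from any real deployment, and the calculation assumes a perfectly sustained transfer with no protocol overhead, so real-world times would be longer.

```python
# Back-of-the-envelope transfer times for a large data set.
# Assumptions (not from the post): decimal petabytes and ideal, sustained link speed.
PETABYTE = 10**15  # bytes


def transfer_days(size_bytes: float, link_gbit_per_s: float) -> float:
    """Days needed to move size_bytes over a link running at full speed."""
    bits = size_bytes * 8
    seconds = bits / (link_gbit_per_s * 10**9)
    return seconds / 86_400  # seconds per day


for gbit in (1, 10, 100):  # hypothetical link speeds in Gbit/s
    days = transfer_days(10 * PETABYTE, gbit)
    print(f"10 PB over {gbit:>3} Gbit/s: {days:,.1f} days")
```

Even on a dedicated 10 Gbit/s link, the ideal case works out to roughly three months, which is one way to see why services tend to move to the data rather than the other way around.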

If you would like to know more about moving projects from POC to production, make sure to check out my session if you are attending the O’Reilly AI Conference in London. AI Growing Pains: Platform Considerations for Moving from POC to Large-Scale Deployments @ 11:55 on October 16, 2019.

Want More Data Engineering Tips?

Sign up for my newsletter so you never miss a post or YouTube episode of Big Data Big Questions, where I answer Data Engineering questions from the community.

Filed Under: Data Engineers, Data Science Tagged With: AI, Conference, Data Engineers, Data Science, Deep Learning

Data Engineers: Data Science vs. Computer Science Degree

October 2, 2019 by Thomas Henson Leave a Comment

Data Science vs. Computer Science Degree

How Do you Choose the Right Degree?

College is such a tough time when it comes to choosing education paths. For most folks, college marks the first time they are making a huge decision about their futures. So it’s easy to get analysis paralysis because the decision means so much. Or does it? At the end of the day, the decision feels bigger than it really is over the long term.

The difference between a Data Science degree and a Computer Science degree might impact career outlook in the short term. The long-term impacts of which degree you choose are minimal. Look around at the number of positions where degrees aren’t even a requirement. When I was working on my first Big Data project, our Data Scientist didn’t have a degree in Data Science, but he was great in that role. Now, I will say that Data Science degrees haven’t been around that long, so it kind of makes sense.

Find out my thoughts on the differences between a Data Science degree and a Computer Science degree in the video below.

Video Data Science vs. Computer Science Degree

Transcript – Data Science vs. Computer Science Degree

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today’s question comes in from a user, and it’s all about, what specific Master’s degree should I get? Find out how I answer this question and what Master’s degree you should get or should not get if you’re going into data engineering.

Today’s question. If you have a question, make sure you put it in the comments section here below. Reach out to me on thomashenson.com/bigquestions. Find me on Twitter, whatever you want to do, and I’ll do my best to answer your questions right here.

Today’s question. I’m looking for a career as a data engineer, but I’ve got a Bachelor’s in IT, and I’m looking to get into a Master’s degree. Awesome! Congratulations. It’s a pretty cool thing to go through. I went through a Master’s program as well. Which is better for data engineering career? Thinking about that specifically. A Master’s degree in data science or a Master’s degree in computer science.

This question, for me, really keys. I remember what it was like going through, when I was trying to figure out which kind of Master’s I wanted to go to. I had a similar situation. Specifically wasn’t in the data engineering from that perspective, but I was looking in, to see, what do I want to do to take the next level in my engineering career? I looked at an MBA with an emphasis in information systems versus a Master of Science in Computer Science. I ended up choosing to continue on down the business path and getting my MBA in information systems. Pretty excited to have gone through that, and really happy with my decision. I feel like it’s been fortunate with my career. I understand where you’re coming from. I’m not telling you to get an MBA. That’s not what I’m saying. I understand how much you look, going back and forth, and you’re like, man! What do I do here? I appreciate you asking for my opinion, as well. Which one should you get if you’re going into data engineering? It’s an easy guess for me, here, just to say, “I think computer science and the skills that are involved in computer science are going to help.” If I were in your shoes, I would look, and pivot more towards the computer science. I would look into, though, there are new universities and other programs that are starting to emerge that actually have a data engineering track. Just like you were asking about, should I do the data science? In my opinion, if you’re not trying to go down the data science path, you maybe don’t go into that. If they do have a track specific for data engineers, so data science is a newer program a lot of universities and colleges are having around the globe, so if they have a specific data engineering path, I’d look into that. Specifically, I’d probably stay with the computer science track.
However, like I said, there are some universities that are putting out a specific, “This is not data science,” but a specific data engineering path, where you’re going to go through more systems administration stuff, where you’re going to be building out programs that are going to analyze data, and being able to really focus on distributed systems, whether it be from Kubernetes, and containers, to different clouds. Knowing how to do it in AWS. Building out good data pipelines and really understanding what you’re doing from that perspective. I think I’d look into that, and also make sure you’re looking at some of those degrees.

One more bonus tip around as you’re going through that. I would definitely, at the university that you’re looking at, have a conversation with some advisors, and even some of the professors in the data science world or in the computer engineering world, and see if you can cross over. Maybe there’s an opportunity there to do something inter-disciplinarian. Maybe you can take a couple of the data science courses, because they would be really good for you to get exposure to it, not become a data scientist, but exposure to what goes on, on the data science side, and have those packaged together, and go through some of those courses while you’re going through the computer science course. Maybe they, not asking you to take double load. Hopefully there’s a crossover there, where it’s like, “Hey, I can pick and choose some of these.” With data engineering and just the boom that’s going on with that as far as careers and, if you look at just globally, we need more data engineers. The universities will be pretty excited for, especially somebody standing out to do that. Worst case scenario, what are they going to do? Your professors may tell you no, but they see that you’re engaged, and that you’re interested in data engineering, so they’re going to be able to look out for, maybe there’s new classes that are coming up. What about internships, right? Some of these universities have really good relationships with corporations. Your name is already at the top of the list, and it’s shown that you’re showing initiative, that hey, I’m excited about the data engineering world, so any opportunities to learn more or any opportunities for future career growth, might be a good thing. Something as simple as taking an hour to reach out and talk to a professor may be investing in yourself and in your career for further on down the future. Definitely try that out. Should you get a Master’s degree to become a data engineer? 
You don’t have to, but like I said, I’ve got a Master’s degree, and I went through that for my own purposes. If you’re watching this video, you’ve made it all the way to the end, which I hope you’ve made it to the end. Everybody that starts watching it, this was a specific question where we were talking about different degree options for your career. We’re not saying that you have to get the Master in Computer Science to become a data engineer. Heck, you can even go through, you can do the Master in Data Science and become a data engineer. This is just my advice for what we’re trying to do. There are other data engineers that don’t have degrees. We’ve covered that quite a bit on this channel, and so I just want to be specific to that. I don’t want people watching this course, especially if you’re in college, or if you’re in high school and you’re starting to think about your data engineering path, like, “Aw, man! I’ve got to go get a Master’s degree to do this. Be in it for the long haul.” That’s not what we’re talking about here. We’re just talking about options. Let me know if you have any questions about degrees, certifications, anything data engineering or technology-specific, and I will answer it on the next episode of Big Data Big Questions.


Filed Under: Career Tagged With: Computer Science, Data Engineer, Data Science

Review Coursera’s Neural Networking & Deep Learning Course

July 17, 2019 by Thomas Henson Leave a Comment

 

Coursera's Neural Networking & Deep Learning Course

Another Machine Learning Course?

Yet another machine learning course has caught my attention here lately. Andrew Ng has a new course available on Coursera focused on Neural Networks and Deep Learning. How did I like the course and should you take the course? Find out my thoughts on Coursera’s Neural Network and Deep Learning course.

Transcript- Review Coursera’s Neural Networking & Deep Learning Course

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today’s question comes in around a new course that I am taking, myself. It’s not a course that I’m writing. I’ve talked about some of my Pluralsight courses. This is actually a deep learning course that I’m taking with Coursera. It’s the second course that I’ve taken with Coursera. I did another one from Andrew Ng called, I think, learning machine learning, and just went through that portion, and swore I’d never take another one, and here I am again. Find out my review on that course and how I’m doing on it here in just a second.

[Sound effects]

Today’s question is around what I’m doing from a course perspective. I’m taking a course called neural networks and deep learning. This is actually part one in a large certification series. If you go out to deeplearning.ai, it’s an Andrew Ng specific course. I did his machine learning course before, and went through it, and did some reviews with it on another channel with a group, the Big Data Beer Team. You can always check that out and find that.

I swore I’d never do another course, and here I am doing another one, because the math portion for me is a little more into the weeds than I like to be and really think, from a data engineering perspective, it probably is. Either way, my thing is to do this review and give you all the insights. You can decide if you want to take that course and find out where you are. I’m through part one. The neural networks and deep learning is part one in that course. It’s an Andrew Ng course, so he’s like, probably trained more people around machine learning and deep learning than anybody else on the planet. Worked at Baidu, at Google, Stanford, and has his own company, own startup where he’s walking through driverless cars. Huge authoritative figure who’s teaching this course. It’s amazing from that aspect of it.

Little bit overwhelming, I’ll tell you. We’ll get into it a little bit, but each part of these courses are broken into, I think, four weeks. This first one was four weeks. We’re going to go through how I felt through each of the four weeks, and give you my thoughts on that.

In the first week, week one was intro to deep learning, and really it was about the why for deep learning. Why is deep learning? What’s the history of it? Is this anything new? Is this going to solve all our problems in the future? Eh, maybe.

Maybe we don’t get into that as much, but this was a pretty good one, and I actually did, with each one of these courses, there is a heroes in AI interview session. If you like watching YouTube videos like you do now, this is similar to that, but it’s behind the paywall, or behind the course wall there in Coursera. I actually went through that, when I did not get through all of them, but I did go through this one. It was pretty good. Can’t really remember who it was. Maybe shame on me for that. Should’ve put that in my notes.

Week one was pretty easy to step through and everything like that. There might’ve been a quiz or something, but no programming aspects from that perspective. Week number two, logistic regression in neural networks. Probably my least favorite portion of the course so far. A lot of math-based and somewhat of a review. Actually, when I got to this portion, I was like, “Man, this is…” I was going through the course material and watching the videos. I was like, “This is kind of a review from what I did in the machine learning course.”

I’m going to ace everything here, and I did ace the quiz. It wasn’t too hard, but when we stepped into the programming, it was a little more complicated than I thought, and I have some reasons why I think that is, and I’m going to talk about those here at the end. For the most part, week two was really just a level set. Hey, remember, this is the cost function. This is how we use linear regression, and just walking through some of those portions, to be able to say hey, this is what’s going on behind the scenes.

If you’ve gone through like I have, and implemented networks, and played around with TensorFlow or TF Learn, you already know some of the things that are going on, even if maybe you don’t understand it fully. This was a good review to start off from that perspective. If you haven’t taken the machine learning course, no problem. You can jump right into it. Like I said, he takes it from a high level here and gets you going.

Week three. My favorite week. We talked about shallow neural networks. This is the basics of how to build a neural network. What I like the most about this was, we deep dived into why non-linear functions and why we use different activation functions. It was really cool, because I actually taught a portion of this in my course, and just it was cool to see how Andrew was able to explain it. Maybe not a whole lot better than me. I don’t want to undersell myself, but it was definitely awesome to see his background, and his thought process, and just him saying, “Hey, this is why we use [Inaudible 00:05:19], these are some of the things that you’re going to see with it.” Don’t worry about it, because of these reasons. Really, my favorite portion of this course was week three, so around the shallow neural networks. Still went through and took a little bit longer to do the programming exercise than I thought would take me.
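The point about why networks need non-linear activation functions can be illustrated with a few lines of NumPy. This is only a minimal sketch, not material from the course: the layer sizes, the random weights, and the choice of tanh are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5))    # 3 input features, 5 examples (arbitrary sizes)
W1 = rng.normal(size=(4, 3))   # hypothetical hidden-layer weights
W2 = rng.normal(size=(1, 4))   # hypothetical output-layer weights

# Two layers with no activation in between collapse into one linear layer,
# because W2 @ (W1 @ x) is the same as (W2 @ W1) @ x.
two_linear_layers = W2 @ (W1 @ x)
one_linear_layer = (W2 @ W1) @ x
print(np.allclose(two_linear_layers, one_linear_layer))

# Putting a non-linear activation (tanh here) between the layers breaks that
# equivalence, which is what lets extra layers add expressive power.
with_tanh = W2 @ np.tanh(W1 @ x)
print(np.allclose(with_tanh, one_linear_layer))
```

Without the activation, stacking more layers buys nothing beyond a single matrix multiply, no matter how deep the network is.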

Little bit of stress there, but quizzes were good. It was easy if you follow along, and just take good notes, and you’ll be able to pass the quizzes. There’s a new thing that they’re trying out, too, called notes. I’ve started playing around with that. I’ll probably, in my next video, talk a little bit about that as I’m using it more and more, and maybe that’ll be a quick tip that you guys can use whenever you’re going through a course on Coursera.

Week four, not my favorite week. It was pretty good. We started getting into deep learning and deep neural networks and how those are working. Some of the things that we really did was talked about the matrix dimensions and how some of that works. Didn’t get into it as much as they will in future courses. It’s easy for me to look at it now and say that, because I jumped ahead a little bit. From the perspective of this course, neural networks and deep learning part one, really talks through some of the matrix portions and then starts building out your deep networks. Also, talks about parameters and hyperparameters. I was familiar with hyperparameters and parameters before, just with having been hands-on before, but it was really helpful to do those.
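As a rough illustration of the matrix dimensions and the parameter versus hyperparameter distinction, here is a small NumPy sketch. The layer sizes, the example count, and the tanh activation are arbitrary assumptions rather than anything taken from the course assignments.

```python
import numpy as np

# Hyperparameters: values you choose up front (these sizes are arbitrary).
layer_sizes = [5, 4, 3, 1]   # input features, two hidden layers, one output unit
m = 10                       # number of training examples

# Parameters: values the network learns. For layer l, the weight matrix has
# shape (n[l], n[l-1]) and the bias has shape (n[l], 1).
rng = np.random.default_rng(0)
params = {}
for l in range(1, len(layer_sizes)):
    params[f"W{l}"] = rng.normal(size=(layer_sizes[l], layer_sizes[l - 1])) * 0.01
    params[f"b{l}"] = np.zeros((layer_sizes[l], 1))

# Forward pass: activations keep one column per example, so A has shape (n[l], m).
A = rng.normal(size=(layer_sizes[0], m))
for l in range(1, len(layer_sizes)):
    W, b = params[f"W{l}"], params[f"b{l}"]
    A = np.tanh(W @ A + b)   # b broadcasts across the m columns
    print(f"layer {l}: W {W.shape}, b {b.shape}, A {A.shape}")
```

Printing the shapes at each layer is a cheap way to catch a dimension mismatch before it turns into a confusing error further down the line.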

The quiz in this one, once again, if you paid attention, you went through it. You have to work through some math and do some other portions of it, but the quizzes are pretty simple. Make sure you’re using your own notes and everything for that. When it came to the programming exercises, I think there were two in this week, and they were somewhat difficult. I think the second one was pretty long as far as building out. You get to get hands-on with TensorFlow. Still a little bit more challenging, I guess, I think, and there’s some ways that we can make it a little bit better. Let me talk about that here just next.

Overall, I thought the course was all right. It was good for me, just some of it was a little bit of a review. Some of it went a lot deeper than I’ve dove in before, so I thought that portion was good for me. I will say, on all the programming exercises, they’re all graded. One of the things that I find challenging, and maybe it’s just the way that I learn, but I feel like they’re a little harder just because you go through, and it’s like you’re being tested day one. Whenever you’re going through the videos and everything, you’re doing everything from a math perspective on paper, or if you’re taking digital notes, but you’re not really doing any of the programming functions. If you don’t have a solid basics in programming, or it’s not something that you do every day from that perspective, I think it’s going to be a little more challenging. One of the things that could help out, I think, and broaden for the students that are coming in would be to have more coding examples that aren’t graded. It doesn’t have to be verbatim. Hey, this is really, really close to what the examples are. I get that you want to test, and you want to make it so that you’re applying what you’re learning.

Also, I think a few more coding examples where you can go through and see, “These are some of the steps.” If you understand the math portion of it, doesn’t necessarily mean that you’re going to be able to go in and be able to program it right there, and when we talk about it from a real-world perspective, whenever I look at it, yeah, you need to understand those things, and know how to implement those at a base level, but there’s so many. There’s so many other things it can do from a high level. For example, one of the biggest challenges I had going through this was, I built a whole course around TF Learn, and being able to use that abstraction layer over TensorFlow. For me, having to go through step by step, and showing how you can do this, where you can write it in TF Learn or use one of those functions, I think that would’ve been… That would be a different approach to take it, and I think that would broaden the audience, and make it a little more enjoyable, too.

If you’re having to go through, and you know that writing these 60 lines of code is something that you can write in 4, it makes it a little bit harder, especially since I already just did all the math portion, and kind of went through how all those activations and everything work, versus having to go through some of the minutia on the programming. That’s just my two cents. If you’ve taken this course, please tell me. Tell me your opinion. You’re listening to mine. Let’s make this a conversation. I’d love to hear what some of your thoughts are, where you think I’m wrong if you think I should be better at math. You’re probably right. I think I’m getting the math. We’ll see.

Fair enough, my programming skills in Python, like I said, they’re all right. They’re not to the level here. I think that’s another gap that I found going through this course. All in all, I guess I would recommend it if you’re looking into using deep learning, but I don’t think that, if you’re a data engineer, that you have to go through anything like this. Like I said, it’s a good aspect of it, but there’s some other things and other skills that you probably want to get. If you’re more looking to the data science, or deep learning, or machine learning engineer, then going through something, one of these, this course would probably be pretty good. In the next video, check out, I jumped way too ahead in the next course. You might see. I jumped to, I think, the fifth portion or fourth portion when I was supposed to go to the second portion. I’ll talk about that in the next video. If you have any questions, make sure you put them in the comments section here below, or reach out to me on thomashenson.com/big-questions. Find me on Twitter or Instagram. Ask any questions. I’ll try my best to answer them. Make sure you subscribe so that you never miss an episode, and ring that bell. Thanks again.

Filed Under: Data Science Tagged With: Data Science, Deep Learning, Neural Networks

Will AI Replace Data Scientist?

July 12, 2019 by Thomas Henson Leave a Comment

Will AI Replace Data Scientist

Will AI Take My Job?

Artificial Intelligence is disrupting many different industries, from transportation to healthcare. With any disruption, fears begin to pop up around how it will impact me! One question posed on Big Data Big Questions was “Will AI Replace Data Scientists?” We are truly in the early days of AI and Deep Learning, but let’s look forward to see if AI will be able to replace Data Scientists. Find out my thoughts on AI replacing Data Scientists by watching the video below.

Transcript – Will AI Replace Data Scientist?

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today’s question comes in from a viewer. If you have a question, put it in the comment section here below, or you can reach out to me on thomashenson.com/big-questions. I’ll do my best to answer your question. This one came in around, “Hey, you know, is software going to replace data science?”

Whenever I think about software, specifically we’re probably talking about artificial intelligence. Artificial intelligence, or machine learning, or deep learning, or any of those models, are we going to be able to build models that can replace the data scientist?

This is a common theme, if you go out and Google anything right now, you can see, “Will AI replace lawyers?” Will AI replace doctors? All kinds of different things. Unequivocally, I think the short answer is no, but I’m going to talk about what I think are some of the reasons that I don’t think that AI is going to replace data scientists. Also, at the end, I’m going to give you some industry experts on what they think and what they’ve said about that whole concept.

Let’s jump in. Let’s talk a little bit about what a data scientist is, and then, talk about how we would even begin to look at how AI would replace that. Remember before, when we talked about data scientists in the past. These are the types of people that are trying to work on finding data that can build a model that might be able to predict an outcome. If we can predict the outcome, then maybe we can do something prescriptive. Hey, this is what’s going to happen, so let’s do this portion here after something happens. Think of if you’re creating, building a model to detect insider threats. You want to be able to decide, “Okay, does this user, maybe they’re potentially an insider threat.” Once you’ve identified that, maybe you can drop their access. Be prescriptive [Inaudible 00:02:04] it. Drop access that they have to certain directories, certain folders, and then also alert security.

We’re wanting to be able to build applications or models like that, that can be able to help. Can artificial intelligence do all that, kind of take the data scientist out? I don’t believe that’s the case. That’s very, very hard. If we really look at AI, and what’s going on right now, any time you hear the word AI, replace that with automation, and you’re like, “Okay, now I understand what’s going on.” Really, we’re not at the point where we’re actually building these super intelligent systems, kind of like what you see in Hollywood. I’m going to give you three different reasons around why I think that AI is not going to replace or software is not going to replace data scientists.

The first thing is, when we think about it, artificial intelligence has been around for quite some time. The term has, we’re getting better with our models. If you listen and read some of the books that I’ve read, we’re in that implementation phase where we’re putting these things out there. If you really look at it, even in the past, when we talk about the world’s best chess player versus artificial intelligence, we got to a point in the late ’90s where the world’s best chess player could win, or I’m sorry, the machine would beat the world’s best chess player. However, if you took a medium machine or artificial intelligence that was pretty good at chess, you paired it with a pretty good or an advanced human chess player, they could beat the world’s best machine learning model, or deep learning, or AI chess player. Same thing. What we’re doing, I think, the tools and the skills that you’re seeing being implemented for data scientists are about how we can help, right? What are the types of tools that can help us identify quickly maybe some complex algorithms that would work. Should I use a Generative Adversarial Network here? Should I use a convolutional neural network, or different types of things there?

Same thing that we’re seeing in the medical industry. Doctors aren’t going to be taken out of the loop, but doctors are going to be given maybe a voice assistant that you can prescribe and give the different, these are some of the symptoms that we’re seeing. What are some of the latest journal articles, and giving a summary to that, versus your data scientist or your medical, somebody in the medical field, they’re having to go out, and there’s always research, and research papers that they could be reading, and could be intaking, same thing here. You’re going to have assistants as a data scientist, to be able to say, “Look, what are…?” Run some stats on this, and let’s see what models might be good indicators here. I’m still in the loop. I’m still deciding what we’re going to do from that model, but it’s going to help me streamline and get faster, what we’re doing.

Number two, really simple, just go out there and look at the talent gap. We’re still looking for data scientists. That’s, go do a Google search, and you’re finding that there’s a ton of different open job applicants. If you go to any kind of symposium. There was a symposium over at Georgia Tech. One of the people from Google there was talking, and they were like, “Hey, man, I will take every PhD or even Master’s level candidate you have around data science and statistics,” and everything like that. There’s still a huge, huge talent gap there, and I don’t think it’s going to be cured by AI. Like I said, I think it’s going to be about automating, and then maybe AI can help us to train better humans that can fill those roles, but I think that’s another indication that, man, I don’t even know that we’re at our peak in data science. Just from a hype cycle perspective, either.

Number three, the industry experts. If you look at Andrew Ng, you look at Kai [Inaudible 00:05:41], you look at what their predictions are, data science is in one of those quadrants where it’s like, “Hey. It’s not a simple task that can be repetitive.” You’ve all seen the videos where it’s like, hey, robots and AI can help on assembly lines. It’s a controlled environment. Data science is not controlled. It’s out there. It’s in the wild, and you’re having to, “This model,” or even ETL. We can’t even fix ETL. We’re still having to rely on human beings to help and automate, and make sure that we’re curating the right data sets, too. We’re still not at that point, and even if we do get to that point from an ETL perspective, we’re still going to have to have data scientists. No, AI will not replace data scientists in the near future. All that’s subject to change. There could be advances in technology in 10 years that I don’t foresee. I’m not a futurist yet. Maybe, I don’t know. I don’t have enough education, I guess, or understanding to be that. If you have any questions, put them in the comment section below. Make sure you subscribe, so that you never miss an episode of Big Data Big Questions. Ring that bell. Until next time, see you again. Big Data Big Questions.

Filed Under: Career Tagged With: AI, Data Science, Deep Learning

Do Data Scientist Code?

June 6, 2019 by Thomas Henson Leave a Comment

Scientist Who Code

Data Science is a hot career field in Data Analytics. On data teams with Data Engineers how much coding is expected from the Data Scientist?

The role of the Data Scientist includes finding features or correlations in data that might predict outcomes. Those predictions then become data models that are tested multiple times. After those data models reach a high confidence level, they are then automated in applications. In this episode of Big Data Big Questions let’s find out just how much coding a Data Scientist can be expected to do.

Transcript – Do Data Scientist Code?

Hi folks. Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today, I’m still here in the gym answering your questions, recording between mail trucks coming by. Crazy.

Today’s question comes in from a viewer. Thanks for watching. If you have any questions, make sure you put them in the comments section here below, and also subscribe, make sure you hit that bell so you get notifications. Today’s question comes in around data scientists, and it’s specifically, “Do data scientists code?” We talked about the roles of a data engineer. We’ve talked about the roles of a machine learning engineer and even data scientist, but where is that fine line between how much data scientists code? We’re going to talk about that, and talk about some of the tools that they use, and then try not to say, “Depends.”

I know, I know. You’re like, “Man. He’s going to say, ‘depends.'” But, I’m going to try not to say that. Does a data scientist code? The answer is yes. Data scientists, for the most part, they’re able to code. The tools that they use, how much are they coding, that’s really going to be dependent on — didn’t say depends, dependent — the role that they’re in. If they have a data engineer or a machine learning engineer, that can help them put their code in production and finalize some of the things that they’re doing.

I will say, I’ve worked on a team before where, from a data scientist perspective, they were primarily using MATLAB. They were using MATLAB or Excel, and then when we pushed it to Hadoop, and you’ve heard me talk about it many times on here, that was a little bit harder, because they were doing that in a solo environment on their machine, and then we went to distributed algorithms. At the time we were using Mahout. Going from Excel or MATLAB to Mahout, we really had to change and tune a lot of different things. That’s where my expertise as a data engineer and developer was able to help out and keep running the same job a hundred times.

Things have changed with TensorFlow and a lot of other tools since that time. Yes, the data scientists will code. It’s going to depend on what they’re going to use. Some of the common tools, like I said, MATLAB, even Excel, dependent on what you’re working on. If it’s a big data project, might not be using some of the other larger tools. Then, you also have R. You have Python, which we’ve talked about and we love here. We also have Scala or skuh-la. I always say it wrong. There are many different tools that data scientists use whenever they’re coding. That doesn’t mean that they’re doing all the coding. This is back to the dividing lines of the roles and where they are. Whenever we’re talking about standing up the environment, pushing things out to production, and even doing some of the heavier lifting, like I will say operationalizing of the code. Getting it ready, past the trendy phase on some of the other pieces that you’re really bringing it in to, “Hey, it’s going to support this dashboard.” It’s going to do this piece, or even some of the ETL jobs, that’s still going to come to us as the machine learning engineers or the data engineers, just depending on the role. Still not saying depends. It’s depending on that role. For the most part, to answer the question simply, yes. Your data scientist is going to code. How much of that code is really in-depth, if you think about, are they doing the job of coding? I would say no. That would just be what I’ve seen. If you think about some of the reasons that we have the tools that we have now, like why was Pig created? Why do we have a Python API in Spark, when we could do Spark in Java, right?

Why do we have a Scala API in Spark? It’s because of the fact that we want to use a higher-level language, so that data analysts, data scientists, they can run their code, and they can do it without having to worry about Java and some of the other components there. Yes, data scientists code. How much do they code? It’s going to depend on their partner machine learning or data engineer. Your data scientist is not going to replace the data engineer or machine learning engineer. We’re all on the same team, here. We’re not trying to compete back and forth, and if I had to choose a side, I’d say machine learning engineers and data engineers. But, I’m very biased.

That’s all for today. Hope you enjoyed this episode. If you have any questions, throw them in the comments section here below, and make sure you hit subscribe, and ring that bell, and I will see you again on the next episode of Big Data Big Questions.

Filed Under: Data Science Tagged With: Code, Data, Data Science

Tableau For Data Science?

May 15, 2019 by Thomas Henson Leave a Comment

Big Data Big Questions

Tableau is huge for interacting with data and empowering users to find insight in their data. So does this mean Tableau is the primary tool for Data Scientists? In this episode of Big Data Big Questions we tackle the question of “Is Tableau used for Data Science?”

Tableau For Data Science

What is Tableau

Tableau is business intelligence software that allows users to visualize and drill down into data. Data users lean heavily on Tableau for the visualization portion of Data Science projects. The sources for data can be databases, CSVs, or almost any source with structured data. So if Tableau is for analyzing and visualizing data, is it a tool specific to Data Scientists? Watch the video below to find out Tableau’s role in the world of Data Science.

Transcript – Tableau For Data Science?

Hi folks! Thomas Henson here with thomashenson.com, and today is another episode of Big Data Big Questions. Today’s question comes in from a user, and it’s around data science, and Tableau, and how those go together. But, before we jump into the question, if you have a question that you want to know about data engineering, IT, data science, anything related to IT, or just want to throw a question at me, put it in the comments section here below or reach out to me on Twitter at #BigDataBigQuestions. Or, thomashenson.com/big-questions. Ton of ways to get your questions here, answered right on this show, all you have to do is type away and ask.

Now, let’s jump into today’s question. Today’s question comes in from a YouTube viewer, and it’s about, hey, in data science, do you use Tableau? You can see the question here as it pertains to this, and so this is a question we started up this show doing, around data engineering, but now we’re really jumping towards, hey, what’s going on from a data science and just encompassing all of it? Today’s question, we’re going to talk about where’s Tableau used, right? A lot of people use Tableau. It’s really, really popular. But, is that really a tool that a data scientist is going to use? Should you invest your time as a data engineer or a data scientist aspiring or not aspiring to get into data science? Should you spend time learning about that tool?

My thoughts on Tableau are that it’s really good for giving information out to users that could be not necessarily data scientists. They could be users of it. They could be analysts. They could be somebody who just has a stake in their business. I’ve used it at a lot of different corporations, companies, and organizations that I’ve worked at, and really what I see is those tools are more for the end user, for visualization. They may fall more in the data visualization bucket. We’ve talked about the three tiers of work. You have your data scientist, you have your data engineer, and your data visualization specialist, the person who’s making sure that, hey, at the end of the day, it’s great that we have all these algorithms that are showing us and being able to predict whatever we’re trying to look at in our data, but if we can’t sell that and can’t convey that to the people that need the data to make a decision on, then it’s just an experiment, it’s just us having fun doing research.

When it comes to an end product or being able to really sell your point, data visualization, I think that’s the bucket that Tableau fits in more than just traditional data science. Could be wrong. Let me know if I am here in the comments section below, but let me talk a little bit about my use case and where I’ve seen it. Like I said, I’ve used it in a lot of different organizations that I’ve worked with or even contracted with. One of the main use cases, I’ll give you an example. Let’s say that you’re a YouTube viewer. I’m not saying YouTube uses Tableau, this is just an example. I don’t want to give away too much information, insider. If you have a YouTube channel, think about if you want to see the videos that are coming in. You’re a user. You’re a publisher, a creator. You want to know. Here is all the videos that you have. Here’s how long they’re watched. Here’s all the demographics from behind the scenes that you can pull. Maybe the times that they were watched. How long they were watched, so on this video here, if people drop out after 30 seconds, I did something wrong there. Versus, how many people go through the end of it. Same thing, too. What you would do is, you would have all this information and aggregate all this data, and you maybe even pull some insights. Like, hey, what’s your average? We can do some real simple things, or you can do some complex things, too. Tableau is where you’re going to give the end user the access.

At least that’s what I’ve seen a lot. There’s a big need to be able to do that and be able to pull that data. I wouldn’t say that a data scientist wouldn’t, per se, use that as their tool. It wouldn’t be their only tool. Maybe that’s the way that they aggregate and look at large amounts of data before they go in and start to pick and choose. I’m sure there’s some modules out there that are incorporating machine learning and deep learning. I will say, if you’re really looking from an AI perspective to jump into, it’s not just going to be about Tableau. I’m not saying that you shouldn’t get up to speed on Tableau, but I wouldn’t say that, hey, I’m a brand-new person graduating high school, graduating college, or somebody that sees it in their career and looking to go into data science, my choice would not be to jump in and learn Tableau. I would start learning a little bit more about Python, and algorithms, and maybe R, or some of the other higher-level languages to talk around machine learning and deep learning, versus saying, “Hey, this is the tool that’s going to kind of take me there.” Now, if you’re a data visualization person, or you want to get into big data from that perspective, there’s a lot of things that you can use Tableau to do. You might add it to your bucket. As far as we talk about on this show, how to accelerate your career or how to break into the big data realm, this is not one of those tools that I’m going to say, hey, this is the only choice you have. Not really going to be the one that’s probably going to make the most sense. It’s not going to be the game changer, like hey, this person’s certified in Tableau or is a Tableau wizard. If you’re applying for a job that’s all around Tableau then, definitely. As far as, I really want to get down into data science, and I really want to get deep in it, Tableau’s one of those things. Definitely probably going to use or come across tools that are similar to that, but it’s not going to be your mainstay, probably, where you’re writing your algorithms and doing your analytics.
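
To make the YouTube-style example from the transcript a little more concrete, here is a minimal sketch in plain Python of the kind of aggregation that might happen before the results are handed to a visualization tool. Everything in it is an assumption for illustration: the video names, the watch records, the `summarize` function, and the 30-second cutoff are hypothetical, and it says nothing about how YouTube or Tableau actually work.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical watch records: (video_id, seconds_watched, video_length_seconds).
# These values are made up purely for illustration.
WATCH_EVENTS = [
    ("intro-to-hdfs", 25, 300),
    ("intro-to-hdfs", 300, 300),
    ("intro-to-hdfs", 120, 300),
    ("pig-latin-basics", 10, 240),
    ("pig-latin-basics", 240, 240),
]


def summarize(events, early_cutoff=30):
    """Aggregate per-video stats: views, average watch time,
    share of viewers who left within `early_cutoff` seconds,
    and share who watched to the end."""
    by_video = defaultdict(list)
    for video_id, watched, length in events:
        by_video[video_id].append((watched, length))

    summary = {}
    for video_id, rows in by_video.items():
        views = len(rows)
        summary[video_id] = {
            "views": views,
            "avg_seconds_watched": mean(w for w, _ in rows),
            "early_dropoff_rate": sum(1 for w, _ in rows if w <= early_cutoff) / views,
            "completion_rate": sum(1 for w, length in rows if w >= length) / views,
        }
    return summary


SUMMARY = summarize(WATCH_EVENTS)

if __name__ == "__main__":
    for video_id, stats in SUMMARY.items():
        print(video_id, stats)
```

A table like `SUMMARY` is the sort of thing an end user would then explore in a dashboard, which is the role the transcript assigns to Tableau; the modeling and heavier analytics would still live in code like Python or R.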

That’s all for today. If you have any questions, make sure you put them in the comments section here below, and then make sure you click subscribe to follow this channel, so that you never miss an episode of Big Data Big Questions.


Filed Under: Data Engineers, Video Tagged With: Big Data Big Questions, Data, Data Science

Big Data Skills For Data Scientist

February 11, 2018 by Thomas Henson 1 Comment

Big Data Skills for Data Science

What Big Data skills should Data Scientists understand to be able to take advantage of the Hadoop Ecosystem? In today’s episode of Big Data Big Questions we tackle the tools and frameworks that Data Scientists should know in order to work in Big Data. We’ll also break down how much of Spark, Hadoop, Mahout, MADLIB, and other tools Data Scientists need to understand. Lastly, I’ll give tips for Data Engineers who want to begin to move toward the Data Scientist role.

Find out about Big Data Skills for Data Science by watching the video below.

Video – Big Data Skills for Data Science

Filed Under: Big Data Tagged With: Data Science
