Thomas Henson

O’Reilly AI Conference London 2019

October 9, 2019 by Thomas Henson

The Big Data Big Questions show is heading to London for the O’Reilly AI Conference, October 15–17, 2019. I’m excited to be a part of the O’Reilly AI Conference series. In fact, this will be my third O’Reilly AI conference in the past year. Let’s look back at those events and then forward to London.

San Jose & New York

Late-night packing of my conference gear for my trip to the O’Reilly AI Conference this week. Most important items: 1️⃣ Stickers 2️⃣ 🎧 3️⃣ 💻 4️⃣ Bandages? (I’ll explain later) 5️⃣ 📚 (this week it’s Neural Networking) What’s your list of must-have gear for tech conferences? #programming #coding #AI #conference #techconference

A post shared by Thomas Henson (@thomas_henson) on Sep 5, 2018 at 5:09am PDT


First, in 2018, I attended the San Jose conference, where I spent a good portion of my time in the Dell EMC booth talking with Data Engineers and Data Scientists. One of the major themes I heard from data professionals was that they were attending to learn how to incorporate TensorFlow into their workflows. In my opinion, TensorFlow came up in every aspect of the conference. We had a blast learning from attendees and discussing how to scale Deep Learning workloads. This was also my first time attending a conference with 14 stitches in my left hand (trouble on the pull-up bar)!

O’Reilly AI Conference

Next was O’Reilly AI New York. In my head, this conference will forever be known as the Sophia the Robot trip. I worked with Sophia the Robot not only at the conference but also at a Dell EMC event at Times Square Studios (part of the Dell Technologies Magic of AI series). Before the Magic of AI event, Sophia and I spent the day recording with O’Reilly TV about the current state of AI and what’s driving its widespread adoption. After that day of recording, I gave a keynote on day two of the O’Reilly AI Conference, where I discussed how AI is already impacting future generations. Then came a whirlwind of activity as Sophia the Robot took questions at the Dell Technologies booth. The last event of the day was the Magic of AI event at Times Square Studios, where 100 people took part in a question-and-answer session with Sophia the Robot.

Keynote O’Reilly AI Conference New York

Coffee with Sophia the Robot

On To London

Next up is O’Reilly AI London. To say I’m excited is an understatement. This trip will bring several firsts for me.

To begin with, it’s my first international conference and my first time in London. There are so many things to see and so little time to see them. Feel free to suggest places to visit in the comment section below.

Second, at O’Reilly AI London I will give my first breakout session at an O’Reilly conference. While I’ve been on O’Reilly TV and given a keynote, I’ve yet to give a breakout session. My session is titled AI Growing Pains: Platform Considerations for Moving from POC to Large-Scale Deployments. Organizations everywhere are racing to innovate and incorporate Artificial Intelligence into their applications and services. However, amid all this excitement, many Data Engineers are still struggling to get projects past the Proof-of-Concept (POC) phase and into production. Production environments present their own list of challenges. The three biggest challenges I see when moving from POC to production are the following:

  • The gravity of data is just as real as gravity in the physical world. As Deep Learning workloads continue to grow, so does the amount of data stored to train these models. That data has gravity that pulls services and applications toward it. The challenge here is making sure you have the right data pipeline strategy in place.
  • I once had dinner with one of the co-founders of Hortonworks, during which he said, “Everything at scale is exponentially harder.” Have you ever moved photos around on your desktop? For the most part this is an easy task, except when you accidentally move a large set of photos. Instantly, you are stuck endlessly waiting for the hourglass to finish. Imagine doing this with 10 PB of data. I think you get the picture.
  • The talent pool today is much larger than in the early days of “Big Data.” However, the demand for Deep Learning, Machine Learning, and Data Engineering skills is stressing the system, which still leaves a skills gap for experienced engineers with Deep Learning and Machine Learning skills. That skills gap is one huge reason many projects get stuck in the POC phase instead of moving into production.
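To put the 10 PB example in numbers, here is a back-of-the-envelope sketch. The link speeds are assumptions chosen for illustration (they are not figures from the talk), and the estimate ignores protocol overhead, retries, and contention, so real transfers would take longer.

```python
def transfer_days(size_bytes: float, link_bits_per_sec: float) -> float:
    """Ideal transfer time in days: no protocol overhead, retries, or contention."""
    seconds = (size_bytes * 8) / link_bits_per_sec
    return seconds / 86_400  # seconds in a day

PB = 10**15          # one petabyte (decimal), in bytes
dataset = 10 * PB    # the 10 PB example from the text

# Assumed link speeds, for illustration only.
for label, gbps in [("1 Gbit/s", 1), ("10 Gbit/s", 10), ("100 Gbit/s", 100)]:
    days = transfer_days(dataset, gbps * 10**9)
    print(f"10 PB over {label}: {days:,.1f} days")
```

In this ideal case that works out to roughly 926 days at 1 Gbit/s, 93 days at 10 Gbit/s, and just over 9 days at 100 Gbit/s, which is why moving the data is often less practical than moving the compute to the data.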

If you would like to know more about moving projects from POC to production and you are attending the O’Reilly AI Conference in London, make sure to check out my session, AI Growing Pains: Platform Considerations for Moving from POC to Large-Scale Deployments, at 11:55 on October 16, 2019.

Want More Data Engineering Tips?

Sign up for my newsletter so you never miss a post or a YouTube episode of Big Data Big Questions, where I answer Data Engineering questions from the community.

Filed Under: Data Engineers, Data Science Tagged With: AI, Conference, Data Engineers, Data Science, Deep Learning

Review Coursera’s Neural Networks & Deep Learning Course

July 17, 2019 by Thomas Henson

Another Machine Learning Course?

Yet another machine learning course has caught my attention lately. Andrew Ng has a new course on Coursera focused on Neural Networks and Deep Learning. How did I like it, and should you take it? Find out my thoughts on Coursera’s Neural Networks and Deep Learning course.

Transcript – Review Coursera’s Neural Networks & Deep Learning Course

Hi folks! Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today’s question is about a new course that I’m taking myself. It’s not a course that I’m writing; I’ve talked about some of my Pluralsight courses before. This is actually a deep learning course that I’m taking on Coursera. It’s the second course I’ve taken on Coursera. I took another one from Andrew Ng called, I think, Machine Learning, went through it, and swore I’d never take another one, and here I am again. Find out my review of the course and how I’m doing on it in just a second.

[Sound effects]

Today’s question is about what I’m doing from a course perspective. I’m taking a course called Neural Networks and Deep Learning. This is actually part one of a larger certification series. If you go out to deeplearning.ai, you’ll see it’s an Andrew Ng course. I took his machine learning course before, went through it, and did some reviews of it on another channel with a group, the Big Data Beer Team. You can always check that out.

I swore I’d never do another course, and here I am doing another one, because the math is a little more in the weeds than I like to be, and probably more than you need from a data engineering perspective. Either way, my goal is to do this review and give you all the insights so you can decide whether you want to take the course. I’m through part one; Neural Networks and Deep Learning is part one of the series. It’s an Andrew Ng course, and he has probably trained more people in machine learning and deep learning than anybody else on the planet. He’s worked at Baidu, Google, and Stanford, and has his own startup working on driverless cars. He’s a hugely authoritative figure to be teaching this course, which is amazing from that aspect.

It’s a little bit overwhelming, I’ll tell you. We’ll get into that, but each course in the series is broken into, I think, four weeks. This first one was four weeks. We’re going to go through how I felt about each of the four weeks and give you my thoughts.

Week one was Intro to Deep Learning, and it was really about the why of deep learning. Why deep learning? What’s its history? Is this anything new? Is this going to solve all our problems in the future? Eh, maybe.

Maybe we don’t get into that as much, but this was a pretty good week. With each of these courses, there’s a Heroes of Deep Learning interview session. If you like watching YouTube videos like you’re doing now, it’s similar, but it’s behind the paywall, or the course wall, in Coursera. I didn’t get through all of them, but I did watch this one. It was pretty good. I can’t really remember who it was, though. Shame on me; I should’ve put that in my notes.

Week one was pretty easy to step through. There might’ve been a quiz, but there was no programming. Week two was logistic regression as a neural network. It’s probably my least favorite portion of the course so far: heavily math-based and somewhat of a review. When I got to this portion and was going through the course material and watching the videos, I thought, “This is kind of a review of what I did in the machine learning course.”

I figured I’d ace everything, and I did ace the quiz. It wasn’t too hard, but when we stepped into the programming, it was a little more complicated than I expected. I have some reasons why I think that is, and I’ll talk about them at the end. For the most part, week two was really just a level set: “Hey, remember, this is the cost function. This is how we use logistic regression.” It walks through those pieces so you can see what’s going on behind the scenes.

If you’ve done what I have, implemented networks and played around with TensorFlow or TFLearn, you already know some of what’s going on, even if you don’t understand it fully. This was a good review from that perspective. If you haven’t taken the machine learning course, no problem. You can jump right into it. Like I said, he starts at a high level and gets you going.
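For readers who haven’t seen week two’s material, here is a minimal pure-Python sketch of logistic regression treated as a single-neuron network: a linear step, a sigmoid activation, the cross-entropy cost, and batch gradient descent. This is not the course’s assignment code; the toy data, learning rate, and step count are made-up assumptions.

```python
import math
from typing import List, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def forward(w: List[float], b: float, x: List[float]) -> float:
    """Single 'neuron': linear step z = w.x + b, then sigmoid activation."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return sigmoid(z)

def cost(w: List[float], b: float, X: List[List[float]], y: List[int]) -> float:
    """Average cross-entropy (log) loss over all training examples."""
    total = 0.0
    for xi, yi in zip(X, y):
        a = forward(w, b, xi)
        total += -(yi * math.log(a) + (1 - yi) * math.log(1 - a))
    return total / len(X)

def train(X: List[List[float]], y: List[int], lr: float = 0.5,
          steps: int = 1000) -> Tuple[List[float], float]:
    """Batch gradient descent on the cross-entropy cost, starting from zeros."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(steps):
        dw, db = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            dz = forward(w, b, xi) - yi  # dLoss/dz for sigmoid + cross-entropy
            for j in range(n):
                dw[j] += dz * xi[j]
            db += dz
        w = [wj - lr * dwj / m for wj, dwj in zip(w, dw)]
        b -= lr * db / m
    return w, b

def predict(w: List[float], b: float, x: List[float]) -> int:
    return 1 if forward(w, b, x) >= 0.5 else 0

# Hypothetical toy data: one feature, labels mostly (not perfectly) separable.
X = [[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]]
y = [0, 0, 1, 0, 1, 1]

w, b = train(X, y)
print("cost before:", round(cost([0.0], 0.0, X, y), 3))
print("cost after: ", round(cost(w, b, X, y), 3))
print("predictions:", [predict(w, b, [x]) for x in (0.0, 4.0)])
```

The `dz = a - y` line is what makes pairing a sigmoid with cross-entropy convenient: the derivative of the loss with respect to `z` reduces to the prediction error.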

Week three was my favorite week. We talked about shallow neural networks, which are the basics of how to build a neural network. What I liked most was the deep dive into why we need non-linear functions and why we use different activation functions. It was really cool, because I actually taught a portion of this in my course, and it was cool to see how Andrew explained it. Maybe not a whole lot better than me; I don’t want to undersell myself. But it was definitely awesome to see his background and his thought process, and to hear him say, “Hey, this is why we use [Inaudible 00:05:19], these are some of the things you’re going to see with it. Don’t worry about it, because of these reasons.” Really, my favorite portion of this course was week three, on shallow neural networks. Still, the programming exercise took me a little longer than I thought it would.
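The week three point about non-linear activations fits in a few lines. This minimal sketch, using hypothetical scalar weights, shows that stacking two purely linear layers collapses into one linear function, while putting a ReLU between the same layers does not. That is the reason hidden layers need a non-linear activation.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def relu(z: float) -> float:
    return max(0.0, z)

# Hypothetical scalar weights for a two-layer "network" (one unit per layer).
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def two_linear_layers(x: float) -> float:
    """Layer 2 applied to layer 1 with no activation in between."""
    return w2 * (w1 * x + b1) + b2

def one_linear_layer(x: float) -> float:
    """A single linear layer with w = w2*w1 and b = w2*b1 + b2."""
    return (w2 * w1) * x + (w2 * b1 + b2)

def two_layers_with_relu(x: float) -> float:
    """Same weights, but with a ReLU between the layers."""
    return w2 * relu(w1 * x + b1) + b2

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
print([two_linear_layers(x) for x in xs])     # identical to the next line
print([one_linear_layer(x) for x in xs])
print([two_layers_with_relu(x) for x in xs])  # flat, then sloped: not one line
```

The first two lines print the same values, so the extra linear layer added nothing. The ReLU version is flat for inputs below -0.5 and sloped above it, which no single linear layer can reproduce. Sigmoid and `math.tanh` would break the linearity in the same way.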

A little bit of stress there, but the quizzes were good. They’re easy if you follow along and take good notes. There’s also a new feature they’re trying out called Notes. I’ve started playing around with it, and I’ll probably talk about it in my next video as I use it more. Maybe that’ll be a quick tip you can use whenever you’re going through a course on Coursera.

Week four wasn’t my favorite week, but it was pretty good. We started getting into deep learning and deep neural networks and how they work. One of the main things we covered was matrix dimensions and how they fit together. It didn’t go as deep as future courses will; it’s easy for me to say that now, because I jumped ahead a little bit. From the perspective of this course, Neural Networks and Deep Learning part one, it really talks through the matrix portions and then starts building out your deep networks. It also covers parameters and hyperparameters. I was familiar with both from being hands-on before, but it was really helpful to work through them.
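Week four’s matrix-dimension rules are easy to check in code. Assuming the course’s W[l] / b[l] notation, where layer l has a weight matrix W[l] of shape (n[l], n[l-1]) and a bias b[l] of shape (n[l], 1), this sketch derives the parameter shapes for a small hypothetical network and traces the activation shapes through Z[l] = W[l] A[l-1] + b[l]. The layer sizes, learning rate, and iteration count are illustrative assumptions. The parameters are what training learns; the hyperparameters are the choices you make before training.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, int]

# Hyperparameters: choices made before training (values here are hypothetical).
hyperparameters = {
    "layer_dims": [4, 3, 2, 1],  # n[0] inputs, two hidden layers, one output unit
    "learning_rate": 0.01,
    "num_iterations": 1000,
    "hidden_activation": "relu",
    "output_activation": "sigmoid",
}

def parameter_shapes(layer_dims: List[int]) -> Dict[str, Shape]:
    """Learned parameters: W[l] is (n[l], n[l-1]) and b[l] is (n[l], 1)."""
    shapes: Dict[str, Shape] = {}
    for l in range(1, len(layer_dims)):
        shapes[f"W{l}"] = (layer_dims[l], layer_dims[l - 1])
        shapes[f"b{l}"] = (layer_dims[l], 1)
    return shapes

def matmul_shape(a: Shape, b: Shape) -> Shape:
    """Shape of a @ b; the inner dimensions must match."""
    if a[1] != b[0]:
        raise ValueError(f"cannot multiply {a} by {b}")
    return (a[0], b[1])

def activation_shapes(layer_dims: List[int], m: int) -> List[Shape]:
    """Shape of A[l] for a batch of m examples, via Z[l] = W[l] A[l-1] + b[l]."""
    shapes = parameter_shapes(layer_dims)
    a_shape: Shape = (layer_dims[0], m)  # A[0] = X, one column per example
    out = [a_shape]
    for l in range(1, len(layer_dims)):
        # b[l] is (n[l], 1) and broadcasts across the m columns of W[l] A[l-1].
        a_shape = matmul_shape(shapes[f"W{l}"], a_shape)
        out.append(a_shape)
    return out

dims = hyperparameters["layer_dims"]
shapes = parameter_shapes(dims)
print(shapes)
print("learned parameters:", sum(r * c for r, c in shapes.values()))
print(activation_shapes(dims, m=32))
```

For these layer sizes there are 26 learned parameters, and a batch of 32 examples flows through as (4, 32), (3, 32), (2, 32), (1, 32). If two sizes disagree, `matmul_shape` raises immediately, which is the same kind of dimension bug that shows up in the programming exercises.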

Once again, the quiz in this one is pretty simple if you paid attention. You have to work through some math, but the quizzes aren’t hard. Make sure you’re using your own notes. When it came to the programming exercises, I think there were two this week, and they were somewhat difficult. The second one was pretty long to build out. You get hands-on with TensorFlow. It’s still a little more challenging than I think it needs to be, and there are some ways it could be made better. Let me talk about that next.

Overall, I thought the course was all right. It was good for me, though some of it was a bit of a review. Some of it went a lot deeper than I’ve gone before, so that portion was good for me. I will say that all the programming exercises are graded. One thing I find challenging, and maybe it’s just the way I learn, is that they feel harder because it’s like you’re being tested on day one. While you’re going through the videos, you’re doing everything from a math perspective, on paper or in digital notes, but you’re not really doing any of the programming. If you don’t have a solid programming foundation, or it’s not something you do every day, I think it’s going to be more challenging. One thing that could help, and broaden the course for incoming students, would be more ungraded coding examples. They don’t have to match the exercises verbatim, just be really close to them. I get that you want to test people and make sure they’re applying what they’re learning.

I also think a few more coding examples where you can see “these are the steps” would help. Understanding the math doesn’t necessarily mean you’ll be able to go in and program it right away. From a real-world perspective, yes, you need to understand those things and know how to implement them at a base level, but there’s so much more you can do at a higher level. For example, one of the biggest challenges I had going through this was that I built a whole course around TFLearn and using it as an abstraction layer over TensorFlow. Showing step by step how you could write the same thing in TFLearn, or with one of those higher-level functions, would be a different approach. I think it would broaden the audience and make the course a little more enjoyable, too.

When you know that the 60 lines of code you’re writing could be written in 4, it makes it a little harder, especially since I had already done all the math and worked through how the activations work, versus grinding through the minutiae of the programming. That’s just my two cents. If you’ve taken this course, please tell me your opinion. You’re listening to mine, so let’s make this a conversation. I’d love to hear your thoughts, including where you think I’m wrong, or if you think I should be better at math. You’re probably right. I think I’m getting the math. We’ll see.

Fair enough, my Python programming skills, like I said, are all right, but they’re not at the level needed here. I think that’s another gap I found going through this course. All in all, I would recommend it if you’re looking into using deep learning, but if you’re a data engineer, I don’t think you have to go through anything like this. Like I said, it’s good, but there are other skills you probably want to get first. If you’re leaning more toward data science, deep learning, or machine learning engineering, then this course would probably be pretty good. In the next video, you might see that I jumped way too far ahead in the next course: I jumped to, I think, the fourth or fifth course when I was supposed to go to the second. I’ll talk about that in the next video. If you have any questions, put them in the comments section below, or reach out to me at thomashenson.com/big-questions. Find me on Twitter or Instagram and ask any questions. I’ll try my best to answer them. Make sure you subscribe so you never miss an episode, and ring that bell. Thanks again.

Filed Under: Data Science Tagged With: Data Science, Deep Learning, Neural Networks

Do Data Scientists Code?

June 6, 2019 by Thomas Henson

Scientists Who Code

Data Science is a hot career field in data analytics. On data teams that include Data Engineers, how much coding is expected from the Data Scientist?

The role of the Data Scientist includes finding features or correlations in data that might predict outcomes. Those predictions then become data models that are tested multiple times. After those models reach a high confidence level, they are automated in applications. In this episode of Big Data Big Questions, let’s find out just how much coding a Data Scientist can be expected to do.
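As a small illustration of that first step, finding correlations that might predict an outcome, here is a minimal sketch that ranks candidate features by their Pearson correlation with a target. The feature names and numbers are invented for illustration, and correlation is only a starting point; real feature selection and model validation involve much more.

```python
import math
from typing import List

def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: daily machine readings, and whether a failure followed (1) or not (0).
features = {
    "temperature": [70, 72, 85, 90, 68, 95],
    "vibration": [0.2, 0.3, 0.8, 0.9, 0.1, 0.7],
    "day_of_week": [2, 5, 1, 3, 4, 6],
}
failed = [0, 0, 1, 1, 0, 1]

# Rank features by the strength (absolute value) of their correlation with failure.
ranked = sorted(features, key=lambda name: abs(pearson(features[name], failed)), reverse=True)
for name in ranked:
    print(f"{name:12s} r = {pearson(features[name], failed):+.2f}")
```

On this made-up data, vibration and temperature correlate strongly with failures while day_of_week is close to zero, so the first two would be the candidates to carry forward into a model and test.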

Transcript – Do Data Scientists Code?

Hi folks. Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. I’m still here in the gym answering your questions, recording between mail trucks coming by. Crazy.

Today’s question comes in from a viewer. Thanks for watching. If you have any questions, make sure you put them in the comments section below, and also subscribe and hit that bell so you get notifications. Today’s question is about data scientists, specifically: “Do data scientists code?” We’ve talked about the roles of a data engineer, a machine learning engineer, and even a data scientist, but where is the line on how much data scientists code? We’re going to talk about that and some of the tools they use, and I’ll try not to say, “Depends.”

I know, I know. You’re thinking, “Man. He’s going to say ‘depends.’” But I’m going to try not to. Does a data scientist code? The answer is yes. Data scientists, for the most part, are able to code. The tools they use and how much they code are really going to be dependent (I didn’t say “depends,” I said “dependent”) on the role they’re in, and on whether they have a data engineer or machine learning engineer who can help them put their code into production and finalize some of what they’re doing.

I will say, I’ve worked on a team before where the data scientists were primarily using MATLAB or Excel, and then we pushed their work to Hadoop. You’ve heard me talk about it many times on here: that was a little bit harder, because they had been working in a standalone environment on their own machines, and then we moved to distributed algorithms. At the time we were using Mahout. Going from Excel or MATLAB to Mahout, we really had to change and tune a lot of different things. That’s where my expertise as a data engineer and developer helped out, including running the same job a hundred times.

Things have changed with TensorFlow and a lot of other tools since then. So yes, data scientists will code, and what they use depends on what they’re working on. Some of the common tools, like I said, are MATLAB and even Excel, depending on the project and whether it’s a big data project. Then you also have R. You have Python, which we’ve talked about and love here. We also have Scala, or “skuh-la”; I always say it wrong. There are many different tools data scientists use when they’re coding. That doesn’t mean they’re doing all the coding. This goes back to the dividing lines between the roles. When we’re talking about standing up the environment, pushing things out to production, and doing some of the heavier lifting, like operationalizing the code (getting it past the experimental phase and into “Hey, this is going to support this dashboard,” or even some of the ETL jobs), that’s still going to come to us as the machine learning engineers or data engineers, depending on the role. Still not saying “depends.” It’s depending on that role. To answer the question simply: yes, your data scientist is going to code. But how in-depth is that code? Are they doing the job of coding? I would say no, at least from what I’ve seen. Think about some of the reasons we have the tools we have now. Why was Pig created? Why do we have a Python API in Spark, when we could write Spark in Java?

Why do we have a Scala API in Spark? Because we want to use a higher-level language, so data analysts and data scientists can run their code without having to worry about Java and some of the other components. So yes, data scientists code. How much do they code? That depends on their partner machine learning or data engineer. Your data scientist is not going to replace the data engineer or machine learning engineer. We’re all on the same team here. We’re not trying to compete, and if I had to choose a side, I’d say machine learning engineers and data engineers. But I’m very biased.

That’s all for today. I hope you enjoyed this episode. If you have any questions, throw them in the comments section below, make sure you hit subscribe and ring that bell, and I’ll see you again on the next episode of Big Data Big Questions.

Filed Under: Data Science Tagged With: Code, Data, Data Science

Copyright © 2023