Thomas Henson


Archives for April 2018

Skills Needed for Big Data Administrators

April 30, 2018 by Thomas Henson


Data Engineers & Big Data Administrators

In today’s episode of Big Data Big Questions, we tackle what skills are needed for Big Data Administrators. Data Engineers wear many hats in data analytics workflows: one part software engineer and one part systems administrator. Big Data Administrators are responsible for keeping Hadoop, Kafka, Ambari, and other frameworks running. Find out what other skills Big Data Administrators need in the video below.

Make sure to subscribe to my YouTube channel to never miss an episode of Big Data Big Questions.

Transcript – Skills Needed for Big Data Administrators

Hi, folks! Thomas Henson here, with thomashenson.com, and today is another episode of Big Data Big Questions. Today, I’m going to answer a user question about data administration, or in big data, what is that big data administrator’s role?

What are some of the tools that they use? How can you get involved? Find out more, right after this.

Welcome back. Today’s question is going to revolve all around the big data administrator, what that role is, what are some of the tools that they use? This question came in from my website. You can do Big Data Big Questions, go to thomashenson.com, click on Big Questions, submit a question there. Put them in the comments section here below, and then always, make sure that you’re subscribing to this YouTube channel, so that you’ll never miss an episode. These are great tips. These are great ways for me to answer any questions that you have. If you have those questions, ask them, but also make sure you’re subscribing to the channel.

Today’s question comes in from Jarvis. He says he has a dilemma on Python for big data. We answered a number of questions around Python and big data, and then do you have to know Java? But, this one is a little bit different. It’s going to cover the data administrator.

Hi Thomas, a big fan of yours.

Thanks for watching. Thanks for sending in the question.

I had a question related to IT careers and skills in big data. I wanted to know if Python is required only by data administrators, or can all things done by Java on big data be implemented using Python as well?

This question is really good. Like I said, we’ve talked a little bit about, do you have to know Java in order to be able to be a big data admin, be involved in big data, be a data engineer?

The answer is no. You can do things in Python, but I want to tackle the question from the perspective of, you’re asking about data administration, and so there are two different roles. We’ve talked about the data engineer versus the data scientists. The data engineer is the one who’s setting up the cluster, maybe doing some of the software development, running your Hive jobs, maybe even just the software developer, from if you’re writing Java jobs, if you’re writing your Spark jobs, but your data administrator, that’s a different role inside of that. We have two pieces of the spectrum. This side over here, this is more software development side generated, and on this side over here, let’s say that this is more of the administrator, or our systems engineer, the person who’s setting up and running the cluster. Maybe not doing the day-to-day coding but doing the administrating and running of the system. Think of that as your full stack developer.

Think about when you split up your systems admin, who’s setting up the stack, making sure the database is running, doing those tasks versus who’s running the… Whether it be PHP code or .NET code. What skills does a data administrator have to have?

I would say that, if we’re talking about being able to be involved in the community, and be involved in big data, you’re going to be keying on HDFS, Ambari, Hive, Flume, and you’re going to have a lot of Linux skills. If you’re asking me, you want to get into data administration, you want to be an awesome data administrator in the big data ecosystem, do you have to know Java? No. Can everything be implemented in Python? Maybe, but you’re probably going to be doing more administrative tasks as far as setting up the cluster, understanding the operating system that Hadoop’s running on.

You’re maintaining more that Linux level, and the Hadoop ecosystem level, so if you’re using Hortonworks or you’re using Cloudera, how all those tools are integrating and talking to each other. I would focus more on not even so much the coding part, but as far as being able to set up that cluster. It’s going to vary, too. It’s going to vary in the role.

Some places, especially when you’re just starting out on big data, and you have a small team in your company, you’re going to be the software engineer and the data administrator, right? You might need to have a little more code.

If you’re going to a more seasoned team or a bigger team, you can actually have that role where you’re running the administration. My answer is, I wouldn’t worry so much about Python and Java, if that’s the role that you’re wanting.

The data administrator, I would worry about being able to integrate the tools. Be familiar with the tools, be familiar with how to set up, how to add nodes, how to take nodes down. How to set up secondary name nodes, so, being able to make sure that, when one name node goes down, the second, you can flip over to the second name node. Being able to back up the data. Making sure that we’re taking snapshots. All the kind of tasks that go into running the system, versus being able to write a MapReduce job. If you’re really keen on being a big data administrator, which, those are great roles, those are a lot of fun, you’re still hands on, but you’re not really having to write the jobs.
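To make those day-to-day tasks a little more concrete, here is a minimal Python sketch that assembles the kind of `hdfs` command lines an administrator might run for snapshots, node changes, and a name node failover. It is only an illustration under stated assumptions: the path `/data/logs`, the snapshot name `nightly`, and the NameNode service IDs `nn1` and `nn2` are hypothetical, and the failover commands assume a cluster configured for NameNode high availability, where the node you flip over to is the standby NameNode. By default the sketch only renders the commands and does not run anything.

```python
import shlex
import subprocess
from typing import List

Command = List[str]


def snapshot_commands(path: str, name: str) -> List[Command]:
    """Allow snapshots on a directory, then take a named snapshot of it."""
    return [
        ["hdfs", "dfsadmin", "-allowSnapshot", path],
        ["hdfs", "dfs", "-createSnapshot", path, name],
    ]


def refresh_nodes_commands() -> List[Command]:
    """Re-read the include/exclude host files after adding or removing
    data nodes, then print a cluster report to check the result."""
    return [
        ["hdfs", "dfsadmin", "-refreshNodes"],
        ["hdfs", "dfsadmin", "-report"],
    ]


def failover_commands(active_id: str, standby_id: str) -> List[Command]:
    """Check the state of the active NameNode, then fail over to the standby.
    Assumes an HA-enabled cluster; the service IDs are hypothetical."""
    return [
        ["hdfs", "haadmin", "-getServiceState", active_id],
        ["hdfs", "haadmin", "-failover", active_id, standby_id],
    ]


def run_all(commands: List[Command], dry_run: bool = True) -> List[str]:
    """Render each command as a shell line; execute it only if dry_run is False."""
    rendered = []
    for cmd in commands:
        rendered.append(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    plan = (
        snapshot_commands("/data/logs", "nightly")
        + refresh_nodes_commands()
        + failover_commands("nn1", "nn2")
    )
    for line in run_all(plan):
        print(line)
```

Running it as-is only prints the planned commands, which fits the point being made here: the work is mostly about knowing which tools to run and when, not about writing jobs.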

You’re checking out new tech, checking out new projects, to see, “Hey, am I going to be able to integrate this into our system,” or, “Man, you know, we’ve got two or three more nodes that are going to come online, so let’s make sure that we get those racked and stacked, and then, let’s make sure that we’re adding those to the cluster, too.”

A lot of cool things that you can do in that role. Most of them aren’t going to involve coding, so you’re not really going to have to worry about Java, you’re not going to have to worry about Python, as much as you would in the traditional data engineer, where you’re looking at being more of a software engineer.

I hope I answered your question. If anybody else has any questions, put them in the comments section here below. Make sure to follow me here, so click subscribe, and then I’ll see you next time.

Filed Under: Data Engineers Tagged With: Big Data, Data Engineer

Rise of the Machine Learning Engineer

April 27, 2018 by Thomas Henson

 


What is a Machine Learning Engineer?

Move over, Data Scientist: the Machine Learning Engineer is now the best role in Big Data Analytics. The Machine Learning Engineer is a hybrid, half Data Engineer and half Data Scientist, who can implement the data models and even make recommendations for new data sets. Find out why the Machine Learning Engineer is getting a lot of attention in 2018 by watching the video below.

Make sure to subscribe to my YouTube channel to never miss an episode of Big Data Big Questions.

 

Transcript – Rise of the Machine Learning Engineer

Hi, folks! Thomas Henson here, with thomashenson.com. Today is another episode of Big Data Big Questions. Today’s question comes in from a user, and this is all going to be about the machine learning engineer. What is a machine learning engineer? How does it differ from a data engineer or data scientist? We’re going to jump into all that right after this.

Welcome back. Today’s question comes in from a user, so before we jump into the question, make sure that you go and click on the subscribe, so that you never miss an episode. Also, if you have a question and you would like for me to answer it, about data engineering, about books, about business, anything around IT and specifically probably data analytics, make sure you put those in the comments section here below. Go to my website, thomashenson.com/bigquestions or use the hashtag #BigDataBigQuestions on Twitter. I will try my best to answer those as quickly as I can.

I’ve been getting a lot of questions in, and I’m really thankful for all the questions, and I am working through them as well. Today’s question comes in from a user. From the comments section on YouTube, Andrew Wiley [Phonetic]. He says, “Is it possible to learn both data science and data engineering?” This question stems off of the Cloudera certification. I’ve answered some questions around what is a data engineer, what is a data scientist, but this question is specifically, “Okay, is there a blend of the two?” Is there one position that’s a blend of the two?

I’ll say, for a while, there’s been a lot of confusion around, “Okay, if you’re a data scientist, you know how to stand up a Hadoop cluster, or if you know how to stand up a Hadoop cluster, you must be a data scientist. You’re a wizard, right?” This question is about, what about the blending of the two skills? Think about it from a web development perspective. For a long time, we had our web developers, and we had our back-end developers, and then we had the full-stack web developer. Now, we have a full-stack data engineer, and those are called machine learning engineers.

On a recent podcast out there, that O’Reilly did at Strata, they had a couple guests on talking about the rise of the machine learning engineer, and so I would say that if you’re looking to have skills with data science and data engineering, that position is going to be called a machine learning engineer. My view on how the machine learning engineer has come to fruition is in two parts. If you’re working in a small development or small analytics shop, most likely the data engineer, the person who’s putting together the code and running the system, there’s going to be one or two people on that. It’s going to be a really small team, who are going to be filling that role of a data scientist.

There’s a lot. There’s a big skills gap for data engineers and even more so with data scientists, too. You might be able to go through and look at some of the prescribed analytics and machine learning algorithms that you want to use, and you, as the data engineer, will understand how to use those. It’s not just willy-nilly, like, “Hey, I’m just going to pull this one down and have it.” You need to have a background in statistics, and probability, and heavy on math. One of the things, one of my gaps in skills that I’ve been working on is the math part.

You can follow along as, watch me learn how machine learning… The machine learning course, with Andrew Ng’s course, and you can see some of the things, especially if you’re a data engineer, that you need to shore up, so that you can fit into that machine learning engineer.

Think of the machine learning engineer in the small shop as, you’re the full-stack developer, you’re the full-stack engineer. It’s kind of doing everything. Then, in larger corporations, what you’re going to have is, like I said, we’ve got it on both sides of the spectrum. You’ve got your data engineers, who are really good at setting up, administrating an environment, maybe even doing the software development, running Hive, creating the MapReduce jobs or the Spark jobs, but then you have your data scientists who maybe have some SQL skills, really good at math, but not really good at the technical. The machine learning engineer is that person in the middle, to kind of bridge the gap. In bigger shops, you’re going to have your machine learning engineer who’s working with your data scientist, and then starts to be able to pick up on, “Okay, this is the way that we like to do some of the things here,” and you’re really owning that part of the stack, and so, you’re not so much worried about developing and doing what I would call the Hadoop administration, or even the Hadoop development.

When I say Hadoop, remember, we’re just talking about anything in that ecosystem. Your machine learning engineer is your specialization of that. I did a little research, too, just to look at it. Just pulling it up, just some preliminary research, just looking for jobs out there. A lot of times, we’ll say, “Yeah, this is, you’re an Excel guru, and you say, ‘Excel guru?’” You go look, and there’s nobody with a job title Excel guru. You’re giving it to yourself.

Looking at machine learning engineer, quick search on Google for jobs, there are a lot of different postings from companies all the way from IBM to Facebook, Lyft, a lot of different postings out there, just in my quick search. Also, looking at Glassdoor, and some of the other places, the salary ranges are right there with what a data engineer is, so anywhere from the low 80s, which I wouldn’t think that, that’s probably not really a true machine learning engineer, or maybe it’s in a different part of the country, all the way up to the 160s. That’s salary range per year. I thought that was pretty good mix, there.

Really fit in line with what we see as the data engineer and the data scientist, so those roles are out there. If you’re excited to go out and learn those, remember what I was saying. Want to have a solid background as a data engineer with understanding how the Hadoop administration works. Also, the workflows, and some of the development skills. Want to be able to implement, if you’re using Mahout, if you’re using TensorFlow, any of those frameworks, you want to be able to implement those, but then you also want to have the math portion too, so make sure you understand the algorithms from a math level, and how to tweak, and how to tune those.
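To show what “understand the algorithms from a math level, and how to tweak, and how to tune” can look like in practice, here is a minimal sketch in plain Python, deliberately not Mahout or TensorFlow, that fits a straight line with gradient descent. The data set, the function names, and the learning-rate values are all made up for illustration. The point is only that a tuning knob like the learning rate follows from the math: small enough and the fit converges, too large and it blows up.

```python
def mse(xs, ys, w, b):
    """Mean squared error of the line y = w * x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    # Toy data generated from y = 2x + 1 (an assumption for this example).
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]

    w, b = fit_line(xs, ys, learning_rate=0.05, epochs=2000)
    print("tuned fit:", w, b, "error:", mse(xs, ys, w, b))

    # Same algorithm, learning rate set too high: the error grows instead of shrinking.
    w_bad, b_bad = fit_line(xs, ys, learning_rate=0.2, epochs=50)
    print("untuned fit error:", mse(xs, ys, w_bad, b_bad))
```

A framework hides the gradient step, but the same trade-off is still there when you pick its learning rate, which is why the math portion matters for this role.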

That’s all for today. Hope I answered your question. If you have any questions, anybody out there, make sure that you first go and subscribe, and then ask your question. I’ll try to answer them here. Have a good day.

 

Filed Under: Data Engineers Tagged With: Data Engineer, Deep Learning, Machine Learning
