Thomas Henson

  • Data Engineering Courses
    • Installing and Configuring Splunk
    • Implementing Neural Networks with TFLearn
    • Hortonworks Getting Started
    • Analyzing Machine Data with Splunk
    • Pig Latin Getting Started Course
    • HDFS Getting Started Course
    • Enterprise Skills in Hortonworks Data Platform
  • Pig Eval Series
  • About
  • Big Data Big Questions

Data Engineer vs. Data Scientist

December 27, 2017 by Thomas Henson 2 Comments

 

Data Engineer vs. Data Scientist

What’s the Difference Between the Data Engineer & Data Scientist

The Data Scientist has been one of the top trending careers choices for the past 3-4 years but where is the love of the Data Engineers? In reality I think more people are confused about the roles in Big Data. Both Data Scientist and Data Engineers are used interchangeably but the roles require different skills sets. In this video I will break down the differences between the Data Engineer vs. Data Scientist.

Transcript – Data Engineer vs. Data Scientist

Thomas: Hi, folks. I’m Thomas Henson with thomashenson.com, and today is another episode of Big Data, Big Questions. And so today, we’re going to be tackling the differences between a data scientist and a data engineer. Fine out more right after this.

Thomas: So, do you want to become a data engineer, or do you want to become a data scientist? So, this is a question…this is something we see a lot about is all about the data scientists, and big data, and then data engineering. But what’s the different between the two roles, and why is it that since 2012 data scientists have been the sexiest career in IT? And there’s been a lot of publication about it. There’s been a lot of information about it. We’re all talking about machine learning, and smart cars, and the battle between Tesla and Apple for machine learning engineers, data scientists, and how they can battle that out. But what about the data engineer, too? And kind of what are the differences? Well, I’ll tell you. Recently, Information Weekly had a survey out there for 2018 and the highest paying salaries in IT. Data engineer came in at number five. The only other roles that were above the data engineer were all C-level positions. So, think of your CIO, your CSO, your CTO. So, data engineers are getting a lot of love, too. But what are the differences between those two roles? So, we’ll break it down first jumping into what a data scientist does.

So, what does a data scientist do in their day to day work? Well, one of the things that they do is they evaluate data. So, that’s kind of a given. But how do they do that? So, they use different algorithms. They look at different results from data. So, say that we’re trying to find out a way for a web application to have more users that are engaged with it. So, how do I create more engaging content for my users and for my customers and give them something to value? Well, a data scientist would be able to take and look at different variables. So, maybe we get in a room, and maybe everyone kind of comes up with some variables and says, “Okay, how can we retain user retention? So, does this piece of content work? We’ve got some testing on these other pieces. Here’s some of our historical data.”

And so the data scientist, what they’ll do is they’ll evaluate all those data points and tell you which ones are going to be the most relevant. And they’ll do that by using algorithms. So, they’ll look, and maybe they’ll use SVD to find out, okay, which variables are going to make the most sense for us to have more engaging content, have a web application that makes users want to stay and engage with it longer. And so that’s kind of where their role is. Now, they’re not going to be the ones that are writing the MapReduce jobs or doing some of the Spark jobs. We really want them just evaluating the data and helping us build different data models that are going to give us the results we’re looking for.

So, if we can just increase our user retention time or increase our engagement of our content, our web application is going to be more popular. So, in our example, that’s what we want. We want our data scientists that are really evaluating…finding correlations between data and also eliminating correlations. So, this variable that we predicted that we thought was going to be very key to engaging for our web application for our users, it really doesn’t make a difference. And so it gives our developers, and our engineers, and our product and marketing team things for them to look at and say, “Hey, these are the variables that we need to focus on, and this is what’s going to make our web application…give us the desired results we’re looking for and increase that user retention time, increase the engagement for our users in our web application. So, that’s our data scientist.

Now, on the flip side, what is our data engineer going to do? So, our data engineer, they’re the ones that are going to say, “We’ve got this data here. We’re moving the data maybe into our Hadoop cluster. So, we’re moving it into our Hadoop cluster or wherever we’re storing it for analytics.” And so they’re the ones that are really moving that data there. They’re also writing those MapReduce jobs or Spark jobs. They’re doing the development portion of big data. So, our data scientists are over here saying this is the data that I need. The data engineer is saying, “Oh, we have the data. What kind of format…? How should we clean the data? How fast do you need the data, too?” So, how much speed is a concern for some of these variables and being able to fetter out some of the details, and being able to give…maybe improve that product a little bit faster to get it to the users.

And so that’s where you’re going to see the data engineer. They’re also going to be the ones that are managing and configuring our Hive and HBase deployments and doing some of the technical debt work that we’ve talked about before with making sure that we have a strategy for backup, making sure we have a strategy for high availability. So, this product that we’ve got here for our web application, we want to make sure that we’re still feeding our data in, and our data models are feeding our data back to our data scientists. But then we’re also pushing out those results from what the data scientists have given us, too. So, you kind of see two distinct roles.

So, our data engineer, they’re going to be involved in the tech. They’re going to be the ones that are really building those systems out. Where our data scientists, they’re involved with the data. They’re involved with the technology as far as how to use the…what tools are going to help them be able to [INAUDIBLE 00:05:20], use different algorithms, and be able to say, “This data point really makes a different where this other data point may not be making as much of a difference, and so it’s going to…” They’re going to be using those tools for that. But basically what they’re doing is they’re involved in the data. And you see the data engineers involved in the technology, and implementing, and kind of using that strategy.

So, I’m not saying one is better than the other one, but I may be a little bit biased because I’m a data engineer and like data engineering. But two different skillsets, two very important skillsets, two amazingly great career choices right now in IT, two of the probably highest paying individual contributor roles in IT right now. So, you can’t go wrong either way. If you’re looking for more tips and more information about being a data engineer, make sure you subscribe to this channel and find out more information about data engineers. We explore how to do different things. If you have any questions like this, make sure you submit them. Big Data, big questions, using the #bigdatabigquestions on Twitter, go to my website, thomashenson.com, Big Questions, submit your questions there. Put it in the comments below here. I’ll answer it on YouTube the best I can. Any questions like that, just get in touch with me. I’ll answer them on here. Make sure you subscribe. Thanks again for tuning in, and I’ll see you next time.

Show Notes

Singular Value Decomposition

Big Data Beard Podcast Episode 13: A LESSON IN DATA MONETIZATION FROM THE DEAN OF BIG DATA

Salary for Highest Paying Tech Jobs

Filed Under: Data Engineers Tagged With: Big Data Big Questions, Data Engineer

Subscribe to Newsletter

Archives

  • February 2021 (2)
  • January 2021 (5)
  • May 2020 (1)
  • January 2020 (1)
  • November 2019 (1)
  • October 2019 (9)
  • July 2019 (7)
  • June 2019 (8)
  • May 2019 (4)
  • April 2019 (1)
  • February 2019 (1)
  • January 2019 (2)
  • September 2018 (1)
  • August 2018 (1)
  • July 2018 (3)
  • June 2018 (6)
  • May 2018 (5)
  • April 2018 (2)
  • March 2018 (1)
  • February 2018 (4)
  • January 2018 (6)
  • December 2017 (5)
  • November 2017 (5)
  • October 2017 (3)
  • September 2017 (6)
  • August 2017 (2)
  • July 2017 (6)
  • June 2017 (5)
  • May 2017 (6)
  • April 2017 (1)
  • March 2017 (2)
  • February 2017 (1)
  • January 2017 (1)
  • December 2016 (6)
  • November 2016 (6)
  • October 2016 (1)
  • September 2016 (1)
  • August 2016 (1)
  • July 2016 (1)
  • June 2016 (2)
  • March 2016 (1)
  • February 2016 (1)
  • January 2016 (1)
  • December 2015 (1)
  • November 2015 (1)
  • September 2015 (1)
  • August 2015 (1)
  • July 2015 (2)
  • June 2015 (1)
  • May 2015 (4)
  • April 2015 (2)
  • March 2015 (1)
  • February 2015 (5)
  • January 2015 (7)
  • December 2014 (3)
  • November 2014 (4)
  • October 2014 (1)
  • May 2014 (1)
  • March 2014 (3)
  • February 2014 (3)
  • January 2014 (1)
  • September 2013 (3)
  • October 2012 (1)
  • August 2012 (2)
  • May 2012 (1)
  • April 2012 (1)
  • February 2012 (2)
  • December 2011 (1)
  • September 2011 (2)

Tags

Agile AI Apache Pig Apache Pig Latin Apache Pig Tutorial ASP.NET AWS Big Data Big Data Big Questions Book Review Books Data Analytics Data Engineer Data Engineers Data Science Deep Learning DynamoDB Hadoop Hadoop Distributed File System Hadoop Pig HBase HDFS IoT Isilon Isilon Quick Tips Learn Hadoop Machine Learning Machine Learning Engineer Management Motivation MVC NoSQL OneFS Pig Latin Pluralsight Project Management Python Quick Tip quick tips Scrum Splunk Streaming Analytics Tensorflow Tutorial Unstructured Data

Recent Posts

  • Tips & Tricks for Studying Machine Learning Projects
  • Getting Started as Big Data Product Marketing Manager
  • What is a Chief Data Officer?
  • What is an Industrial IoT Engineer with Derek Morgan
  • Ultimate List of Tensorflow Resources for Machine Learning Engineers

Copyright © 2025 · eleven40 Pro Theme on Genesis Framework · WordPress · Log in