Big Data Big Questions: Learning to Become a Data Engineer?

September 22, 2017 by Thomas Henson

For the past few years, Data Scientist has been named the sexiest job in IT. However, the Data Engineer is a huge part of the Big Data movement and one of the top paying jobs in IT. On average, a Data Engineer can make anywhere from $90K to $150K a year.

Data Engineers are responsible for moving large amounts of data, administering the Hadoop/streaming/analytics clusters, and writing MapReduce, Spark, Flink, and similar data processing jobs.
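
To make that last point concrete, here is a minimal sketch of the kind of Spark batch job a Data Engineer might write: read raw logs, aggregate them, and write the result back for analysts to query. This is only an illustration; the HDFS paths and column names (event_time, user_id) are hypothetical, not taken from any real project.

    # Minimal PySpark batch job sketch: read raw logs, aggregate, write results.
    # The HDFS paths and column names below are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("daily-event-counts").getOrCreate()

    # Read raw JSON event logs from HDFS (path is an assumption for illustration).
    events = spark.read.json("hdfs:///data/raw/events/")

    # Count events per user per day.
    daily_counts = (
        events
        .withColumn("event_date", F.to_date("event_time"))
        .groupBy("user_id", "event_date")
        .count()
    )

    # Write the aggregate back to HDFS as Parquet for downstream analysis.
    daily_counts.write.mode("overwrite").parquet("hdfs:///data/curated/daily_event_counts/")

    spark.stop()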

With all this excitement for Data Analytics and Data Engineers, how can you get involved in this community?

Ready to learn tips for becoming a Data Engineer? Check out the video below.


Transcript

Hi Folks, I’m Thomas Henson, with thomashenson.com, and welcome back to another episode of Big Data, Big Questions. Today’s question is: What are some tips for learning to become a better data engineer? Find out more right after this.

So, today’s episode is all about tips for learning to become a better data engineer. So, if you’re watching this, you’re probably concerned with, one, how can I start out becoming a data engineer? What are some ways that I can learn to become better? Or maybe you’re just looking to answer one specific question. But all those are encompassed in what we call the data engineer.

A data engineer is somebody who's concerned with moving data in and out of the Hadoop ecosystem, being able to give data scientists and data analysts better views into the data. So, we're involved with the day-to-day interactions of how that data is coming in. How are we ingesting that data? How are we creating those applications and tuning those applications so that the data comes in faster? All to support those business analysts, those business decisions, and data scientists in creating better models and having just more data to put their hands on.

And so, a lot of times what we're asked to do is take on a couple terabytes of data here, or implement and do all the configuration for your Hive implementation, or HBase, or anything else in that big data ecosystem. Some of the tips that I've found for just getting started, if you're brand new to this and don't know where to start: the first thing I would recommend is to go out and just download the sandboxes.

So, download Cloudera's sandbox, or download Hortonworks' sandbox, and just start playing with it. Go through some of the tutorials. Stand it up on your local machine in a VM environment, and just start playing with moving some of the data around. Find some sample data, so go to data.world. Also, I have a post and a video on where to find some data sets, so take those data sets and start ingesting them. I have a ton of resources and a ton of material on simple examples that you can walk through with Pig, and some around Hive. So, go there and find some of those. But basically what I'm saying is, just get hands-on. Start creating applications. Start trying to do some simple things: ingest some data, put it into Hive, create a table, and pull some of that data out with maybe some simple Hive queries. Then do the same thing with Pig, and just go around to some of those applications that you're curious about and start playing with them.
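
If it helps to see what that hands-on exercise can look like, below is a rough PySpark sketch of it: ingest a sample CSV, save it as a Hive table, and pull some data back out with a simple query. It assumes Spark with Hive support is available, as it is on the Hortonworks and Cloudera sandboxes; the file path, table name, and columns are made-up examples, not taken from the tutorials mentioned above.

    # Hands-on sketch: ingest a sample CSV, save it as a Hive table, query it.
    # The CSV path, table name, and columns are hypothetical examples.
    from pyspark.sql import SparkSession

    # enableHiveSupport() lets Spark create and query Hive tables on the sandbox.
    spark = (
        SparkSession.builder
        .appName("sandbox-hive-exercise")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Ingest a sample data set (for example, a CSV downloaded from data.world).
    flights = spark.read.csv("/tmp/sample_flights.csv", header=True, inferSchema=True)

    # Save the DataFrame as a Hive table in the default database.
    flights.write.mode("overwrite").saveAsTable("flights")

    # Pull some of that data back out with a simple query.
    spark.sql("""
        SELECT origin, COUNT(*) AS num_flights
        FROM flights
        GROUP BY origin
        ORDER BY num_flights DESC
        LIMIT 10
    """).show()

    spark.stop()

The same ingest-and-query flow could just as easily be done in Pig or straight from the Hive shell; the point is simply to get some data in and pull it back out.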

Another thing is, once you start playing, and sampling, and testing that data, get involved. By getting involved, I mean ask some questions, create a blog post, try to find a way that you can contribute back to the community. That's what I did when I was first starting out. I started off with a sandbox, and I made sure that every day for 30 minutes, I was learning something new in the Hadoop ecosystem. And so, that's another tip for you: try to do this 30 minutes a day, every day. Even Saturdays, Sundays. Don't take a day off. And it's only 30 minutes. If it's something that you're passionate about and you like doing, that time is just going to fly by. But over time, that's really going to give you more and more time in the Hadoop ecosystem. So, whether you're doing this for a project at work, or you're already in the ecosystem and just trying to improve, that 30 minutes a day is really going to help. And it's something that I've continued to do, even now that I've been part of the community for three or four years. It's how I just continue to learn, so I make sure I'm always kind of pushing.

Filed Under: Career Tagged With: Big Data, Big Data Big Questions, Data Engineer
