Thomas Henson

Hadoop from the Command Line

August 1, 2016 by Thomas Henson

Moving data around in Hadoop is easy when using the Hue interface, but learning to do it from the command line is harder.

In my Pluralsight course HDFS Getting Started you can learn to move data around from the Hadoop command line in under 3 hours. The first two modules of the course cover working with Hadoop from the command line using the hdfs dfs commands.
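To give a feel for the hdfs dfs commands, here is a minimal sketch of moving a file in and out of HDFS. The file and directory names are made up for illustration.

```shell
# Make a home directory in HDFS (path is an example)
hdfs dfs -mkdir -p /user/thenson

# Copy a local file into HDFS
hdfs dfs -put sales.csv /user/thenson/sales.csv

# List the directory and read the file back
hdfs dfs -ls /user/thenson
hdfs dfs -cat /user/thenson/sales.csv

# Copy from HDFS back to the local file system
hdfs dfs -get /user/thenson/sales.csv ./sales_copy.csv
```

Notice the commands mirror familiar Linux file commands (mkdir, ls, cat), which makes the jump from the Linux shell to HDFS a short one.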

Not Just Hadoop from the Command Line

Don’t just stop at learning Hadoop from the command line; let’s look at the other Hadoop frameworks as well. The last few modules of the course focus on using the following Hadoop frameworks from the command line.

Hive & Pig

Two of my favorite Hadoop framework tools. Both of these tools allow administrators to write MapReduce jobs without having to write Java. After learning Hadoop itself, Pig and Hive are the two tools EVERY Hadoop Admin should know. Let’s break down each one.

Hive fills the space for structured data in Hadoop and acts much like a database. Hive uses a language called HiveQL that is very similar to SQL. That closeness to SQL was intentional, because most analysts know SQL, not Java.
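A quick sketch of how SQL-like HiveQL feels from the command line, using hive -e to run one-off statements. The table name and columns are assumptions for the example.

```shell
# Create a table over delimited data (schema is made up)
hive -e "CREATE TABLE IF NOT EXISTS sales (id INT, amount DOUBLE)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';"

# Load a file that already lives in HDFS into the table
hive -e "LOAD DATA INPATH '/user/thenson/sales.csv' INTO TABLE sales;"

# Query it just like a SQL database -- this runs as a MapReduce job
hive -e "SELECT COUNT(*) FROM sales;"
```

Any analyst who knows SQL can read that last query; no Java required.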

Pig’s motto is that it eats anything, which means it can process unstructured, semi-structured, and structured data. Pig doesn’t care how the data is structured; it can process it. Pig uses a language called Pig Latin, insert Pig Latin joke here, which is less SQL-like than HiveQL. Pig Latin is also a procedural, or step-by-step, programming language.
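The procedural, step-by-step style is easiest to see in a small script. This is a sketch with made-up file and field names: each line builds on the previous relation instead of expressing everything in one query.

```shell
# Write a three-step Pig Latin script (names are examples)
cat > sales.pig <<'EOF'
raw = LOAD '/user/thenson/sales.csv' USING PigStorage(',') AS (id:int, amount:double);
big = FILTER raw BY amount > 100.0;
DUMP big;
EOF

# Run it from the command line
pig sales.pig
```

Load, then filter, then dump: each statement is one step, which is exactly what makes Pig Latin procedural rather than declarative like HiveQL.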

HBase

Learning to interact with HBase from the command line is a hot skill for Hadoop Admins. HBase is used when you need real-time reads and writes in Hadoop. Think very large data sets, like billions of rows and millions of columns.

Configuring and setting up HBase is complicated, but in HDFS Getting Started you will learn to set up a development environment to start using HBase. Configuration changes in HBase are all done from the command line.
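Once an environment is up, the basic read/write loop from the HBase shell looks something like this sketch. The table and column family names are made up for the example.

```shell
# Start the interactive HBase shell
hbase shell

# Inside the shell (table 'users' and family 'info' are examples):
#   create 'users', 'info'                        -- create a table with one column family
#   put 'users', 'row1', 'info:name', 'Thomas'    -- write a single cell
#   get 'users', 'row1'                           -- real-time read of one row
#   scan 'users'                                  -- scan the whole table
```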

Sqoop

Hadoop is about unstructured data, but what about data that lives in relational databases? Sqoop allows Hadoop administrators to import and export data between traditional database systems and HDFS. Offloading structured data from data warehouses is one of Hadoop’s biggest use cases. Hadoop allows DBAs to offload frozen data into Hadoop at a fraction of the cost. The frozen or unused data can then be analyzed in Hadoop to bring about new insights. In my HDFS Getting Started course I walk through using Sqoop to import and export data in HDFS.
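A sketch of that import/export round trip from the command line. The JDBC connection string, credentials, and table names are all assumptions for the example.

```shell
# Import a warehouse table from MySQL into HDFS (connection details are made up)
sqoop import \
  --connect jdbc:mysql://dbserver/warehouse \
  --username dba --password-file /user/thenson/.db_pass \
  --table orders \
  --target-dir /user/thenson/orders

# Export processed results from HDFS back into the database
sqoop export \
  --connect jdbc:mysql://dbserver/warehouse \
  --username dba --password-file /user/thenson/.db_pass \
  --table order_summary \
  --export-dir /user/thenson/order_summary
```

Under the hood Sqoop generates MapReduce jobs to move the rows in parallel, so the DBA never has to hand-write the transfer code.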

