Thomas Henson


Installing And Configuring Splunk Course

April 29, 2019 by Thomas Henson

Last month I released my 7th course at Pluralsight and my first learning path course. Installing and Configuring Splunk is my second course in the world of Splunk. In this course I focus on understanding what it takes to become a Splunk Architect. Since this is a learning path, I get an opportunity to dive deep into Splunk.

Splunk is one of the hottest solutions in data analytics and is a great tool for Data Engineers. If you are looking to analyze log files or any type of machine-generated data, then Splunk offers the ability to quickly index and search that data. This course is specifically built for System Administrators, Data Engineers, and Data Enthusiasts to learn Splunk from the ground up.


What is Splunk

In the first part of the Installing and Configuring Splunk course we dig into the basics of Splunk. Not only do we cover what machine data is, but we also look at the history of Splunk. Finally, we end the What is Splunk module by explaining the basics of the Splunk architecture. Next it’s time to walk through building out our development environment.

Installing Splunk

In the second module we begin building our Splunk development environments. First we set up a Splunk account so that we have access to the Splunk Enterprise downloads. Then we start installing Splunk in a Windows environment using the Splunk install wizard. Next we jump into the macOS environment for installing and configuring Splunk on a Mac. Finally we end with installing Splunk on Linux from the command line.

Navigating Splunk

The last module in the Installing and Configuring Splunk course is all about navigating the Splunk development environments. We start by loading data from our local Windows environment to search through our machine log files. Next we explore Splunkbase for adding applications into our Splunk environments. Finally we end with a look ahead to the future courses in the Splunk Learning Path at Pluralsight.

Want More Data Engineering Tips?

Sign up for my newsletter to make sure you never miss a post or a YouTube episode of Big Data Big Questions, where I answer Data Engineering questions from the community.


Filed Under: Splunk Tagged With: Analytics, Pluralsight, Splunk

Learning Tensorflow with TFLearn

February 11, 2019 by Thomas Henson Leave a Comment

Recently we have been talking a lot about Deep Learning and Tensorflow. In the last post I walked through how to build neural networks with Tensorflow. Now I want to shift gears to talk about my newest venture into Tensorflow with TFLearn. The lines between deep learning and Hadoop are blurring, and data engineers need to understand the basics of deep learning. TFLearn offers an easy way to learn Tensorflow.

What is TFLearn?

TFLearn is an abstraction framework for Tensorflow. An abstraction framework is basically a higher-level language for implementing lower-level programming. A simple way to think of abstraction layers is that they reduce code complexity. In the past we used Pig Latin to abstract away Java code; for Tensorflow we will use TFLearn.

TFLearn offers a quick way for Data Engineers or Data Scientists to start building Tensorflow neural networks without having to go deep into Tensorflow. Neural networks with TFLearn are still written in Python, but the code is drastically reduced compared to native Tensorflow. Using TFLearn provides Data Engineers new to Tensorflow an easy way to start learning and building their Deep Neural Networks (DNNs).
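To make the abstraction concrete, here is a minimal sketch of a TFLearn deep neural network, assuming the classic MNIST digits data set; the layer sizes and training settings are illustrative choices, not code from the course.

```python
import tflearn
from tflearn.datasets import mnist

# Load MNIST digits; each image arrives flattened to 784 features.
X, Y, testX, testY = mnist.load_data(one_hot=True)

# Declare the network layer by layer -- no raw Tensorflow graph code.
net = tflearn.input_data(shape=[None, 784])
net = tflearn.fully_connected(net, 128, activation='relu')
net = tflearn.fully_connected(net, 10, activation='softmax')
net = tflearn.regression(net, optimizer='adam',
                         loss='categorical_crossentropy')

# DNN wraps the network with a trainer, evaluator, and predictor.
model = tflearn.DNN(net)
model.fit(X, Y, n_epoch=5, validation_set=(testX, testY), show_metric=True)
```

Compare that to the session and graph management the same model takes in raw Tensorflow, and the appeal of the abstraction layer is obvious.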

Pluralsight Author

Since 2015 I’ve been creating Data Engineering courses through Pluralsight. My latest course, Implementing Multi-layer Neural Networks with TFLearn, is my sixth course on Pluralsight. While I’ve developed courses in the past, this one was a first in two major areas. First, Implementing Multi-layer Neural Networks is my first course in the deep learning area. Second, this course is solely based on coding in Python. Until now I had never done a coding course per se.

Implementing Multi-layer Neural Networks with TFLearn

Implementing Multi-layer Neural Networks with TFLearn is broken into 7 modules. I wanted to follow closely with the TFLearn documentation for how the functions and layers are broken down. Here are the 7 modules I cover in Implementing Multi-layer Neural Networks with TFLearn:

  1. TFLearn Course Overview – Breakdown of what is covered in this course around deep learning, Tensorflow, and TFLearn.
  2. Why Deep Learning – Why do Data Engineers need to learn about deep learning? Deep dive into the basic terminology in deep learning and comparison of machine learning and deep learning.
  3. What is TFLearn? – First we start off by defining TFLearn and abstraction layers in deep learning. Second we break down the differences between Tensorflow and TFLearn. Next we run through both the TFLearn and Tensorflow documentation. Finally we close out the module by building your TFLearn development environment on your machine or in the cloud.
  4. Implementing Layers in TFLearn – In deep learning, layers are where the magic happens, so this is where we begin our Python TFLearn coding. In the first example we build out neural networks using the TFLearn core layers. The second neural network we build will be a Convolutional Neural Network (CNN) with our MNIST data source (a short sketch follows this list). After running our CNN it’s time to build our third neural network, a Recurrent Neural Network (RNN). Finally we close out the module by looking at the Estimators layers in TFLearn.
  5. Building Activations in TFLearn – The activations module gives us time to examine which mathematical functions are being applied at each layer. During this module we explore the different activations available in Tensorflow and TFLearn.
  6. Managing Data with TFLearn – Deep learning is all about data sets and how we train our neural networks with those data sets. The Managing Data with TFLearn module is all about the tools available to handle our data sets. In the last topic area of the data module we cover the implications and tools for real-time processing with Tensorflow’s TFLearn.
  7. Running Models with TFLearn – The last module in the Implementing Multi-layer Neural Networks with TFLearn Pluralsight course is all about how to run models. During the course we focus mainly on how to implement Deep Neural Networks (DNNs), but in this module we introduce Generative Neural Networks (GNNs). Finally, after comparing DNNs and GNNs, we look to the future of deep learning.
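As promised in the layers module above, here is a hedged sketch of what a small Convolutional Neural Network looks like in TFLearn for the MNIST images; the filter counts and layer sizes are my own illustrative picks, not the course’s exact code.

```python
import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, fully_connected
from tflearn.layers.estimator import regression

# A small CNN over 28x28 single-channel MNIST images.
net = input_data(shape=[None, 28, 28, 1])
net = conv_2d(net, 32, 3, activation='relu')   # 32 filters, 3x3 kernels
net = max_pool_2d(net, 2)                      # halve the spatial size
net = fully_connected(net, 256, activation='relu')
net = fully_connected(net, 10, activation='softmax')
net = regression(net, optimizer='adam', loss='categorical_crossentropy')

model = tflearn.DNN(net)  # ready for model.fit(...) as before
```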

Honest Feedback Time

I would love some honest feedback on this course:

  • How did you like it?
  • Would you like to see more deep learning courses?
  • What could be better?

Feel free to put these answers in the comment section below or send me an email.

Filed Under: Tensorflow Tagged With: Deep Learning, Pluralsight, Python, Tensorflow, TFlearn

Hadoop from the Command Line

August 1, 2016 by Thomas Henson

Moving data around in Hadoop is easy when using the Hue interface, but doing it from the command line is harder.

In my Pluralsight course HDFS Getting Started you can get up to speed on moving data around from the Hadoop command line in under 3 hours. The first two modules of the course cover using Hadoop from the command line with the hdfs dfs commands.
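For a taste of what those first modules cover, below is a minimal sketch of the everyday hdfs dfs commands, wrapped in Python so it runs as one self-contained script; the paths and file names are hypothetical.

```python
import subprocess

def hdfs(*args):
    """Run one hdfs dfs subcommand and fail loudly if it errors."""
    subprocess.run(['hdfs', 'dfs', *args], check=True)

# Make a directory in HDFS, copy a local file in, then list and read
# it back -- the bread-and-butter moves covered in the course.
hdfs('-mkdir', '-p', '/user/thomas/logs')         # hypothetical HDFS path
hdfs('-put', 'server.log', '/user/thomas/logs/')  # local file -> HDFS
hdfs('-ls', '/user/thomas/logs')
hdfs('-cat', '/user/thomas/logs/server.log')
```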

Not Just Hadoop from the Command Line

Don’t just stop at learning Hadoop from the command line; let’s look at the other Hadoop frameworks as well. The last few modules of the course focus on using the following Hadoop frameworks from the command line.

Hive & Pig

Two of my favorite Hadoop framework tools. Both of these tools allow administrators to write MapReduce jobs without having to write Java. After learning Hadoop itself, Pig and Hive are the two tools EVERY Hadoop Admin should know. Let’s break down each one.

Hive fills the space for structured data in Hadoop and acts similar to a database. Hive uses a syntax called HiveQL that is very similar to SQL. The closeness of HiveQL to SQL was intentional, because most analysts know SQL, not Java.

Pig’s motto is that it eats anything, which means it processes unstructured, semi-structured, and structured data. Pig doesn’t care how the data is structured; it can process it. Pig uses a syntax called Pig Latin (insert Pig Latin joke here), which is less SQL-like than HiveQL. Pig Latin is also a procedural, or step-by-step, programming language.
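To see the two syntaxes side by side, here is the same hypothetical question, the average closing price per stock symbol, asked in HiveQL and in Pig Latin. The table, file, and column names are made up for illustration, and the snippets are wrapped in Python only so they run as one self-contained script.

```python
import subprocess

# HiveQL: declarative and SQL-like -- one statement describes the answer.
hiveql = """
SELECT symbol, AVG(close_price)
FROM stocks
GROUP BY symbol;
"""

# Pig Latin: procedural -- each line is one step in the data flow.
pig_latin = """
stocks   = LOAD '/data/stocks.csv' USING PigStorage(',')
           AS (symbol:chararray, close_price:double);
by_sym   = GROUP stocks BY symbol;
averages = FOREACH by_sym GENERATE group, AVG(stocks.close_price);
DUMP averages;
"""

subprocess.run(['hive', '-e', hiveql], check=True)        # run the Hive query

with open('averages.pig', 'w') as f:                      # run the Pig script
    f.write(pig_latin)
subprocess.run(['pig', '-f', 'averages.pig'], check=True)
```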

HBase

Learning to interact with HBase from the command line is a hot skill for Hadoop Admins. HBase is used when you need real-time reading and writing in Hadoop. Think very large data sets, like billions of rows and millions of columns.

Configuring and setting up HBase is complicated, but in HDFS Getting Started you will learn to set up a development environment to start using HBase. Configuration changes in HBase are all done from the command line.

Sqoop

Hadoop is about unstructured data, but what about data that lives in relational databases? Sqoop allows Hadoop administrators to import and export data from traditional database systems. Offloading structured data from data warehouses is one of Hadoop’s biggest use cases. Hadoop allows DBAs to offload frozen data at a tenth of the cost. The frozen or unused data can then be analyzed in Hadoop to bring about new insights. In my HDFS Getting Started course I walk through using Sqoop to import and export data in HDFS.
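Here is a hedged sketch of what a Sqoop import can look like; every connection detail below is a placeholder rather than a real system, and the exact flags you need will depend on your database.

```python
import subprocess

# Import a MySQL table into HDFS with Sqoop. The host, database,
# user, table, and paths are all hypothetical placeholders.
subprocess.run([
    'sqoop', 'import',
    '--connect', 'jdbc:mysql://dbhost/stocks_db',
    '--username', 'hadoop_user',
    '--password-file', '/user/hadoop_user/.sqoop.pw',
    '--table', 'stock_prices',
    '--target-dir', '/user/hadoop_user/stock_prices',
    '-m', '4',                      # run 4 parallel map tasks
], check=True)

# The mirror operation, sqoop export, pushes HDFS data back to the database.
```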

Filed Under: Hadoop Tagged With: Hadoop, Learn Hadoop, Pluralsight

HDFS Getting Started Course

February 22, 2016 by Thomas Henson

Are you ready to get some Hadoop knowledge dropped on you?

Well, here it is, after eight long months since my last Pluralsight course.

HDFS Getting Started has been launched. I couldn’t be more excited to have this course released.

HDFS Getting Started

HDFS Getting Started is a baseline course for anyone working with Hadoop. Starting development with Hadoop is easy when testing in your local sandbox, but what happens when it’s time to go from testing to production?

Hadoop management and orchestration is hard. Most tasks are accomplished from the command line. Even something as simple as moving data from your local machine into HDFS can seem complicated.

What’s HDFS Getting Started about?

My new Pluralsight course, HDFS Getting Started, walks through real-life examples of moving data around in the Hadoop Distributed File System (HDFS). Learning to use the hdfs dfs commands will ensure you have the baseline Hadoop skills needed to excel in the Hadoop ecosystem.

Structured data is all around us in the form of relational databases. In this course we will ingest data from a MySQL database into HDFS using Sqoop. We walk through a quick tutorial on writing a Sqoop script to move structured stock market data from MySQL into HDFS.

Pig and Hive are great ways to structure data in HDFS for analysis, but moving that data around in HDFS can get tricky. In this course we walk through using both applications to analyze stock market data, all from the Hive and Pig command lines.

HBase is another hot application in the Hadoop ecosystem. Do you know how to move data from HDFS into HBase? In HDFS Getting Started you will learn to take our stock market data, index it, and move it into HBase by writing a Pig script.
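For a flavor of that workflow, below is a hedged sketch of a Pig script that stores stock rows into an HBase table through Pig’s built-in HBaseStorage; the table and column names are invented, and this is not the course’s exact script.

```python
import subprocess

# Pig Latin that reads stock rows from HDFS and writes them into HBase.
# The first field becomes the HBase row key; HBaseStorage maps the
# remaining fields onto the listed column-family:column names.
pig_script = """
stocks = LOAD '/user/hadoop_user/stock_prices' USING PigStorage(',')
         AS (rowkey:chararray, open_price:double, close_price:double);
STORE stocks INTO 'hbase://stock_prices'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
          'price:open price:close');
"""

with open('load_hbase.pig', 'w') as f:
    f.write(pig_script)
subprocess.run(['pig', '-f', 'load_hbase.pig'], check=True)
```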

How is the Course Broken down?

HDFS Getting Started is broken down into six modules. The modules cover different applications and how they use HDFS to query, ingest, manipulate, and move data in Hadoop.

HDFS Getting Started Modules

  1. Understanding HDFS
  2. Creating, Manipulating and Retrieving HDFS Files
  3. Transferring Relational Data to HDFS using Sqoop
  4. Querying Data with Hive and Pig
  5. Processing Sparse Data with HBase
  6. Automating Basic HDFS Operations

Let me know if you have any questions about the course or a suggestion for a new course.

Filed Under: Hadoop Tagged With: Big Data, Hadoop, HDFS, Pluralsight

Pig Latin Getting Started Course

May 4, 2015 by Thomas Henson

Guess whose first Pluralsight course was released today?

This guy’s:

Pig Latin Getting Started

It’s been an incredible journey shooting my first Pluralsight course. I’ve certainly learned a great deal throughout the process.

For the last month I’ve been recording away and trying to get my first course ready (“Pig Latin: Getting Started“), learning all the tips and tricks of Camtasia, setting up my own microphone, etc. I’ve had no background in setting up quality sound equipment, but I have an awesome Pluralsight editor who helped me through the process.

Finally, all the hard work has paid off and my course is live.

My first course is “Pig Latin: Getting Started.” This course is a beginner course on Pig Latin and tries to help users who are familiar with SQL translate those skills into Apache Hadoop’s Pig application. Typically anywhere you have a Hadoop Distributed File System (HDFS) installed, you will have Pig running as well.

Why Big Data?

Data is consuming the world as we speak, and Big Data developers are in high demand. The median salary for Big Data developers is around $103,000/year, which makes this a great time to begin a career in Big Data or transition to a Big Data position.

One problem is knowing where to start. When I first started out, I didn’t have a direction or a roadmap for what to learn about the Hadoop Stack.

Do I learn Hive or Pig?

What is Oozie and Zookeeper?

What is this, an animal farm?

This is the reason I decided to become a Pluralsight author. I wanted to help others who are just starting out in the Hadoop stack, whether it be job-related, as in my case, or simply because you’re ready for a new challenge. Pluralsight gives me the opportunity to reach a huge audience. Together we can make a roadmap to help you navigate the Hadoop stack.

Why Start with Pig Latin

Why did I start with Pig Latin? There are other applications I could have started with, but Pig Latin is a great first step. I feel Pig offers the right mix of ease of use and the ability to create more powerful queries, giving a better picture of what exactly MapReduce is. In fact, for my first MapReduce job, I used Pig Latin, not Java. Before developing with Pig I did have some experience in Java, but it’s not necessary. To learn Pig Latin, all you really need is a basic understanding of SQL, and you can begin to write powerful MapReduce jobs in 10 minutes. Check out this example Pig Script.

If you’re ready for the challenge of conquering the Hadoop stack, then let’s get started with Pig Latin. Pig Latin: Getting Started will take you through these steps:

  1. Setting up a Hadoop development environment
  2. Comparing Java MapReduce to Pig Latin
  3. Loading and Storing Data
  4. Examples of where to find Data
  5. Using Pig from the Grunt Shell
  6. Writing User Defined Functions (see the sketch after this list)
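Since the course closes with user defined functions, here is a minimal sketch of a Pig UDF written in Python (Pig executes these through Jython); the function name and schema are illustrative only.

```python
# udfs.py -- a minimal Python UDF for Pig. When Pig runs this file
# through Jython it supplies the outputSchema decorator; the fallback
# below lets the file also run or be tested as plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap

@outputSchema('symbol:chararray')
def normalize_symbol(symbol):
    """Trim whitespace and upper-case a stock ticker symbol."""
    if symbol is None:
        return None
    return symbol.strip().upper()

# Registered and called from Pig Latin roughly like this:
#   REGISTER 'udfs.py' USING jython AS myfuncs;
#   clean = FOREACH stocks GENERATE myfuncs.normalize_symbol(symbol);
```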


Check out the course to find out more about my journey into MapReduce and the Pig Grunt shell.


Filed Under: Hadoop Pig Tagged With: Apache Pig, Big Data, Course, Pluralsight
