Thomas Henson


Pig String Functions

March 21, 2016 by Thomas Henson

Stuck trying to manipulate a string in Hadoop and don’t want to use Java?

No problem. Use Pig’s built-in String Functions.

Why Pig for ETL?

Using Apache Pig in Hadoop is a must for ETL work. Pig allows developers to quickly write a Pig script to transform data in Hadoop. The String Functions ship with Pig, and learning them is a time saver for ETL. So whether you are trying to convert the case of a string or use a regular expression to extract data, the Pig String Functions have you covered.
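To make those two kinds of transformation concrete, here is a minimal sketch in Python rather than Pig Latin. It only approximates what Pig's LOWER, UPPER, and REGEX_EXTRACT functions do to a single field. The helper names and the sample log line are assumptions made up for illustration, and the real functions run inside a Pig script, not in Python.

```python
import re


def lower(value):
    """Rough analogue of Pig's LOWER: lowercase a string, pass None through."""
    return None if value is None else value.lower()


def upper(value):
    """Rough analogue of Pig's UPPER: uppercase a string, pass None through."""
    return None if value is None else value.upper()


def regex_extract(value, pattern, index):
    """Rough analogue of Pig's REGEX_EXTRACT: return the capture group
    at `index`, or None when the pattern does not match."""
    if value is None:
        return None
    match = re.search(pattern, value)
    return match.group(index) if match else None


# Hypothetical sample record, for illustration only.
line = "2016-03-21 ERROR Disk Full"

print(lower(line))
print(upper(line))
print(regex_extract(line, r"(\d{4})-(\d{2})-(\d{2})", 1))
```

In a Pig script the same steps would normally appear inside a FOREACH ... GENERATE statement applied to every record in a relation.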

What’s Covered?

In this series I will walk through the String Functions in quick 5-minute tutorials, one per function. Each video builds on the previous function, but it’s not essential to watch them in order. I wanted each video to be able to stand alone as a quick reference for its String Function.

All the source code and files can be found on my Pig Example GitHub page, so you can follow along through the tutorial or grab the code after watching. Feel free to use and abuse the code. As a developer, sometimes it’s easier to have something to start with rather than a blank screen.

If you already have your Hadoop development environment set up, then you are ready to start.

If you are just starting out with Hadoop and Pig, you might want to start here to learn about Pig. I’ve written a lot of posts and published a couple of videos on getting started with Pig Latin, so you’ll want to be familiar with those as you step through this series.

Pig String Functions

  • Pig String Functions #1 – The LOWER function in Pig converts a string or strings to lowercase.
  • Pig String Functions #2 – The UPPER function in Pig converts a string or strings to uppercase.

Hope you enjoyed this series. Let me know what you liked and anything you would like to see in the future. As always if you need help just ask.

Bonus Content: For more Pig Functions check out the Pig Eval Function Series. 

Filed Under: Hadoop Pig Tagged With: Apache Pig, Apache Pig Latin, Hadoop Pig, Learn Hadoop, Pig String Series

HDFS Getting Started Course

February 22, 2016 by Thomas Henson

Are you ready to get some Hadoop knowledge dropped on you?

Well, here it is, eight long months after my last Pluralsight course.

HDFS Getting Started has been launched. I couldn’t be more excited to have this course released.

HDFS Getting Started is a baseline course for anyone working with Hadoop. Starting development with Hadoop is easy when testing in your local sandbox, but what happens when it’s time to go from testing to production?

Hadoop management and orchestration is hard. Most tasks are accomplished from the command line. Even something as simple as moving data from your local machine into HDFS can seem complicated.

What’s HDFS Getting Started about?

My new Pluralsight course, HDFS Getting Started, walks through real life examples of moving data around in the Hadoop Distributed File System (HDFS). Learning to use the hdfs dfs commands will ensure you have the baseline Hadoop skills needed to excel in the Hadoop ecosystem.
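As a rough illustration of what moving data around with hdfs dfs looks like, the sketch below builds a few common commands as argument lists and prints them. It is written in Python only so the commands can be inspected without a cluster, and nothing is executed. The file name and HDFS directory are hypothetical and are not taken from the course.

```python
import shlex


def hdfs_dfs(*args):
    """Build an `hdfs dfs` command as an argument list (nothing is executed)."""
    return ["hdfs", "dfs", *args]


# Hypothetical paths, for illustration only.
LOCAL_FILE = "stocks.csv"
HDFS_DIR = "/user/demo/stocks"

commands = [
    hdfs_dfs("-mkdir", "-p", HDFS_DIR),      # create the target directory
    hdfs_dfs("-put", LOCAL_FILE, HDFS_DIR),  # copy a local file into HDFS
    hdfs_dfs("-ls", HDFS_DIR),               # list the directory contents
    hdfs_dfs("-get", f"{HDFS_DIR}/{LOCAL_FILE}", "stocks_copy.csv"),  # copy back out
]

for command in commands:
    print(shlex.join(command))
```

On a machine with a configured Hadoop client, each list could be passed to subprocess.run(command, check=True), or the printed lines could be typed directly at the shell.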

Structured data is all around us in the form of relational databases. In this course we will ingest data from a MySQL database into HDFS using Sqoop. Walk through a quick tutorial on writing a Sqoop script to move structured stock market data from MySQL into HDFS.

Pig and Hive are great ways to structure data in HDFS for analysis, but moving that data around in HDFS can get tricky. In this course we walk through using both applications to analyze stock market data. All from the Hive and Pig command lines.

HBase is another hot application in the Hadoop ecosystem. Do you know how to move data from HDFS into HBase? In HDFS Getting Started, learn to take our stock market data, index it, and move it into HBase by writing a Pig script.

How is the Course Broken down?

HDFS Getting Started is broken down into six modules. The modules cover different applications and how they use HDFS to query, ingest, manipulate, and move data in Hadoop.

HDFS Getting Started Modules

  1. Understanding HDFS
  2. Creating, Manipulating and Retrieving HDFS Files
  3. Transferring Relational Data to HDFS using Sqoop
  4. Querying Data with Hive and Pig
  5. Processing Sparse Data with HBase
  6. Automating Basic HDFS Operations

Let me know if you have any questions about the course or a suggestion for a new course.

Filed Under: Hadoop Tagged With: Big Data, Hadoop, HDFS, Pluralsight

Top 10 Favorite Post of 2015

January 4, 2016 by Thomas Henson

So long 2015

2015 is done. I love the New Year because it’s always a good time to look back at what you have accomplished. 2015 presented me with new challenges and opportunities. As I was planning out my goals for next year, I wanted to look back at what I’ve done. Here are a few of the things I accomplished in 2015:

Blog Posts – 26 (New Record!!)

Books Read – 21

States Visited – 7 (WA, IL, TN, MS, LA, FL, CA)

YouTube Videos – 2 (Learn Pig Latin in Under 2 Hours & T-shirt Method)

Pluralsight Course – 1 (Pig Latin Getting Started)

This year was very busy, and I published more content in 2015 than in any other year. Since it has been a record year, I thought I should put together a list of my 10 favorite posts. Hopefully these posts are your favorites as well...

Top 10 Favorite Post from 2015

  1. Pig Latin Getting Started Course – My favorite post because it is about my first Pluralsight course, Pig Latin Getting Started. If you are interested in learning a higher-level language in the Hadoop ecosystem, then this course is made for you. This course allows developers with only basic SQL knowledge to write MapReduce jobs without having to learn Java.
  2. Learn to Process Data with Apache Pig Latin – Number 2 on this list is about WHY YOU SHOULD LEARN PIG. Just so you know I think you should learn Pig Latin! I’m pretty biased when it comes to Pig but in this post I back my reasoning up with facts.
  3. Complete Agile Podcast – One of the most commented-on posts of 2015. I compiled a list of Agile podcasts from around the web. The goal was to be a resource for developers looking for Agile-specific podcasts. If you see any podcast I’ve missed, let me know.
  4. Example Pig Latin Script – Another post about Pig Latin, but this one focuses on the overall structure of a Pig Latin script. The example compares a Pig Latin script to a SQL script line by line.
  5. Pig Latin Data Types – An in depth look at the different data types in Pig and how to cast those data types. Data types in Pig are simple so take 5 minutes to learn them.
  6. 4 Steps to Increasing Team Productivity by 400% – After finishing both Jeff Sutherland’s book and my Scrum Master certification I was stoked to write about team productivity. Team productivity can be boosted exponentially when using Scrum. Don’t believe me? Check the post to find out more.
  7. Apache Pig Eval Functions Series – Where else are you going to find example scripts for all the Pig Eval Functions? This is my first series post and I’m still adding more functions. If you are looking for a 200-level discussion of the Pig Latin language, start here.
  8. Apache Pig Latin Tutorial – Deeper look into my Pig Latin Getting Started course. This post actually includes 10 minutes of my course free.
  9. Execute Pig Script from Command Line – Most of my tutorials use Pig from the Hue Pig interface or Grunt Shell but sometimes you just need to run the script from the command line. Walk through this tutorial and run your script in under 2 minutes.
  10. How Big Data Impacts Holiday Shopping – While out shopping I started looking around at all the ways Big Data was affecting my Christmas shopping. It’s another example of how popular Big Data is in every aspect of our lives. In retail Big Data really is king.

And there you have it – my first Top 10 Favorite Blog Posts list ever. Hope everyone enjoys the look back at the year. 2016 is going to rock. I’ve got a plan to double the content from 2015 and incorporate more videos. Let me know if you have anything you want me to cover in 2016.

Filed Under: Article Tagged With: Agile, Apache Pig, Big Data, Pig Latin, Top 10

How Big Data Impacts Holiday Shopping

December 21, 2015 by Thomas Henson

Christmas is a magical time of year. I still remember the Christmas when I was 7 years old. After all the gifts had been opened, my parents made me take the trash out to the road. It had been a great Christmas and I was very happy with all my gifts, so I skipped off to take the trash to the road. When I was back in the house my parents said they had found another gift from Santa. I started to tear the paper away and found a Nintendo (NES). The rest of my Christmas break was spent on endless days of Mario and Duck Hunt.

The Holidays are a really special time of year, but for my friends in the retail business it’s make-or-break time. Just this week when I was shopping I couldn’t help but realize how Big Data was impacting my own Holiday shopping. For the last two months of the year retailers have days called Black Friday, Cyber Monday, or Super Saturday. Retailers who are winning big are using Big Data to help consumers get the products they want and maximize profit at the same time. Here are 3 areas where I really noticed the use of Big Data...

Price Points

Ever wondered how retailers know what price to set for those Black Friday deals or why it seems they always have something for everyone? Retailers are using Big Data to set price points. These price point algorithms are getting consumers the products they want for the highest price possible. So while it seems all the prices are rock bottom the retailer is still making a profit somewhere.

Price points are not just set for Black Friday but throughout the season. If you are like me, you wait until the last minute to get that holiday shopping done. During those last days before Christmas I’ve found better deals than the Black Friday deals.

Retailers are now able to have a 360-degree view of their business. Real-time analysis is being done at chains like Walmart and Target every time an item is purchased anywhere in the world. No retailer wants to end the Holiday season with warehouses full of Christmas Tickle Me Elmo dolls. From the supply chain down to the customer, the business knows how to set the price points to maximize profit.

Loyalty Rewards Programs

What incentives can a retailer give a customer to increase the likelihood of them making a purchase? If they’ve signed up for a loyalty program, the retailer has a pretty good idea. Loyalty reward programs give retailers like Dick’s Sporting Goods or Academy Sports insight into what customers are purchasing, along with their email, home address, and phone number. Retailers then use that data along with Social Media data, credit reports, mobile data, and more to test different marketing campaigns.

A simple example: you have been put into a category of potential consumers who might be interested in Star Wars. Maybe it’s because you purchased Star Wars IV, V, and VI on Blu-ray 3 years ago. Now the retailer can use real-time data to send you offers for Star Wars products you want. Get ready to purchase those adult-size Yoda pajamas.

Loyalty programs are a gold mine of data for retailers. Even when consumers give incomplete data about age, interest, and social data, other data sources can be used to fill the gaps. By munging data from different sources consumers are given a customized experience from their favorite retailer.

Social Media

I’ve never been crazy about signing up for loyalty reward programs. I’m always in a hurry and never wanted to carry around a card or fill out long forms. However, I am pretty active on Social Media. In fact, most people have several Social Media accounts that have been collecting data on them for years. Think about it for a second. I won’t fill out a form for a loyalty program, but I’ve spent years filling out my Facebook account.

Retailers can use information from Social Media in many ways:

  • Does this person follow/like our page?
  • What are their interests?
  • What do their friends like?
  • Where do they live?
  • Where do they check-in?
  • What’s in their browser cache?

By taking information from Social Media, retailers can send customized ads to consumers. These ads are going to be more relevant because your Social Media data has many data points.

Ads from my Instagram & Twitter accounts.

How can they use my likes?

I was a huge 24 fan back when the series was going. Remember, it had Kiefer Sutherland trying to stop a terrorist attack in 24 hours. Each episode represented 1 hour of real time. So naturally I liked it on Facebook. What could a retailer do with that data? I mean, 24 has long been canceled. They can use that data like Netflix does to build a profile of me and build recommendations for new movies I might purchase or products I might like.

Social Media data represents the 360-degree view of the customer. Retailers who are using Social Media data are in a unique position to offer customized user experiences. Give the customers what they want before they know they want it, just like Steve Jobs did.

Win-Win

Whether it’s price points, loyalty reward programs, Social Media marketing, or a combination of all three, Big Data is impacting the way we are shopping this Holiday Season. In most cases it’s a win-win for both the consumer and the retailer. The retailer wants to offer products its consumers want, and consumers want a customized shopping experience.

Filed Under: Big Data Tagged With: Big Data

Comparing Data with Pig Latin MAX() Function

November 23, 2015 by Thomas Henson