Thomas Henson

Apache Pig Eval Functions Series

July 27, 2015 by Thomas Henson

Ready to master Apache Pig but not sure how to get started?

How can I master Apache Pig?

The process for mastering a programming language is the same as learning any other skill. Practice, practice, practice.

To be effective, the practice needs to be focused and cover different scenarios. Performing the same task over and over will not get you to the mastery level. Using Pig functions you haven’t used before to process real-world examples, however, is the kind of practice needed to master Pig Latin.

Imagine a race car driver trying to become an elite racer. He will practice racing on different tracks and even practice specific scenarios he might see in a race. Now let’s apply that logic to a Pig developer: you need to practice using Pig to solve real-world problems and to try out new functions.

Pig eval functions deep dive

In this Apache Pig Eval Functions series I have packaged together different data sets and a series of quick scenarios for developers to solve using Pig. Each post in this series focuses on a single Pig eval function, defining the function and then using it in a real-world example. All the code and data are provided for each of these examples, making your journey to becoming an Apache Pig master easier.

Looking for a complete way to master the basics of Pig? Try my Pluralsight course Pig Latin: Getting Started.

Why Eval Functions?

Pig ships with many different built-in functions, and learning these functions can save you time. The eval, or evaluation, functions are a group of functions that you will typically not learn when you first start out with Pig. When we think about evaluation functions, the first things that come to mind are mathematical operations like addition or subtraction, but the Pig eval functions include useful string functions as well. Concatenating strings is one of the most useful of these: think about trying to merge two fields such as first and last name. The CONCAT() function is built in to the standard Pig build.
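
Here is a minimal sketch of that first and last name merge. The file name names.csv and the field names first_name and last_name are placeholders made up for illustration; they are not part of the series data sets. The two CONCAT() calls are nested because some older Pig releases only accept two arguments per call.

```pig
-- Hypothetical input file: one "first,last" pair per line.
names = LOAD 'names.csv' USING PigStorage(',')
        AS (first_name:chararray, last_name:chararray);

-- Inner CONCAT puts a space in front of the last name,
-- outer CONCAT joins the first name onto the result.
full_names = FOREACH names GENERATE
             CONCAT(first_name, CONCAT(' ', last_name)) AS full_name;

DUMP full_names;
```

One thing to watch for: if either field is null, CONCAT() returns null for that row, so you may want to filter or replace nulls first.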

The code and data can be found at my Example Pig Script Github page.

Deep dive for the Apache Pig Eval Functions:

  • Average – Learn how to get the average of a column using the AVG() function.
  • Sum – Get the sum of a column using the SUM() function.
  • Concatenation – Merge two or more columns together using the CONCAT() function.
  • Tokenize – Find out how to break down fields using the TOKENIZE() function.
  • Min – Find the smallest value in a column using the MIN() function.
  • Max – Master using the MAX() function in Pig Latin.
  • Count – Take the population data from previous tutorials and use the COUNT() function.
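
To give a feel for how several of these functions work together, here is a rough sketch that groups population data by state and then applies the aggregate functions to each group. The file name population.csv and its three columns are assumptions for illustration; the actual data sets in the series may be laid out differently.

```pig
-- Hypothetical input file with columns: state, county, population.
pop = LOAD 'population.csv' USING PigStorage(',')
      AS (state:chararray, county:chararray, population:long);

-- The aggregate eval functions operate on a bag, so group first.
by_state = GROUP pop BY state;

-- After GROUP, each row holds the key ("group") and a bag named "pop".
stats = FOREACH by_state GENERATE
        group               AS state,
        COUNT(pop)          AS counties,
        SUM(pop.population) AS total_population,
        AVG(pop.population) AS avg_population,
        MIN(pop.population) AS smallest_county,
        MAX(pop.population) AS largest_county;

DUMP stats;
```

The GROUP step is the part that usually trips people up: AVG(), SUM(), MIN(), MAX() and COUNT() all expect a bag of values, which is what GROUP produces for each key.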

Filed Under: Hadoop Pig Tagged With: Apache Pig, Apache Pig Latin, Hadoop, Pig Eval Series
