Thomas Henson


Comparing Data with Pig Latin MAX() Function

November 23, 2015 by Thomas Henson

Last time we tackled how to use the MIN() function in Pig, so this week we are going to learn its opposite, the MAX() function. It works just like MIN(), but instead of finding the lowest value in a column, it finds the highest. As with MIN(), it's a function commonly used in Excel or SQL.

Apache Pig Latin MAX Function

For this walkthrough I am going to use the same problem set as in the MIN() post.

Table - Pig Latin Eval Function MIN

Remember, in last week’s post we were looking for the lowest value in the Minnesota population stats. This time we swap out MIN() for MAX() to find the largest population for each age group.

MIN() Code

population = LOAD '/user/hue/Pig_Examples/population.csv' USING PigStorage(',') AS (year:int, age:int, gender:int, popsize:int);
 
age = GROUP population BY age;
 
lowest = FOREACH age GENERATE  group, MIN(population.popsize);
 
DUMP lowest;

MAX() Code

population = LOAD '/user/hue/Pig_Examples/population.csv' USING PigStorage(',') AS (year:int, age:int, gender:int, popsize:int);
 
x = GROUP population BY age;
 
largest = FOREACH x GENERATE  group, MAX(population.popsize);
 
DUMP largest;

Results

max-results

My results now show the largest population value for each age group. All I had to do to get these results was swap out MIN for MAX and execute the script. MAX is easy to use for comparing values in Pig Latin; just remember to GROUP the data before calling MAX or MIN.
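The grouping step matters because MAX() and MIN() operate on a bag of values, and GROUP is what builds that bag. Here is a minimal sketch of two variations, assuming the same population.csv file and schema used above. The aliases by_age, ranges, everything, and overall are names I made up for illustration.

```pig
population = LOAD '/user/hue/Pig_Examples/population.csv' USING PigStorage(',') AS (year:int, age:int, gender:int, popsize:int);

-- Group by age, then compute both extremes in the same FOREACH
by_age = GROUP population BY age;
ranges = FOREACH by_age GENERATE group AS age, MIN(population.popsize) AS smallest, MAX(population.popsize) AS largest;

-- GROUP ALL puts every row into a single bag, giving one overall maximum
everything = GROUP population ALL;
overall = FOREACH everything GENERATE MAX(population.popsize) AS largest;

DUMP ranges;
DUMP overall;
```

The first DUMP should print one row per age with its smallest and largest popsize; the second should print a single row holding the largest popsize in the whole file.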

Simple Right?

The MIN() and MAX() functions are among the easiest in Pig Latin to use. Imagine having to calculate the maximum value by hand each time you needed to compare a list of values; that would be ridiculous. As in Hive or SQL, you call the function as MAX() on a column. The difference is in how you get there: in Pig Latin you GROUP the relation in a separate statement first, then pass the grouped column to MAX().

Be sure to follow me on Twitter for more awesome Pig Latin tutorials, and if you are looking to learn the basics of Pig Latin, check out my course Getting Started: Pig Latin.

 

 

 

 
