Last time we tackled how to use the MIN() function in Pig, so this week we are going to learn its opposite, the MAX() function. It works just like MIN(), but instead of finding the lowest value in an array/column, it finds the highest. As with MIN(), it's a function you have probably already used in Excel or SQL.
For this walkthrough I am going to use the same problem set as in the MIN() post.
Remember, in last week's post we were looking for the lowest value in the Minnesota population stats. This time we swap out MIN() for MAX() and find the largest population for each age group.
MIN() Code
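-- Load the CSV and name/type its four columns
population = LOAD '/user/hue/Pig_Examples/population.csv' USING PigStorage(',')
    AS (year:int, age:int, gender:int, popsize:int);

-- Group the rows by age so MIN() has a bag to work on
age = GROUP population BY age;

-- Smallest popsize within each age group
lowest = FOREACH age GENERATE group, MIN(population.popsize);

DUMP lowest;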
MAX() Code
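-- Load the CSV and name/type its four columns
population = LOAD '/user/hue/Pig_Examples/population.csv' USING PigStorage(',')
    AS (year:int, age:int, gender:int, popsize:int);

-- Group the rows by age so MAX() has a bag to work on
x = GROUP population BY age;

-- Largest popsize within each age group
largest = FOREACH x GENERATE group, MAX(population.popsize);

DUMP largest;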
Results
My results now show the largest population for each age group. All I had to do to get these results was swap out MIN for MAX and execute the script. MAX is easy enough to use for comparing values in Pig Latin; just remember to GROUP the relation before calling MAX or MIN, because both functions operate on the bag of values that GROUP produces.
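Since MIN() and MAX() both work on the same grouped bag, they can share one FOREACH. Here is a minimal sketch that returns both values per age group in a single pass. It assumes the same population.csv and schema as above; the alias names by_age and extremes and the output field names are my own picks for illustration.

```pig
-- Same file and schema as the scripts above
population = LOAD '/user/hue/Pig_Examples/population.csv' USING PigStorage(',')
    AS (year:int, age:int, gender:int, popsize:int);

-- by_age is an illustrative alias; any name works
by_age = GROUP population BY age;

-- One FOREACH, two aggregates over the same bag
extremes = FOREACH by_age GENERATE
    group AS age,
    MIN(population.popsize) AS smallest,
    MAX(population.popsize) AS largest;

DUMP extremes;
```

Naming the fields with AS makes the output easier to reuse in later statements, for example to filter or join on age.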
Simple, Right?
The MIN() and MAX() functions are among the easiest to use in Pig Latin. Imagine having to calculate the maximum by hand every time you needed to compare a list of values; that would be ridiculous. The MAX call itself looks the same as it does in Hive or SQL, but the implementation around it is different: in Pig you group the relation first and then apply MAX inside a FOREACH.
Be sure to follow me on Twitter for more awesome Pig Latin tutorials, and if you are looking to get the basics of Pig Latin, check out my course Getting Started: Pig Latin.
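To make that difference concrete, here is a rough sketch of finding the single largest popsize across the whole file. In SQL this would be a plain SELECT MAX(popsize) with no GROUP BY, but in Pig you still need a grouping step, so GROUP ... ALL puts every row into one bag. The file and schema are the ones used above; the aliases everything and overall are hypothetical names chosen for this example.

```pig
-- Same file and schema as the scripts above
population = LOAD '/user/hue/Pig_Examples/population.csv' USING PigStorage(',')
    AS (year:int, age:int, gender:int, popsize:int);

-- GROUP ... ALL collects every row into a single group
everything = GROUP population ALL;

-- MAX over that one bag gives the overall largest popsize
overall = FOREACH everything GENERATE MAX(population.popsize) AS largest;

DUMP overall;
```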