Thomas Henson

Python Options in Hadoop

July 14, 2017 by Thomas Henson

New developers in the Hadoop ecosystem often struggle to get involved because they think they need to learn Java. Where do Python and other non-Java developers turn when developing in the Hadoop ecosystem? What are the Python options in Hadoop?

Learn about the Python options in Hadoop and where developers are finding resources to build Hadoop applications. Watch now to find out!

Video – Python Options in Hadoop

Transcript

(Forgive any errors; it was transcribed by a machine.)

Hi, I’m Thomas Henson with thomashenson.com, and today is another episode of Big Data Big Questions. Today’s question is about Python options in big data. Find out more right after this.

[Music]

So today’s question comes in from YouTube, and if I pronounce your name wrong, I’m sorry, but it comes in from V gene drawn. His question is: “Thanks for the video” (and thank you for watching). “For people who are self-learners like me, when we start learning Hadoop or Apache Spark or Kafka, all the examples that are available on the internet are in Java. Also, there’s a lack of material available for these tools using Python.”

This question was posed back when we were talking about whether you have to know Java to be a big data developer. We said that you don’t have to; there are a ton of options out there that let you abstract away some of the Java. So I want to break this question into a couple of different parts.

The way I’m going to look at it first is this: if you’re just starting out, you’re new in the ecosystem, and you’re looking at Kafka and Spark and Hadoop, a lot of the examples, as far as the code and the way it’s written, are going to be in Java. Now, with Spark there are some different options. Spark has options for writing your Spark jobs in Scala or writing your Spark jobs in Python, and they have really good documentation around that. Some of the other tools, not so much. But for a lot of the tools we’re talking about, you don’t have to specifically know Java unless you want to contribute or do something outside the box with those tools.

So for example, with Hadoop, if you want to use Hadoop out of the box to write your MapReduce jobs, you’re probably going to have to do something in Java if you’re not using something like Hive or something like Pig that’s going to abstract that away. When we use something like Hive or something like Pig, you’re able to do it in more of a SQL-like syntax, and that helps you abstract it away. But if you’re going to write, say, some custom functions in Pig and you want to take advantage of that, most of those are going to be done in Java. There are options out there for Python as well. There aren’t as many examples of those, but there are examples that show you how to write those user-defined functions in Pig, for example, in Python. For some of the other ones you do have to dig around, so it is a little bit hard when we’re talking about that. But for somebody that’s just starting out, it’s really an awesome opportunity just to be able to jump in, start using Pig or Hive, or even just use Hadoop out of the box, and see some of those functions.

Now, as far as Hadoop, there are also ways to write MapReduce jobs using Python, and there are a couple of different options out there too. But I will say, and I agree with you, the majority of the examples you’re going to find are going to be in Java. You’re really going to have to dig around to see those, but there are options and there are ways to get around it.

Now, I will say that 90% of what you’re doing when you’re just starting out, maybe even 100%, you’re not really going to need to write any of those custom functions, right? You’re just trying to learn, trying to get a feel for how everything’s written and how you can start implementing this in your own data center and doing your pieces. Once you start getting into it a little bit further, you might need to use some Java, but like I said, there are still some options out there for Python and Scala.

That’s especially true as we start to look at Spark. I’m going to come back to the Spark part now and talk about how their documentation shows all their examples written with Scala and Java. The Python is still kind of being built out. In the documentation there are a lot of examples in Python; there are still a couple that need to be worked out. But if you’re doing anything with Spark, that’s one where, whether you’re a beginner with no Java experience or a seasoned Java veteran, you can go in and start using Spark, look at the examples, look at the documentation, and pretty much never write any Java.

Now, if we’re talking about contributing, this is the big caveat around that. If we’re talking about contributing to these products, that is totally correct: most of those products are going to be written in Java. If you really want to be a part of what Kafka is doing, or the source code for Hadoop or Spark, you’re really going to be behind, and that’s one area where you’re really going to want to know Java to be able to do that.

Well, I hope I answered your question. Make sure you subscribe so you never miss an episode. Also, if you have any questions, send them in and I’ll try my best to answer them here on Big Data Big Questions. Thanks again.

[Music]
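
Example: a MapReduce word count in Python

The transcript says there are ways to write MapReduce jobs in Python but does not name one. One option that ships with Hadoop is Hadoop Streaming, which runs any executable that reads records from standard input and writes tab-separated key/value lines to standard output. The sketch below is a minimal word count in that style, not a production job. The function names (`mapper`, `reducer`, `run_locally`) and the `map`/`reduce` command-line argument are illustrative choices made for this example, not part of any Hadoop API.

```python
#!/usr/bin/env python3
"""Word count in the Hadoop Streaming style (illustrative sketch)."""
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts for each word; the input must already be sorted by key."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in one process, for local testing only."""
    return list(reducer(sorted(mapper(lines))))


# Hypothetical convention for this sketch: pass 'map' or 'reduce' as the
# first argument to pick the role when the file is run as a script.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = mapper if sys.argv[1] == "map" else reducer
    for record in step(sys.stdin):
        print(record)
```

Calling `run_locally(["Pig Hive pig", "hive Spark"])` exercises both steps without a cluster. On a real cluster, the same file would be passed to the Hadoop Streaming jar through its `-mapper` and `-reducer` options; the jar's location depends on the distribution, so check your install. Hadoop sorts the mapper output by key before the reducer sees it, which is why the reducer can rely on `groupby`.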

Filed Under: Hadoop