Thomas Henson


Deep Learning Python vs. Java

October 8, 2019 by Thomas Henson

What About Java in Deep Learning?

Years ago, when I left Java in the rearview mirror of my career, I never imagined someone would ask me if they could use Java over Python. Just kidding, Java. You know it’s only a joke, and you will always have a special place in my heart: a place that probably won’t run because I have the wrong version of the JDK installed.

Python is king of the Machine Learning (ML) and Deep Learning (DL) workflow. Most of the popular ML libraries are in Python, but are there Java offerings? And can you use Java for Deep Learning? The answer is yes, you can! Find out the differences between Machine Learning and Deep Learning libraries in Java and Python in the video.

Transcript

Hi, folks. Thomas Henson here, with thomashenson.com, and today is another episode of Big Data Big Questions. Today’s question comes in around deep learning frameworks in Java, not Python. So, find out about how you can use Java instead of Python for deep learning frameworks. We’ve talked about it here on this channel, around using neural networks and being able to train models, but let’s find out what we can do with Java in deep learning.

Today’s episode comes in and we’re talking about deep learning frameworks that use Java, not Python. So, today the question is, “Are there specific deep learning frameworks that use Java, not Python?” First off, let’s talk a little bit about deep learning, do a recap. Deep learning, if you remember, is the use of neural networks whenever we’re trying to solve a problem. We see it a lot in multimedia, right, like, we see image detection. Does this image contain a cat or not contain a cat?

The deep learning approach is to take those images [Inaudible 00:01:10] you know, if we’re talking about supervised, so take those labeled images, so of a cat, not of a cat, feed those into your neural network, and let it decide what those features are. At the end you get a model that’s going to tell you, is this a cat or is this not a cat? Within some confidence. Hopefully not 50%, maybe closer to 99 or 97. But, that’s the deep learning approach versus the machine learning approach that we’ve seen a good bit.
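To make that supervised loop concrete (labeled examples go in, a model that answers with a confidence comes out), here is a minimal sketch in plain Python. It is an illustration only: the data is made up, each "image" is reduced to two hypothetical numbers, and a single trained neuron stands in for a real neural network, so it shows the training loop rather than the feature learning a deep network would do.

```python
import math
import random


def sigmoid(z):
    """Squash any number into the range (0, 1) so it reads as a confidence."""
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=2000, learning_rate=0.5, seed=0):
    """Fit one neuron to labeled examples with simple gradient descent."""
    rng = random.Random(seed)
    size = len(examples[0][0])
    weights = [rng.uniform(-0.1, 0.1) for _ in range(size)]
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            score = sum(w * x for w, x in zip(weights, features)) + bias
            error = sigmoid(score) - label  # how far off the prediction is
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


def confidence(model, features):
    """Return the model's confidence that the input is a cat."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


# Hypothetical toy data: label 1 means "cat", label 0 means "not a cat".
LABELED = [
    ([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.7, 0.95], 1),
    ([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.15, 0.05], 0),
]

model = train(LABELED)
print("cat-like input:", round(confidence(model, [0.85, 0.9]), 3))
print("non-cat input: ", round(confidence(model, [0.1, 0.1]), 3))
```

On this toy data the first confidence should land close to 1 and the second close to 0. A framework like TensorFlow or Deeplearning4j does the same kind of loop at a much larger scale, with many layers.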

When we talk about Hadoop and traditional analytics, the machine learning approach is different: we’re probably going to use some kind of algorithm like singular value decomposition or PCA, and we’re going to take these images, look at each one, and define each feature, from the cat’s ears to the cat’s nose. Then we feed that through the model and it gives us some kind of confidence. With the deep learning approach, we get to use a neural network that defines some of those features for us, which helps us out a lot. It’s not magic, but it is a little bit, so it’s a really innovative approach.

So, the popular languages, and what we’ve talked most about on this channel and probably other channels and most of the examples you’ve seen are all around Python, right? I did do a video before where I was wrong on C++. There was more C++ in deep learning than I really originally thought. You can check that video out, where we kind of go through and talk about that and I come in and say, “Hey, sorry. I missed the boat on that one.” But, the most popular language, one… I mean, I did a Pluralsight video on it, Take CTRL of Your Career, around TensorFlow and using TFLearn. TensorFlow is probably far and away the most popular one. You’ve seen it with stats that are out there. There’s also PyTorch, Caffe2, MXNet, and then some higher-level libraries like Keras, which is able to use TensorFlow underneath and be a higher-level abstraction, but most of those are going to use Python and then some of them have C++. Most examples that you’re going to see out there, just from my experience and just working in the community, are in Python. Most people are looking for those Python examples.

But, on this channel, we’ve talked a lot about options and Hadoop for non-Java developers, but this is an opportunity where all you Java developers out there, you’re looking for, “Hey, we want to get into the deep learning framework. We don’t want to have to code everything ourselves. Are there some things that we can attach onto?” And the answer is yes, there are. It’s not as popular as Python right now, or R and C++ in the deep learning frameworks, but there is a framework called Deeplearning4j that is a Java-based framework. The Java-based framework is going to allow for you to use Java. You could still use Python, though. Even with the framework, you can abstract away and do Python, but if you’re specifically a Java developer and looking to… I mean, maybe you want to get in and contribute to the Deeplearning4j community and be able to take it from that perspective, or you’re just wanting to be able to implement it in some projects. Maybe you’re like, “Hey, you know what? I’m a Java developer. I want to continue doing Java.” Java’s been around since ’95, right? So, you want to jump into that? Then Deeplearning4j is the one for you.

So, really, maybe think about why would you want to use a Java-based deep learning framework, for people that maybe aren’t familiar with Java or don’t have it. One of the things is it claims to be a little bit more efficient, so it’s going to be more efficient than using an abstraction layer from that perspective in Python. But also, there’s a ton of Java developers out there, you know, there’s a community. Talked about how it’s been around since ’95, so there’s an opportunity out there to tap into a lot of developers that have the skills to be able to use it and so, there’s a growing need, right? There’s communities all around the globe and different little subsets and little subareas. Java’s one of those.

I mean, if you look at what we did from a Hadoop perspective, so many people that were Java developers moved to that community, and also a lot of people that didn’t really do Java. Like I said, at the point I was at in my career, I was more of a .NET C# developer. Fast forward to getting into the Hadoop community, and I went back to my roots as a Java developer, since I’d done some Java in the past, and went through that phase. And so, for somebody like me, maybe I would want to go back. I don’t know. I’ve kind of gone more toward Python, but there are a lot of different options out there. It’s just about being able to give Java developers a platform to get involved in deep learning, because deep learning is very popular.

So, those are some of the reasons that you might want to go that way, but the question is, when you think about it, if I’m not a Java developer, what would you recommend? Would you recommend maybe not learning TensorFlow and going into Deeplearning4j? You know, I think that one’s going to depend… I mean, we say it a lot in here. It’s going to depend on what you’re using in your organization and what your skill set is. If you’re mostly a Python person, my recommendation would be to continue on or jump into the TensorFlow area. But if you’re working on a project that is using Deeplearning4j, then by all means go down that path and learn more about it. If you’re a Java developer and you want to get into it, you don’t want to transition skills or you’re just looking to test something out and play with it, and you don’t want to have to write it in Python, you want to be able to do it in Java, yeah, use that.

These are all just tools. We’re not going to get transfixed on any tool. We’re not going to go all in and say, “You know what? I’m only going to be a Java developer,” or, “I’m only going to be this.” We’re going to be able to transition our skills and there’s always going to be options out there to do it. And in these frameworks too, right? Deeplearning4j is awesome, but maybe there’s another one that’s coming up that people would want to jump into, so like I said, don’t get so transfixed with certain frameworks. Like, Hadoop was awesome. We broke it apart. A lot of people navigated to Spark and still use HDFS as a base. There’s always kind of skills that you can go to, but if you go in and say, “Hey, I’m only going to ever do MapReduce and it’s always going to be in Java,” then you’re going to have some challenges throughout your career. That’s not just in data engineering, that’s throughout all IT. Heck, probably throughout all careers. Just be able to be flexible for it.

So, if you’re a Java developer, if you’re looking to test some things out, definitely jump into it. If you don’t have any Java skills and it’s not something that you’re particularly wanting to do, then I don’t recommend you running in and trying to learn Java just for this. If you’re doing Python, steady on with TensorFlow, or PyTorch, or Caffe, whatever you’re using.

So, until next time. See you again on Big Data Big Questions. Make sure you subscribe and ring that bell so you never miss an episode. If you have any questions, put them in the comment section here below. Thanks again.


Filed Under: Deep Learning Tagged With: Deep Learning, Java, Python, Tensorflow

Learn HDFS Without Java?

June 3, 2019 by Thomas Henson

HDFS Skills Without Java

In the world of Hadoop and Big Data, HDFS is king. Data Engineers looking to boost their administrative skills first learn to navigate the Hadoop Distributed File System (HDFS) before jumping to more complex tasks. If Hadoop is written in Java, does working with HDFS require knowing Java programming? In this video I break down what HDFS is and how to learn it without needing to know Java. Find out more by watching this episode of Big Data Big Questions.

Transcript – Learn HDFS Without Java?

Hi folks, Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today’s question came in from a live session. If you’re not familiar, I do a live session sometimes on the weekends, and I’m thinking about incorporating another one. If you’d like to be a part of those, make sure you check it out. I’ll post those. Also, let me know if there’s a better time for me to do these. If you’d like to see maybe a Wednesday night or a Tuesday night episode, let me know. Put it in the comments section here below, and also if you have a question, go ahead and throw it down just like this one.

This one came out of my live session. It was one of the last questions, and I had actually dropped off. As I dropped off, this question came in, so I wanted to make sure that I was getting this one done and out there. The question is, can you learn HDFS without Java? This question is a little bit similar to some of the other ones that I answered around Hadoop and MapReduce: can you do Hadoop, or MapReduce, or Spark without Java? This one takes a twist a little bit, more on the administrator side. We’ve discussed the difference between the big data developer and the big data administrator. This one is more around the administration. I’ve said before, on the other ones, where the question was, can you do Spark? Can you do MapReduce without Java? It was always, hey, it depends. You absolutely can, but there might be an instance where you need to import or have somebody that’s already using that. For this one, no Java.

You’re cleared. You don’t have to worry about that, and one of the reasons is, if you think about it from an administrator perspective, really what we’re trying to do is move data around and understand some of the other tasks, like updates. I did a whole course around HDFS from the command line. You can go through that course and never do anything Java-related. It’s pretty cool to be able to go in and do that. We talk more about configuration files and what we’re trying to do from that perspective. No worries. No need for Java to be able to do HDFS.

From a high level, let’s look at some HDFS commands and understand what we’re talking about whenever we’re saying, “Hey, no need for Java.” There is, though, more of a need for Linux. If we look here from the command line, all these commands that we’re going to do are hdfs dfs commands. To list out the files that you have here, you’ll use hdfs dfs -ls. This command shows you everything that’s in a directory, right? We’re looking at files that we have in this directory here, and it’s really similar to what you would do if you just logged in to your favorite version of Linux and ran ls from the command line.
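The listing command described above (spoken as "HDFS DFS LS", which on the command line is hdfs dfs -ls) can also be scripted without touching Java. Here is a minimal sketch in Python, assuming the hdfs client is installed and on your PATH. The helper names and the /user/demo path are made up for illustration, and the command only runs if the client is actually found.

```python
import shlex
import shutil
import subprocess


def hdfs_dfs_command(*args):
    """Build the argument list for an `hdfs dfs` call (hypothetical helper)."""
    return ["hdfs", "dfs", *args]


def run_if_available(command):
    """Run the command only when its executable is on PATH, else return None."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, capture_output=True, text=True, check=False)


# /user/demo is a made-up path; point this at a directory in your own cluster.
list_command = hdfs_dfs_command("-ls", "/user/demo")
print("Command:", shlex.join(list_command))

result = run_if_available(list_command)
if result is None:
    print("hdfs client not found; nothing was executed")
else:
    print(result.stdout)
```

The same line typed straight into a shell works just as well. The wrapper is only there to show that automating it takes no Java either.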

A lot of these commands are going to be the same. I actually have a course, like I said, that digs through all these different commands, but look at this command here, too: hdfs dfs -mkdir. What do you think we’re doing here? If you have a background in Linux, you understand that we’re just making directories.

Lastly, something else from a Linux perspective that will help you in HDFS is permissions. How can you ensure that Bob doesn’t have access to a file that he doesn’t need, or that the HDFS user is allowing other users to create files? That’s where we talk about permissions. Like I said, this is similar to what we do from a Linux perspective. I have a course that’s all around this, if you’re interested in checking it out, but these are some of the commands and some of the skills that you’ll need to be an HDFS administrator.

I’ve also got some other resources that I’ll put in the description here, with some quick tutorials that you can walk through and start using. All the commands that you need to know, like I said, are nothing that you need to recite. I actually created some of these blog posts just because I couldn’t remember some of the commands. Like I was saying, it’s mostly from a Linux perspective, so no need to worry. No need to think, “Man, how am I going to learn Java if I want to be an HDFS administrator,” or start working in HDFS. You’re totally able to do that, and you can see here just how simply we were able to jump in and do it.

If you’re looking to jump in and try some of the commands like we just showed, just go out and download one of the sandboxes or set up a Hadoop environment on your own. This gives you the ability to play with it in your own lab and start building out some of those other requirements.

Now, thanks for tuning in. Thanks for the question. If anybody has a question, make sure you put it in the comments section here below. I’ll try my best to answer these, and I’ll see you on another episode of Big Data Big Questions.
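As a follow-up to the directory and permission commands mentioned in the transcript, here is a small sketch in Python that builds the matching hdfs dfs calls and decodes an octal permission mode the way Linux does. The paths, the user alice, and the group analysts are made up for illustration, and the script only prints the commands rather than running them.

```python
import shlex


def hdfs_dfs_command(*args):
    """Build the argument list for an `hdfs dfs` call (hypothetical helper)."""
    return ["hdfs", "dfs", *args]


def describe_mode(octal_mode):
    """Turn an octal mode such as "750" into the familiar rwx form."""
    parts = []
    for digit in octal_mode:
        bits = int(digit, 8)
        parts.append("".join(
            flag if bits & (4 >> position) else "-"
            for position, flag in enumerate("rwx")
        ))
    return "".join(parts)


# Made-up directory, owner, and group for illustration.
target = "/data/projects/reports"
commands = [
    hdfs_dfs_command("-mkdir", "-p", target),              # make the directory
    hdfs_dfs_command("-chown", "alice:analysts", target),  # set owner and group
    hdfs_dfs_command("-chmod", "750", target),             # lock out everyone else
    hdfs_dfs_command("-ls", "/data/projects"),             # confirm the result
]

for command in commands:
    print(shlex.join(command))

# 750 gives the owner full access, the group read and traverse, others nothing.
print("750 means", describe_mode("750"))
```

With mode 750, a user like Bob who is neither the owner nor in the group gets no access to the directory, which is the kind of rule the permissions discussion is about.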

Filed Under: Hadoop Tagged With: Hadoop, HDFS, Java
