Thomas Henson


Archives for July 2018

Defining IoT Message Brokers

July 8, 2018 by Thomas Henson

How Do IoT Message Brokers Work?

What are message brokers in IoT? Message brokers are the middleware in IoT and streaming applications. Think of them as queuing systems that allow quick writes to one system that many applications can then read.

Message brokers are critical to IoT and streaming analytics because they give Data Engineers the ability to quickly move data and messages into a storage container. Once in those storage containers, the data can be read by multiple applications.

In this video we will walk through the different open source message brokers in IoT & Streaming workflows.

Transcript – Defining IoT Message Brokers

Hi, folks! Thomas Henson here, with thomashenson.com, and today is another episode of Big Data Big Questions. Today’s question comes in, I want to talk more about IoT, like I was talking about in the last few videos on IoT, this is something huge. This is something I think that a lot of data engineers should really start digging into. These are going to be workloads that we’re going to see, and even modern application developers, maybe you don’t do big data, you’re going to be impacted by this. I want to talk about message brokers in IoT. I want to talk about what a message broker is, how that architecture works, and then also some of the major players in there. You’ve probably heard of a few of these, but find out more right after this.

Welcome back. Thanks for tuning in. Today, we’re going to dig into message brokers in IoT, so I really want to talk about how message brokers work in IoT and how, really, the data is pushed. It’s a little bit different, right? It’s not your traditional application. In IoT, your devices are out there with IP connections, and may have a spotty connection, but how do you ensure that you can bring the data back in? This is where we start to see message brokers being used.

Message brokers are the middleware that’s in distributed applications, and so it’s like a queueing system. It’s going to handle the message validation, the transformation, and the routing of the messages. Think about if you have a Raspberry Pi set up for your garage, so every time your garage door opens, you send a message out, and it sits in a queue in your message broker, where you can know that, “Hey, that garage door was in an open state, but now it’s in a closed state,” or vice versa. Then, if it’s in an open state, maybe I’ve got somebody that’s subscribed to it to turn my air down, because chances are, if my garage door is open, it means that I’m going to be home, or I’m just pulling in, so I want you to go ahead and kick that air down for me.
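
A minimal sketch of those three jobs (validation, transformation, routing), using only the Python standard library. Every name here (`validate`, `transform`, `route`, `handle`, the `home/<device>` topic format, the JSON message shape) is an assumption made for illustration and is not taken from any real broker.

```python
import json
import queue

# Hypothetical message shape: {"device": "garage", "state": "open" | "closed"}
VALID_STATES = {"open", "closed"}


def validate(raw: str) -> dict:
    """Reject anything that is not a JSON object with the expected fields."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("message must be a JSON object")
    if "device" not in message:
        raise ValueError("missing device id")
    if message.get("state") not in VALID_STATES:
        raise ValueError("unknown state")
    return message


def transform(message: dict) -> dict:
    """Normalize the message into the shape downstream readers expect."""
    return {
        "topic": f"home/{message['device']}",
        "is_open": message["state"] == "open",
    }


def route(message: dict, queues: dict) -> None:
    """Put the message on the queue for its topic, creating it if needed."""
    queues.setdefault(message["topic"], queue.Queue()).put(message)


def handle(raw: str, queues: dict) -> bool:
    """Run one incoming message through validate, transform, and route."""
    try:
        route(transform(validate(raw)), queues)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
```

Calling `handle('{"device": "garage", "state": "open"}', queues)` with an empty `queues` dict leaves one message waiting on the `home/garage` queue, while a malformed message is dropped before it reaches any reader.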
The architecture behind these message brokers is normally going to be publish-subscribe. This gives your IoT devices the ability to publish updates. Like I said, they may have a spotty connection, so it’s important for them to be able to send those out. It’s not constantly sending out, “I’m open, I’m open, I’m open.” It’s going to send it out whenever that’s changed, whenever it has a connection to send the change. If you don’t have a connection for your garage door opener and it changes from closed to open, it’s still, in that message broker, going to be shown as being closed. Once that connection comes back up, it’s going to change it to be open. This gives you the ability to, one, work with non-persistent data and work in locations where you’re not going to have such a great connection, but it’s also going to give you the ability to have multiple subscribers. I talked about the air conditioner working, but what about other applications?

What if I wanted to have certain lights that came on? What if I wanted to have multiple different subscribers, or different applications, or different, other IoT devices that are looking and keying off of what happens to that garage door from that Raspberry Pi?
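
Here is a rough, in-memory sketch of the publish-subscribe pattern with the garage door example: one device that publishes only on a state change and only when it has a connection, and two subscribers keying off the same topic. The `Broker` and `GarageDoor` classes, the topic name, and the subscriber names are all hypothetical; a real deployment would use an actual broker rather than this toy.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory publish-subscribe broker (illustration only)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks
        self.last_state = {}                  # topic -> last value received

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, value):
        self.last_state[topic] = value
        for callback in self.subscribers[topic]:
            callback(value)


class GarageDoor:
    """Publishes only when the state changed and a connection is available."""

    def __init__(self, broker, topic="home/garage"):
        self.broker = broker
        self.topic = topic
        self.online = True
        self.state = None      # current physical state of the door
        self.delivered = None  # last state the broker actually received

    def report(self, state):
        self.state = state
        self._flush()

    def reconnect(self):
        self.online = True
        self._flush()

    def _flush(self):
        # Skip repeats ("I'm open, I'm open") and wait out lost connections.
        if self.online and self.state != self.delivered:
            self.broker.publish(self.topic, self.state)
            self.delivered = self.state


broker = Broker()
events = []
broker.subscribe("home/garage", lambda s: events.append(("thermostat", s)))
broker.subscribe("home/garage", lambda s: events.append(("lights", s)))

door = GarageDoor(broker)
door.report("closed")   # delivered to both subscribers
door.online = False     # connection drops
door.report("open")     # broker still shows "closed"
door.reconnect()        # broker now shows "open"
```

Adding another subscriber is one more `subscribe` call; the door itself never needs to know who is listening.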

That’s just a little bit about that publish-subscribe pattern. I’ll probably do another video digging a little bit deeper, maybe throw up some slides on it, but I did want to talk a little bit about what some of the message brokers in IoT are. The first one I want to talk about is Apache Kafka. Kafka was incubated and developed out of LinkedIn, so they were looking for ways to take in all these messages and have them in a queueing system. When we think about what they were doing, we’re talking about millions and millions of messages, right? For many years, it was used in their production. You see it used a lot in streaming analytics. You’ve heard me talk about Kafka in the Lambda architecture and being able to support that streaming analytics, and have that queueing system. As those messages come in, you just don’t have time for them to hit HDFS right then, and Kafka gives you the ability to queue them.

Another one is Pravega. Pravega is open source out of Dell EMC. You’ve heard me talk about it when we talk about the Kappa architecture. This gives you the ability to have that messaging queue for those devices as they come in. They’re sitting in that queue system, but because it’s part of the Kappa architecture, even your batch writes and your streaming writes can all be accessed through Pravega. Versus typically, when we talk about a Lambda architecture, we have our batch layer, traditionally probably going to be in HDFS. Then, you have your streaming layer that might be in Kafka, or Spark Streaming, or some of the other applications. Then, you have two different code bases to be able to do that. Pravega is built from the ground up for streaming architecture, but also gives you the ability to really take advantage of the Kappa architecture, and have one code base to use, to access, and to write your jobs, whether it be Spark jobs, or old MapReduce jobs, those types of things.

Then, the third one that I wanted to talk about was RabbitMQ. Another message broker in IoT is RabbitMQ. It’s widely used in web development; it was originally developed for web services to be able to respond to a call request, and so if you look at a lot of the frameworks and languages that it supports, we’re talking still Ruby, PHP, .NET, a lot of the development stack, even a lot of JavaScript. I’ve seen some people who have some courses out there on RabbitMQ, just for the JavaScript developer. It’s another one that’s kind of a message queueing system, built to be able to stream, built for streaming analytics, and able to distribute those messages.
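
Going back to the Lambda versus Kappa point above, the "one code base" idea can be sketched with a toy append-only log. This is not the Kafka or Pravega API; `EventLog`, `read_from`, and `count_opens` are hypothetical names, and the sketch assumes readers keep track of their own offsets. The same function handles both a replay of the full history (the batch case) and newly arrived events (the streaming case).

```python
class EventLog:
    """Toy append-only log; each reader tracks its own offset."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the event just written

    def read_from(self, offset):
        """Return (events at or after offset, offset to resume from)."""
        return self._events[offset:], len(self._events)


def count_opens(events, running_total=0):
    """The single piece of logic shared by the batch and streaming paths."""
    return running_total + sum(1 for event in events if event == "open")


log = EventLog()
for event in ["open", "closed", "open"]:
    log.append(event)

# Batch-style: replay the log from the beginning.
history, offset = log.read_from(0)
total = count_opens(history)

# Streaming-style: pick up only what arrived since the last read.
log.append("closed")
log.append("open")
new_events, offset = log.read_from(offset)
total = count_opens(new_events, total)
```

In a Lambda-style setup, the batch count and the streaming count would typically live in two separate jobs; here both paths call `count_opens`, which is the property the Kappa architecture is after.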

It’s still not seen as much, or as popular, as Kafka when we start talking about big data analytics. You’re starting to see a little movement in that area, and there are other ones out there too, with Azure having one, and AWS IoT, which uses publish-subscribe in its architecture for its IoT platform. There are a lot of different ways to use those message brokers. I think this is a concept that you really should be familiar with to some extent, because you’re probably already using one, you maybe just haven’t referred to it as a message broker.

That’s all I have for today. Make sure you subscribe to the YouTube channel, here. You never want to miss an episode, and this gives you an opportunity to ask questions, submit them down here in the comments section below, but always stay tuned, make sure to keep your big data, data engineering knowledge on point.

Thanks again.

IoT Message Broker Show Links

Apache Kafka  – https://kafka.apache.org/
Pravega – http://pravega.io/ 
RabbitMQ – https://www.rabbitmq.com/ 

Want More Data Engineering Tips?

Sign up for my newsletter so you never miss a post or YouTube episode of Big Data Big Questions, where I answer Data Engineering questions from the community.

Filed Under: IoT Tagged With: IoT, Kafka, Message Brokers

Is an AWS Certification Required for Data Scientist?

July 5, 2018 by Thomas Henson

Will AWS Certification Help Data Scientist?

Every discipline in IT has different certifications, and the debate about the worth of those certifications will go on forever. Data Scientists cross over into needing skills in coding, operations, and math. However, the Data Scientist isn’t the only person on the Big Data team. The Data Engineer tends to build and maintain the application, leaving the data modeling to the Data Scientist. With that division of labor, should a Data Scientist get an AWS Certification?

In this video I will explore the requirements for Data Scientists and even break down a job posting from AWS for a Data Scientist. Watch now to find out about AWS certifications for Data Scientists.

Transcript – AWS Certification Required for Data Scientist?

Hey, how are you doing today? My name’s Thomas Henson, and welcome to another episode of Big Data Big Questions. And so, today I’ve got a very special question that came in from a user that we’re going to tackle. It’s about a certification in AWS. So, do you need an AWS Certification to be a data scientist? And so, I’m going to tackle that question. And then, we’re also actually going to look and try to see some job descriptions out there that are posted and see what those job descriptions are, and how I would approach it, and where my thoughts are on the AWS Certification for data scientists, looking into the job description, too. So, find out all about this, right after this.

So, today’s question comes in from YouTube. So, I’ve got the question here. Before we start and jump into the question, I do want to remind you, if you have any Big Data Big Questions and you would like them answered, put them in the comments section here below, throw it out on Twitter using the hashtag Big Data Big Questions, or go to our website and you can send me any kind of question that you want and I’ll try my best to answer them here for you. Also, make sure you subscribe and hit the notification button so you never miss an episode and never miss when your question gets answered all here on YouTube.

So, today’s question comes in from… It’s on my Cloudera Data Engineering Certification question. So, we’re following along with certification questions here. So, he says, “Hi, will AWS Certified Solutions Architects Certification help in my data science career path?” So, this question is a large, large topic, right? So, I’m going to have to make some assumptions here and think, “Okay, so this person is looking for a career path into data science.” So, I’m thinking that they want to become a data scientist. And so, what they’re saying is, “Hey, to become a data scientist, do I need to have the base level AWS Certification?” The quick and dirty answer is no, but that’s all going to depend, too. So, it’s going to depend on what the job description is. And towards the end of this video, we’ll actually go through and look at a job description specifically from Amazon and see, does even Amazon require AWS Certification for their data scientists?

So, jumping back into the question though, let’s assume that we’re not talking specifically just about getting an AWS Certification for a data science career path. Let’s say that it’s a more broad topic. Maybe, it’s going to be a data engineer or a machine learning engineer. So, with those roles, remember, those are more hands-on as far as the technology and implementing different packages, whether it be HDFS, YARN, Kafka, Pig, Hive, doing some of the systems administration work, but also doing some of the hands-on machine learning work where you get to maybe implement some of the algorithms and do the tuning and coding there.

So, in those career paths, do you have to have the AWS Solutions Architect Certification? It’s going to depend there. So, the first thing I would do is find out what the basis of wanting to get that certification is. So, if you’re a data engineer or a machine learning engineer and you know that you’re… Say, within your company, you guys are using AWS, you’re using AWS for your big data projects, then it’s probably going to make sense for you to have some level of understanding of the AWS platform. And specifically, if you’re at a company where you’re required to get the AWS Big Data Certification, then yes, getting this lower level certification for the Solutions Architect Associate, that’s going to benefit you greatly because AWS now requires, for the AWS Big Data specialty, that you have a baseline certification. The Solutions Architect is one that’ll get you covered.

I will say that I have the AWS Solutions Architect Associate Certification. I was looking into the Big Data Certification there for AWS and doing some of the tactical things there. It was a great certification to give you those baseline skills because with my skill set, I came in, didn’t really have an understanding of all the offerings for AWS. Most of the stuff that I’ve always worked with is On-Prem. Unless you’re working at a company that’s using AWS, or you know that you’re applying for a position that requires that AWS Certification, I’m going to say that most of the time you’re not really going to have to have that AWS Certification. A lot of deployments… And you can look with… Hortonworks and Cloudera, a lot of their different deployments, if you look, they’re overwhelmingly On-Prem. So, not saying that it’s going to hurt you for AWS, but if you’re trying really quickly, and this is a tactical decision to, within the next six months, be able to roll into a position, odds are in your favor that you’re not going to be deploying it in AWS, or Azure, or even Google at this point.

So, I would look into maybe getting the Cloudera Certification or getting Hortonworks and just having that baseline information for the machine learning engineer, for your data engineer. Especially if you’re a data scientist, I don’t think that you’re going to need that.

So, let’s go back in and actually look at the question about data scientists and see what the job description is there. So, for this, let’s look… I’ve just pulled in here an Amazon data scientist position. This looks like it’s an Alexa position. So, you can see what they’re looking for is somebody that’s probably going to be able to do some kind of machine learning, maybe some deep learning on voice recognition as it comes in from Alexa and be able to provide some kind of prescriptive, or maybe even predictive analytics on it. But you can see, the majority of what they’re asking for is, they’re looking for… Yeah, they’re looking for some scripting languages here, so, maybe, some Perl or Python, or just some familiarity with those.

But it’s real heavy in the high-level techniques, right? Like what are we doing with machine learning, like building up those models and specifically really having more math-based skills? If you even look at the description here, and when we talk about the technical degree, they’re not really looking for a technical degree like we would think about the computer science, information system, management information system, computer information systems. They’re saying, “Hey, it’s okay, if you have a statistics background, some kind of applied math, or even an economics background.” And so, this right here, just looking at this one, so this is Amazon, right?

So, at Amazon, not saying that they’re not using the AWS platform, but I’m saying, for a data scientist, and if that’s specifically the role that you’re looking at, you’re not necessarily going to have to have that AWS Certification. You probably want to be somewhat familiar if you’re applying at Amazon. But outside of that, I wouldn’t think you’d need the certification. Even Amazon’s not asking for it here.

So, that’s my two cents on the AWS Certification in data science. If you have any questions, any follow-up questions, go ahead and put them in the comments section below. Make sure you subscribe so you never miss an episode, and I will see you next time.

Filed Under: AWS Tagged With: AWS, Certifications, Data Scientist

DynamoDB Data Types

July 2, 2018 by Thomas Henson