IoT Archives - Thomas Henson

Defining IoT Message Brokers

July 8, 2018 by Thomas Henson

How Do IoT Message Brokers Work?

What are message brokers in IoT? Message brokers are the middleware in IoT and streaming applications. Think of them as queueing systems that allow for quick writes into one system, which can then be read by many applications.

Message brokers are critical to IoT and streaming analytics because they give data engineers the ability to quickly move data (messages) into a storage container. Once the data is in those storage containers, it can be read by multiple applications.

In this video, we will walk through the different open source message brokers used in IoT and streaming workflows.


Transcript – Defining IoT Message Brokers

Hi, folks! Thomas Henson here, with thomashenson.com, and today is another episode of Big Data Big Questions. Today, I want to talk more about IoT. Like I was saying in the last few videos on IoT, this is something huge, and it’s something I think a lot of data engineers should really start digging into. These are workloads we’re going to see, and even if you’re a modern application developer who maybe doesn’t do big data, you’re going to be impacted by this. I want to talk about message brokers in IoT: what a message broker is, how that architecture works, and some of the major players in there. You’ve probably heard of a few of these, but find out more right after this.
Welcome back. Thanks for tuning in. Today, we’re going to dig into message brokers in IoT: how they work and how the data really gets pushed. It’s a little bit different, right? It’s not your traditional application. In IoT, your devices are out there with IP connections that may be spotty, so how do you ensure that you can bring the data back in? This is where we start to see message brokers being used.

Message brokers are the middleware in distributed applications, and they act like a queueing system. They handle message validation, transformation, and routing. Think about it: say you have a Raspberry Pi set up for your garage. Every time your garage door opens or closes, it sends a message out, and that message sits in a queue in your message broker, so you can know, “Hey, that garage door was in an open state, but now it’s in a closed state,” or vice versa. Then, if it’s in an open state, maybe I’ve got somebody subscribed to that message who turns my air down, because chances are, if my garage door is open, it means I’m home or just pulling in, so I want it to go ahead and kick that air down for me.
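The three jobs named above (validation, transformation, and routing) can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the API of Kafka, Pravega, or RabbitMQ: the JSON payload shape, the `home/<device>/state` topic naming, and the in-memory `queues` dict are all assumptions made for the garage door example.

```python
import json

# Hypothetical rule: the only states a garage door sensor may report.
VALID_STATES = {"open", "closed"}


def validate(raw: str) -> dict:
    """Parse a raw JSON payload and reject anything malformed."""
    msg = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(msg, dict) or "device" not in msg or msg.get("state") not in VALID_STATES:
        raise ValueError(f"invalid message: {raw}")
    return msg


def transform(msg: dict) -> dict:
    """Normalize the payload into the shape downstream consumers expect."""
    return {"topic": f"home/{msg['device']}/state", "state": msg["state"]}


def route(msg: dict, queues: dict) -> None:
    """Append the message to the queue for its topic, creating it if needed."""
    queues.setdefault(msg["topic"], []).append(msg["state"])


queues: dict = {}
incoming = [
    '{"device": "garage-door", "state": "open"}',
    '{"device": "garage-door", "state": "ajar"}',  # fails validation
    '{"device": "garage-door", "state": "closed"}',
]
for raw in incoming:
    try:
        route(transform(validate(raw)), queues)
    except ValueError:
        pass  # a real broker might divert this to a dead-letter queue instead

print(queues)  # {'home/garage-door/state': ['open', 'closed']}
```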
The architecture behind these message brokers is normally going to be publish-subscribe. Your IoT devices publish updates. Like I said, they may have a spotty connection, so it’s important for them to be able to send those updates out when they can. The device isn’t constantly sending out “I’m open, I’m open, I’m open.” It sends an update whenever the state changes and it has a connection to send that change. If your garage door opener doesn’t have a connection and the door changes from open to closed, the message broker is still going to show it as open. Once the connection comes back up, the update is sent and the broker changes it to closed. This gives you the ability to, one, work without a persistent connection, in locations where you’re not going to have a great connection. But it also gives you the ability to have multiple subscribers. I talked about the air conditioner, but what about other applications?
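Here is a small Python sketch of the device side of that behavior, assuming a hypothetical `GarageDoorSensor` class: it publishes only when the state changes, and it buffers updates while the link is down, so the broker keeps showing the last state it received until the device reconnects. The `connected` flag and the `sent` list are stand-ins for a real network connection and a real broker.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GarageDoorSensor:
    """Hypothetical edge device that reports only state changes.

    `connected` simulates a spotty network link; `sent` stands in for the
    messages that actually reached the broker.
    """

    last_state: Optional[str] = None
    connected: bool = True
    pending: list[str] = field(default_factory=list)
    sent: list[str] = field(default_factory=list)

    def observe(self, state: str) -> None:
        # Skip repeats: no "I'm open, I'm open, I'm open."
        if state == self.last_state:
            return
        self.last_state = state
        self.pending.append(state)
        self.flush()

    def flush(self) -> None:
        # Send buffered updates only while the link is up.
        if self.connected:
            self.sent.extend(self.pending)
            self.pending.clear()

    def reconnect(self) -> None:
        self.connected = True
        self.flush()


door = GarageDoorSensor()
door.observe("open")
door.observe("open")      # repeat reading: nothing new is sent
door.connected = False    # the link drops
door.observe("closed")    # buffered; the broker still shows "open"
print(door.sent)          # ['open']
door.reconnect()
print(door.sent)          # ['open', 'closed']
```

This sketch buffers every change in order. A device that only cares about the current state could keep just the latest value instead, which saves memory at the cost of losing the history.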

What if I wanted certain lights to come on? What if I wanted multiple different subscribers, whether different applications or other IoT devices, all keying off of what happens to that garage door from that Raspberry Pi?
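A toy in-memory broker in Python shows the publish-subscribe idea: the Raspberry Pi publishes to a topic without knowing who is listening, and each subscriber (the thermostat, the lights) gets its own copy. The `Broker` class, the `garage/door` topic, and the handlers are hypothetical. Keeping the last message per topic, so that a late subscriber immediately sees the current state, is an assumption of this sketch; it is similar in spirit to MQTT’s retained messages.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str], None]


class Broker:
    """Toy in-memory publish-subscribe broker (not a real product's API)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)
        self._last: dict[str, str] = {}  # last message seen on each topic

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)
        if topic in self._last:  # hand the current state to a late subscriber
            handler(self._last[topic])

    def publish(self, topic: str, message: str) -> None:
        self._last[topic] = message
        for handler in self._subscribers[topic]:  # fan out: each gets a copy
            handler(message)


log: list[str] = []


def thermostat(state: str) -> None:
    if state == "open":
        log.append("thermostat: turn the air down")


def lights(state: str) -> None:
    log.append(f"lights: {'on' if state == 'open' else 'off'}")


broker = Broker()
broker.subscribe("garage/door", thermostat)
broker.subscribe("garage/door", lights)
broker.publish("garage/door", "open")
broker.publish("garage/door", "closed")
print(log)  # ['thermostat: turn the air down', 'lights: on', 'lights: off']

late: list[str] = []
broker.subscribe("garage/door", late.append)
print(late)  # ['closed']
```

Adding another consumer is just another `subscribe` call; the publisher never changes, which is the main reason the pattern suits IoT devices feeding many applications.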

That’s just a little bit about the publish-subscribe pattern. I’ll probably do another video digging a little deeper, maybe throw up some slides on it, but I did want to talk about some of the message brokers in IoT. The first one I want to talk about is Apache Kafka. Kafka was developed at LinkedIn and later incubated at Apache; LinkedIn was looking for a way to take in all of their messages and have them in a queueing system. Think about what they were doing: we’re talking about millions and millions of messages, right? It was used in their production for many years. You see it used a lot in streaming analytics. You’ve heard me talk about Kafka in the Lambda architecture, supporting streaming analytics as that queueing system. As those messages come in, you just don’t have time for them to hit HDFS right then, and Kafka gives you that ability.

Another one is Pravega. Pravega is open source out of Dell EMC, and you’ve heard me talk about it with the Kappa architecture. It gives you that messaging queue for devices as their data comes in. The data sits in that queue system, but because Pravega is part of the Kappa architecture, both your batch jobs and your streaming jobs can access the data through Pravega. Compare that to the Lambda architecture, where we have our batch layer, traditionally in HDFS, and then a streaming layer that might be in Kafka, Spark Streaming, or some of the other applications.

Then, you have two different code bases to maintain. Pravega is built from the ground up for streaming architecture, but it also lets you really take advantage of the Kappa architecture and have one code base to access and write your data, whether that’s Spark jobs or old MapReduce jobs.

The third one I wanted to talk about is RabbitMQ, another message broker in IoT. It’s widely used in web development; it was originally developed for web services to be able to respond to call requests. If you look at the frameworks and languages it supports, we’re talking Ruby, PHP, .NET, a lot of the web development stack, and even a lot of JavaScript. I’ve seen some people with courses out there on RabbitMQ just for the JavaScript developer. It’s another message queueing system, built to distribute messages, and it can be used for streaming analytics too.

RabbitMQ still isn’t as widely used as Kafka when we start talking about big data analytics, but you’re starting to see a little movement in that area. There are also other ones out there: Azure has one, and AWS IoT uses publish-subscribe in the architecture of its IoT platform. There are a lot of different ways to use message brokers. I think this is a concept you really should be familiar with to some extent, because you’re probably already using one; you just may not have called it a message broker.
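To make the Kappa idea from the Pravega discussion concrete, here is a minimal Python sketch, assuming plain lists and generators stand in for a stored event log and a live feed. The point is that one function serves both the batch case (replaying history) and the streaming case (new events), where a Lambda architecture would typically need a separate batch job and streaming job. None of this is Pravega’s API; the event fields and function names are hypothetical.

```python
from itertools import chain
from typing import Iterable, Iterator


def door_open_times(events: Iterable[dict]) -> Iterator[str]:
    """The processing logic, written once.

    It does not care whether `events` is a replay of stored history (the
    "batch" case) or records arriving live (the streaming case).
    """
    for event in events:
        if event["state"] == "open":
            yield event["ts"]


# Hypothetical stored history, i.e. the retained event log.
history = [
    {"ts": "07:58", "state": "open"},
    {"ts": "08:01", "state": "closed"},
    {"ts": "17:45", "state": "open"},
]


def live_feed() -> Iterator[dict]:
    """Stand-in for new records arriving from the broker."""
    yield {"ts": "18:10", "state": "closed"}
    yield {"ts": "18:30", "state": "open"}


batch_result = list(door_open_times(history))       # reprocess the past
stream_result = list(door_open_times(live_feed()))  # handle new data
full_result = list(door_open_times(chain(history, live_feed())))  # replay, then keep going
print(batch_result)   # ['07:58', '17:45']
print(stream_result)  # ['18:30']
print(full_result)    # ['07:58', '17:45', '18:30']
```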

That’s all I have for today. Make sure you subscribe to the YouTube channel so you never miss an episode. This is also your opportunity to ask questions: submit them in the comments section below, and always stay tuned to keep your big data and data engineering knowledge on point.

Thanks again.

IoT Message Broker Show Links

Apache Kafka – https://kafka.apache.org/
Pravega – http://pravega.io/
RabbitMQ – https://www.rabbitmq.com/

Want More Data Engineering Tips?

Sign up for my newsletter to be sure you never miss a post or a YouTube episode of Big Data Big Questions, where I answer questions from the community about data engineering.

Phases of IoT Application Development

June 19, 2018 by Thomas Henson

IoT Application Development

The Internet of Things is generating many opportunities for data engineers to develop useful applications. Think about self-driving cars: each one is essentially a large, moving IoT device. When developing IoT applications, developers typically have three different phases in mind. In this video, I will explain the three phases of IoT application development.

Transcript – Phases of IoT Application Development

Hi folks, Thomas Henson here with thomashenson.com, and today is another episode of Big Data Big Questions. And so, today’s episode, I wanna talk more about IoT. So, I know we started talking about it in the previous video, but I really wanna dig into it a little bit more because I think it’s something very important for data engineers and anybody involved in any kind of modern applications or anybody involved in Big Data. So, find out about the phases, the three phases of IoT right after this.

Welcome back. And so, today, we’re gonna continue to talk about IoT, or the Internet of Things. If you’re not familiar, I’ve got a video up here where we talked about why it’s important for data engineers to know, but I think it’s important for anybody involved in modern applications or even on the business side of things. You’re really gonna see a lot of different information coming into your data center and into your projects because of these sensors out there. So, let’s get a little more comfortable with what’s going on with the technology and how it’s gonna impact us.

And so, in this video, I wanna talk about the three phases. So, I think there are three phases of IoT and I think we’re starting to get into the third phase, and you’ll see why it’s gonna make sense for data engineers and modern applications when we talk about that third phase.

And so, just as a recap, remember, IoT is not a new concept: it’s the ability to have devices out in the physical world that have some kind of IP address and can send data to, and receive data back from, your core data center or your core analytics processing. The example I’ve used before is the dash button, right? If you’re out of toilet paper or whatever it is in your house, you push that button. It connects out to a gateway and up to the cloud with Amazon to say, “Hey, order some more [INAUDIBLE 00:02:02] of this particular brand,” and it gets sent to your door. So, a real quick example there.

But let’s talk a little bit more about the phases, and I think that will give you a better understanding of, “Okay, this is how that whole ecosystem of devices and data and gateways all works together.” Just like from a web development perspective, when we talked about Web 1.0 and Web 2.0, I think with IoT we’ve gone through phases of IoT 1.0 and 2.0. I think these phases are more compressed than the phases of the web, and part of that is just that we change so fast.

And so, the first phase was everybody had a sensor, right? Maybe this isn’t a smartwatch, but think about fitness trackers and smartwatches. Everybody was like, “Oh, it’s kinda cool, right? I can get in contests with my friends and track how many steps I have.” That’s pretty cool, but we didn’t really understand what to do with it. It was still somewhat of a novelty, and everybody who had these sensors really didn’t know what to do with them other than tracking, right? That’s the kind of thing we had been doing before, but now we had an internet connection and could kinda control them from our phones.

Fast-forward into phase two. In phase two, it wasn’t just about these smart trackers and devices attached to us; we started to have smart everything in our homes, right? Think of a refrigerator. You had a smart refrigerator, and people were like, “Well, that’s kinda cool. We have a refrigerator that’s connected to the internet. I can look at photos of what’s inside it from my phone. So, if I’m at the grocery store, I can check, hey, do I need any more ranch dressing or do I need any more Tide Pods…” Well, hopefully your Tide Pods aren’t in your refrigerator. But if you needed pickles or other things you’re looking for at the grocery store, you could check. Or maybe it’s your washing machine: you’re able to turn your washing machine on from your device and say, “Okay, let’s turn the washing machine on.” That’s pretty cool. You set it up.

It’s still kinda novel, and it’s not really where we’re seeing this go, because as data engineers and people who do analytics, we know that being able to predict and prescribe outcomes is where it really goes. That’s phase three, and that’s where we’re really entering right now.

Phase three is when we’re able to take all this information and act on it. So, think of that washing machine. We’re not just turning that washing machine on from our phone. That device has diagnostics in it that are gonna run. And so, let’s say there’s an error in some onboard circuit, maybe just a $10 component on your washing machine, that if you replace it within the next 30 days, is gonna prevent you from having to get a brand new washing machine. Well, that’s pretty cool, right? It can send you that information, but instead of just sending you that information, check this out. The diagnostics run, and that information goes out to the data center. The data center looks for service providers in your area, because it knows where this device is: you’ve registered it, so it knows where your home is. It finds those service providers in your area and sends you back an alert saying, “Hey, we found this component that needs to be replaced on your washing machine. This is gonna prevent you from having to buy a new washing machine, and maybe it will prevent your washing machine from flooding,” which, man, who wants to clean that up and then have to buy a new washing machine?

“So, here are some times when a service person in your area can come replace that part for you. When would you like to schedule that?” That’s pretty awesome, right? That’s really starting to say, “Hey, we found an error. We believe this is the component that can fix it. And here are some times for us to be able to fix it.” So, how many steps did it take a human on the [INAUDIBLE 00:05:50]? Which is really good, right? As consumers, we want products like that. And so, there’s a ton of new use cases we can start to see. We’re starting to see that now with what I call IoT phase three.
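The phase-three flow described above (diagnostics, a lookup against the registered home location, then a prescriptive alert with appointment slots) might be sketched like this. Everything here is hypothetical: the `failure_risk` score, the 0.7 threshold, the registry and provider tables, and the alert format are illustrative assumptions, not any manufacturer’s system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Diagnostic:
    device_id: str
    component: str
    failure_risk: float  # hypothetical 0..1 score from onboard diagnostics


# Hypothetical registry data: where each registered device lives, and which
# service providers cover each area, with their open appointment slots.
REGISTERED_DEVICES = {"washer-42": "area-1"}
PROVIDERS = {
    "area-1": [("Valley Appliance", ["Tue 09:00", "Thu 13:00"])],
    "area-2": [("Northside Repair", ["Wed 10:00"])],
}
RISK_THRESHOLD = 0.7  # assumption: at or above this, recommend replacing the part


def build_alert(diag: Diagnostic) -> Optional[dict]:
    """Turn a raw diagnostic into a prescriptive alert, or None if no action."""
    if diag.failure_risk < RISK_THRESHOLD:
        return None
    area = REGISTERED_DEVICES[diag.device_id]  # known because the owner registered it
    offers = [{"provider": name, "slots": slots} for name, slots in PROVIDERS.get(area, [])]
    return {
        "device": diag.device_id,
        "message": f"Replace the {diag.component} soon to avoid a bigger failure.",
        "offers": offers,
    }


alert = build_alert(Diagnostic("washer-42", "$10 valve", 0.85))
print(alert["offers"])  # [{'provider': 'Valley Appliance', 'slots': ['Tue 09:00', 'Thu 13:00']}]
print(build_alert(Diagnostic("washer-42", "$10 valve", 0.10)))  # None
```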

So, the phases… just remember the phases. The first one, think of sensors everywhere, smart sensors, but we’re really just tracking things. The second became mobile control, or the ability to have smart everything: we have the smart refrigerator, we have the smart washer and dryer, but we still didn’t know what we could do with it. And now, we’re moving into phase three. We’re starting to prescribe: we’re starting to have predictive analytics saying, “Hey, these are things that might happen. Oh, and by the way, this is how we can fix it.” This is gonna give consumers more information and just a better experience with their products. It can save you the time of picking up the phone and calling to schedule somebody to come fix your washing machine. It’s gonna prevent you from having to go out and buy a new washing machine. And it makes products more sticky for those companies.

So, that’s all for today’s episode of Big Data Big Questions. Make sure to subscribe to the channel and submit any questions you have. If you have any questions related to Big Data, IoT, or machine learning, just send them to me, and I’ll try to answer them if I get the opportunity. Submit those here or go to my website. Make sure you subscribe, and I’ll see you next time on the next episode of Big Data Big Questions.

Why Data Engineers Should Care About IoT

June 4, 2018 by Thomas Henson

Why Data Engineers Should Care About IoT

The Internet of Things has been around for a few years but has hit an all-time high for buzzword status. Is IoT important for data engineers and machine learning engineers to understand? Gartner predicts there will be over 20 billion connected devices worldwide by 2020. The data from these devices will be included in current and emerging big data workflows. Data engineers and machine learning engineers will need to understand how to quickly process this data and merge it with existing data sources. Learn why data engineers should care about IoT in this episode of Big Data Big Questions.

Transcript

Hi folks, Thomas Henson here, with thomashenson.com. Today is another episode of Big Data Big Questions. Today, I want to tackle IoT for data engineers. I’m going to explain why IoT, or the Internet of Things, matters for data engineers, and how it’s going to affect our careers, how it’s going to affect our day-to-day jobs, and honestly, just the data that we’re going to manage. Find out more, right after this.
[Sound effects]

Today’s question is, what is IoT, and how does it affect data engineers? We’ve probably all seen the buzzword, or the concept of the Internet of Things, but what does that really mean? Is it just these little dash buttons that we have? Is this it? Wait a minute. Is that ordering something?

Is this what IoT is, or is it the whole ecosystem and concept around it? First things first. IoT, or the Internet of Things, is the concept of all these connected devices, right? It’s not something that is, I will say, brand new. Something that’s been out there for a while, and when we really think about it, getting down to it, it is a sensor. We have these sensors, these cheap sensors.

We’ve had them for a long time, but what we haven’t had is all these devices connected with an IP address to the Internet, that can send the data. That’s the big part of the concept. It’s not just about the sensor, it’s about being able to move the data from the sensor.

This gives us the ability to manage things in the physical world, bring their data back, do some analytics on it, and even push data back out to them. The cool thing is, IoT devices are generally economical, or cheap, devices that have an IP address and can just pull in information. Think about a sensor: if you have a smartwatch that’s connected to the Internet, it can feed information up to you. That’s where some of it all started. Then there are these dash buttons: I can have them installed all around my house and push a button whenever I need something. Or look at what we’re talking about with smart refrigerators. Smart refrigerators can take pictures of the contents of your refrigerator, so if you’re at the store, you look, and you’re like, “Hey, you know, what am I…? Do I need that ranch dressing? Yeah? Let me check in my refrigerator, here.”

Also, a sensor could be inside the refrigerator and tell you if something’s going wrong. Maybe the ice maker is blocked. Maybe you need a new water filter, and the refrigerator knows that because it has a sensor in it. It can send that information out to order the water filter for you and have it sent to your home, so you don’t even have to remember, “Hey, has it been 90 days? Or was it 60 days? Is it time to change it?” Because you’re going to forget. You’re going to let it go over, but now, you can have this sensor that’s going to tell you, and it’s going to order that for you. That’s the concept. It’s not just about the sensor. It’s about that ecosystem.

It’s about being able to move the data. For data engineers, what does this mean? Why do we care?

There are a lot of predictions out there about IoT and where it’s going. One of the big ones is Gartner’s prediction that by 2020 we will have over 20 billion of these devices. Not just dash buttons: think of all these sensors, all these things with IP addresses connected to the Internet. What does that mean from a data perspective? One prediction I’ve seen is 44 zettabytes of data, counting both the new data coming in and the data that already exists. Think about it. What is a zettabyte? It’s not a petabyte. It’s a lot bigger than a petabyte.
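For scale, using decimal (SI) prefixes, a zettabyte is a million petabytes, so the 44-zettabyte prediction works out as follows:

```latex
\begin{aligned}
1\ \text{PB} &= 10^{15}\ \text{bytes}, \qquad 1\ \text{ZB} = 10^{21}\ \text{bytes} = 10^{6}\ \text{PB} \\
44\ \text{ZB} &= 44 \times 10^{21}\ \text{bytes} = 4.4 \times 10^{22}\ \text{bytes} = 4.4 \times 10^{7}\ \text{PB}
\end{aligned}
```

In other words, 44 zettabytes is roughly 44 million petabytes.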

How are we going to manage all this data, when right now we’re still managing terabytes and petabytes and saying, “Man! This is a lot of data!”? That’s why IoT is important for data engineers: it’s contributing to this deluge of data. So how does it affect us, and what are some of the concepts? When we start talking about IoT and sensors, we’re talking about having this data on the edge, being able to pull information back, and also being able to push information out. What does that start to say?

As we’ve talked more and more about real-time analytics, this is where we’re really going to start to see real-time analytics really taking hold. As soon as we can get that data, and be able to analyze it and push information back out, that’s what’s going to help us. Think about it with automated cars, with a lot of the things that are going on outside in the physical world, where we have sensors, and devices talking to devices, streaming analytics is going to be huge in IoT.
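Streaming analytics over sensor data usually means aggregating events in time windows as they arrive. Below is a minimal plain-Python sketch of a tumbling (fixed, non-overlapping) window count, assuming events are `(timestamp_seconds, device_id)` pairs. Engines such as Apache Beam, Flink, and Spark provide windowing natively, along with handling for late data and state, so treat this only as an illustration of the concept, not of their APIs.

```python
from collections import defaultdict
from typing import Iterable


def tumbling_window_counts(events: Iterable[tuple[int, str]], window_s: int) -> dict:
    """Count events per device in fixed, non-overlapping time windows.

    `events` are (timestamp_seconds, device_id) pairs. The result maps each
    window's start time to a {device_id: count} dict.
    """
    counts: dict = defaultdict(lambda: defaultdict(int))
    for ts, device in events:
        window_start = ts - ts % window_s  # e.g. 75 falls in the window starting at 60
        counts[window_start][device] += 1
    return {start: dict(per_device) for start, per_device in sorted(counts.items())}


stream = [(0, "door"), (12, "door"), (30, "thermo"), (61, "door"), (75, "thermo"), (90, "thermo"), (119, "door")]
print(tumbling_window_counts(stream, window_s=60))
# {0: {'door': 2, 'thermo': 1}, 60: {'door': 2, 'thermo': 2}}
```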

The question becomes, if you’re looking to get involved in IoT, what are some of the projects? What can you do to contribute and be a part of this IoT revolution? I would look into some of the messaging queues. Look at Pravega, look at Kafka, even look at RabbitMQ and some of the other messaging queues, because think about it: with 20 billion devices, maybe more, by 2020, the data from those devices has to come into a queue. It has to be stored somewhere before it can be processed and analyzed. I would look into the storage aspect of that.
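The idea that device data has to land in a queue before it can be processed can be sketched with Python’s standard `queue` module. This is a toy under stated assumptions: four simulated devices write into a bounded in-process queue while one consumer thread processes at its own pace, and a `None` sentinel signals the end of the data. A real deployment would use a durable broker such as Kafka, Pravega, or RabbitMQ rather than an in-process queue.

```python
import queue
import threading

# A bounded in-process queue stands in for the broker's storage: devices
# write quickly, and processing reads at its own pace.
ingest = queue.Queue(maxsize=100)
processed = []  # only the single consumer thread appends here


def device(device_id: int, readings: int) -> None:
    """Hypothetical device that emits a fixed number of readings."""
    for n in range(readings):
        ingest.put({"device": device_id, "reading": n})  # blocks while the queue is full


def processor() -> None:
    while True:
        msg = ingest.get()
        if msg is None:  # sentinel: no more data is coming
            break
        processed.append(msg)


consumer = threading.Thread(target=processor)
consumer.start()
producers = [threading.Thread(target=device, args=(i, 50)) for i in range(4)]
for p in producers:
    p.start()
for p in producers:
    p.join()
ingest.put(None)  # queued after every reading, so the backlog drains first
consumer.join()
print(len(processed))  # 200
```

The bounded `maxsize` gives backpressure: if processing falls behind, `put` blocks the producers instead of letting memory grow without limit.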

Also, know how to do the processing. Look at some of the stream processing engines, whether it be Apache Beam, Flink, or Spark. I would look into those if you’re looking to get involved in IoT. If you have any questions, make sure you submit those in the comments section below, or go to thomashenson.com/big-questions. Submit your questions, and I’ll try to answer them on here.

Until next time, see you again.