Thomas Henson


Kappa Architecture Examples in Real-Time Processing

October 11, 2017 by Thomas Henson


“Is it possible to build a prediction model based on real-time processing data frameworks such as the Kappa Architecture?”

Yes, we can build models based on real-time processing, and in fact there are some you use every day.

In today’s episode of Big Data Big Questions, we will explore some real-world Kappa Architecture examples. Watch this video to find out!

Video

Transcription

Hi, folks. Thomas Henson here with thomashenson.com. And today we’re going to have another episode of Big Data, Big Questions. And so, today’s episode, we’re going to focus on some examples of the Kappa Architecture. And so, stay tuned to find out more.

So, today’s question comes in from a user on YouTube, Yaso1977. They’ve asked: “Is it possible to build a prediction model based on real-time processing data frameworks such as the Kappa Architecture?”

And so, I think this question stems from this user’s defense for either their master’s degree or their Ph.D. So, first off, Yaso1977, congratulations on standing for your defense and creating your research project around this. I’m going to answer this question as best I can and put myself in your situation: if I was starting out and had to come up with a research project to be able to stand for either my master’s or my Ph.D., what would I do, and what are some of the things I would look at?

And so, I’m going to base most of these around the Kappa Architecture because that is the future, right, of streaming analytics and IoT. And it’s kind of where we’re starting to see the industry trend. And so, some of those examples that we’re going to be looking for are not going to be out there just yet, right? We still have a lot of applications and a lot of users that are on Lambda. And Kappa is still a little bit more on the cutting edge.
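As a rough illustration of the idea behind Kappa (a minimal sketch, not something shown in the episode): all data lands in one append-only log, a single streaming code path processes it, and a changed job simply replays the log from the start, rather than maintaining separate batch and speed layers as in Lambda. The `EventLog` class, the `run_job` helper, and the sample sensor events below are hypothetical stand-ins; in practice the log would be something like a Kafka topic.

```python
class EventLog:
    """Hypothetical append-only log, standing in for a durable stream."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def replay(self, from_offset=0):
        # Reprocessing in Kappa means re-reading the same log from an offset.
        return iter(self._events[from_offset:])


def run_job(log, update, initial_state):
    """Run one streaming job over the whole log and return its final state."""
    state = initial_state
    for event in log.replay():
        state = update(state, event)
    return state


def count_per_device(state, event):
    # First version of the job: count events per device.
    new_state = dict(state)
    new_state[event["device"]] = new_state.get(event["device"], 0) + 1
    return new_state


def max_reading_per_device(state, event):
    # Revised job logic: track the highest reading per device.
    new_state = dict(state)
    previous = new_state.get(event["device"], float("-inf"))
    new_state[event["device"]] = max(previous, event["reading"])
    return new_state


log = EventLog()
log.append({"device": "sensor-1", "reading": 20.5})
log.append({"device": "sensor-2", "reading": 18.0})
log.append({"device": "sensor-1", "reading": 22.0})

counts = run_job(log, count_per_device, {})
# Changing the logic does not need a separate batch layer: replay the same log.
max_readings = run_job(log, max_reading_per_device, {})
```

The point of the sketch is only that both versions of the job read the same log through the same code path.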

So, there are three main areas that I would look at to find those examples. The first one is going to be in IoT. So in your newer IoT, or Internet of Things, workflows, you’re going to start to see that. One of the reasons that we’re going to see that is because there are millions and millions of these devices out there.

And so, you can think of any device, you know, whether it be from a manufacturer that has sensors on manufacturing equipment, smart cards, or even smartphones, and just information from multiple millions of users that is all streaming back in, with some kind of prediction modeling or some kind of analytics being done on that data as it comes in.

And so, on those newer workflows, you’re probably going to start to see the Kappa Architecture being implemented in there. So, I would focus first off looking at IoT workflows.

Second, and this is the tried and true one that we’ve seen all throughout Big Data since we started implementing Hadoop, is fraud detection, specifically with credit cards and some of the other pieces. So, you know, look at that from a security perspective, because there’s a lot going on in security. I mean, we just had the Equifax data breach and so many other ones.

So, I would, for sure, look at some of the fraud detection around, you know, maybe, some of the major credit card companies and see kind of what they’re doing and what they have published around it. Because just like in our IoT example, we’re talking millions and millions, maybe even billions, of users all having, you know, multiple transactions going on at one time. All that data needs to be processed and needs to be logged, and, you know, we’re looking for fraud detection. That needs to be pretty quick, right? Because you need to be able to capture that in the moment that, you know… Whether you’re inserting your chip card or whether you’re swiping your card, you need to know whether fraud is about to happen, right?

So, it has to be done pretty quickly. And so, it’s definitely a streaming architecture. My bet is there are some people out there that are already using that Kappa Architecture.
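To make the “in the moment” point concrete, here is a minimal sketch of a per-event fraud check. It assumes a simple velocity rule (too many transactions on one card inside a short time window), which is a stand-in chosen for illustration and not how any particular card company does it. The `VelocityFraudDetector` name, the 60-second window, and the limit of 3 transactions are all hypothetical.

```python
from collections import defaultdict, deque


class VelocityFraudDetector:
    """Flags a card when it exceeds max_transactions within window_seconds."""

    def __init__(self, window_seconds=60.0, max_transactions=3):
        self.window_seconds = window_seconds
        self.max_transactions = max_transactions
        # Recent transaction timestamps, kept per card.
        self._recent = defaultdict(deque)

    def process(self, card_id, timestamp):
        """Handle one transaction as it arrives; return True if suspicious."""
        window = self._recent[card_id]
        # Drop timestamps that have fallen out of the sliding window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        window.append(timestamp)
        return len(window) > self.max_transactions


detector = VelocityFraudDetector(window_seconds=60.0, max_transactions=3)

# Hypothetical stream of (card_id, timestamp in seconds) events.
stream = [("card-A", 0), ("card-A", 10), ("card-B", 15),
          ("card-A", 20), ("card-A", 30), ("card-A", 100)]

flags = [detector.process(card_id, ts) for card_id, ts in stream]
```

Each decision is made as the event arrives, using only a small amount of state per card, which is the shape a streaming fraud check needs to have.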

And then another one is going to be anomaly detection. I’m going to break that into two different ones. So, the first kind of anomaly detection is about security, from insider threats. So, think of being able to capture, you know, insider threats in your organization that are maybe trying to leak data or trying to give access to people that don’t need to have it. Those are still things that happen in real-time. And, you know, the faster that you can make that decision, the faster that you can predict that somebody is an insider threat, or that they’re doing something malicious on your network, the less damage is going to be done to your environment.

And then, also, anomaly detection from manufacturers. So, we’ve talked a little bit about IoT, but also look at manufacturing. So, there’s a great example. And I would say that, you know, for your research, one of the books that you would want to look into is Introduction to Apache Flink. There’s an example in there from Ericsson, a manufacturer that has implemented the Kappa Architecture. And what they have is… I think it’s like 10 to 100 terabytes of data that they’re processing at one time. And they’re looking for anomaly detection in that workflow to see, you know, are there sensors, are there certain things happening that are out of the norm, so that maybe they can stop a manufacturing defect or predict something that’s going to go wrong within their manufacturing area, and then also, you know, externally, once the users have their devices, be able to predict those problems too.
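One simple way to picture “out of the norm” on a sensor stream is a running z-score: keep a running mean and variance, and flag any reading that sits too many standard deviations away. This is only a sketch under that assumption, not the method from the book’s example. It uses Welford’s online algorithm for the running statistics, and the class name, the threshold of 3.0, the warm-up of 5 readings, and the sample readings are all hypothetical.

```python
import math


class RunningAnomalyDetector:
    """Flags readings far from the running mean, one reading at a time."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # z-score above which a reading is flagged
        self.warmup = warmup        # readings to observe before flagging
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0              # running sum of squared deviations

    def process(self, value):
        """Score one reading against past data, then fold it into the stats."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self._m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's update: no need to store the history of readings.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)
        return is_anomaly


detector = RunningAnomalyDetector(threshold=3.0, warmup=5)
readings = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 50.0, 10.1]
flags = [detector.process(r) for r in readings]
```

In this sketch a flagged reading is still folded into the running statistics; a real pipeline might choose to exclude it so that one spike does not widen the norm.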

So, those are the three areas that I would check. Definitely check out Introduction to Apache Flink, which has a lot of talk about the Kappa Architecture. Use that as one of your resources and, you know, pull out some of those examples.

But remember, the three areas that I would really key in on and look at are IoT; fraud detection, so look at some of the credit card companies or other fraud detection; and then also anomaly detection, whether it be insider threats or manufacturers.

So, that’s the end of today’s episode of Big Data, Big Questions. I want to thank everyone for watching. And before you leave, make sure that you subscribe. So, you never want to miss an episode. You never want to miss any of my Big Data Tips. So, make sure you subscribe, and I will see you next time. Thank you.

Filed Under: Big Data Tagged With: Big Data, Big Data Big Questions, IoT, Kappa
