Thomas Henson

Archives for January 2018

Talking Heron Real-Time Analytics with Streamlio

January 31, 2018 by Thomas Henson

To say Streaming Analytics is popular is an understatement. Right now, Streaming Engineering is one of the top skills Data Engineers need to understand. There are many options and development stacks for analyzing data in a streaming architecture. Today I sat down with Lewis Kaneshiro (CEO & Co-founder) and Karthik Ramasamy (Co-founder) of Streamlio to get their thoughts on Streaming Analytics and Data Engineering careers.

Streamlio Open Source Stack

Streamlio is a full-stack streaming solution that handles the messaging, processing, and stream storage in real-time applications. The Streamlio development stack is built primarily from Heron, Pulsar, and BookKeeper. Let’s discuss each of these open source projects.

Heron

Heron is a real-time processing engine developed and used at Twitter. Heron is currently in the process of moving into the Apache Software Foundation (learn more about this in the interview). Heron is at the heart of real-time analytics because it processes data before its time value expires.

Pulsar

Pulsar is an Apache incubating project for distributed publish-subscribe messaging in real-time architectures. Like many open source big data projects, Pulsar has its origin at Yahoo, where it was first used.

BookKeeper

BookKeeper is a scalable, fault-tolerant, and low-latency storage service used in many development stacks. BookKeeper is an Apache Software Foundation project and is popular in many open source streaming architectures.

Interview Questions

  1. Have we as a community accepted Hadoop-related tools being virtualized or containerized?
  2. How do Data Engineers get started with Streamlio?
  3. What are the biggest real-time analytics use cases?
  4. Is the Internet of Things (IoT) the primary driver behind the explosion in Streaming Analytics?
  5. What skills should new Data Engineers focus on to become amazing Data Engineers?

Check out the interview to learn the answers to these questions and more.

Video Streamlio Interview

Links from Streamlio Interview

Streamlio

Heron

Apache Pulsar

Apache BookKeeper

Apache Storm

Filed Under: Streaming Analytics Tagged With: Data Engineers, Streaming Analytics

How to Change the Theme in Cygwin

January 17, 2018 by Thomas Henson