August 2018 - Thomas Henson

Is Splunk a tool Data Engineers should take time to learn?

Throughout my career I’ve always been on the lookout for new tools that can help me create more value in my role. Splunk has been one of those tools. A few years back, Cory Minton suggested I look into Splunk as the easy button for machine data. I was so excited about how well Splunk worked for analyzing machine data that I published a Pluralsight course, Analyzing Machine Data with Splunk.

Splunk is a platform that allows Data Engineers and System Administrators to quickly analyze machine data or other semi-structured data. It’s as simple as pointing it at your log files and building visualizations in a web GUI. If you can operate Excel, then you can quickly build powerful visualizations. Find out my thoughts about where Splunk fits in the Data Engineer’s skill set in the video below.

Transcript

Hi, folks! Thomas Henson here with ThomasHenson.com. And today is another episode of Big Data Big Questions! Today, I’m going to be tackling a question, or maybe a concept, around Splunk. Specifically: where does Splunk fit in for data engineers, machine learning engineers, and data scientists? Is this an application they should learn? Is it something that can further their career or help them out with projects? Or is Splunk more for the operations side, and not something we want to focus on from a data engineering perspective?

So, we’re going to tackle that. I’m going to tell you a little bit about Splunk, so I can give you some information on it and places where you can find training on it. But then, also, I’m going to answer at the end, “Hey, is this something that I need to be putting in my tool bag or not?” So, find out more right after this.

Welcome back! So, today is Big Data Big Questions. Before we jump into it, I want to make sure that you’re subscribed to the YouTube channel, so that you never miss an episode on data engineering, machine learning, data science, and IT concepts. Just make sure you subscribe and hit the notification button.

Also, if you have any other questions, let me know. Say I’m going through this question on Splunk today and offering some information around it. If I don’t hit the topic correctly, if you have further questions because I wasn’t really clear, or if there are things I just didn’t bring up because of time, put those in the comments section below. I’ll read them on-air and answer them the best I can. You can also reach out to me on Twitter, or go to my website, Big Data Big Questions, and submit your questions there; I’ll answer them on-air the best I can.

So, today’s question is about Splunk. Specifically: is Splunk for data engineers? Should a data engineer be learning Splunk? Is it a tool or tool set that you should have in your tool bag? And if you’re really trying to break into big data and data engineering roles, is it something you should be looking at?

So, before we jump into answering that question, I want to talk a little bit about Splunk, in case you’re not familiar with it. Splunk has been around for a while; it was founded in 2003, but it’s really been growing like crazy in the last five years or so. I found out about Splunk three, four, maybe five years ago, when we were working on a couple of different projects. These were actually data analytics projects, so this is a valid question.

Splunk is not an open-source company. We’ve talked about Hortonworks and Cloudera, and their business models and the way they go to market. Splunk, on the other hand, charges per terabyte for its software, which allows you to ingest machine data, or really any kind of data, into your system.

So, it’s a billion-dollar-plus company, and like I said, it’s been growing a lot lately. They have user conferences, and they’re known around the world. They really started off with analyzing machine data; that was their first entry point. A lot of people started using it because it made getting data in easy. Say you had some infrastructure, some servers, or anything else in your data center that you wanted to pull data off of. You could set up Splunk forwarders to pull that data in, and Splunk was seen as that easy button.

We’ve talked about setting up a Hadoop cluster, then ingesting your data, and then setting up your visualizations. There are a lot of different steps there. With Splunk, it was all in one consolidated package. You could take some log files, bring them in, and start setting them up like a pivot table.

It’s a lot more complex than that now, but when it first started out, you could pull your data in and start setting up tables just like you would in Excel. That was really beneficial, and it helped with the stickiness of the product. Think about being able to set up your cluster in an hour and a half and start pulling tables and doing things like that.

That’s just not something you can do with Hadoop and the Hadoop ecosystem. So, Splunk was that easy button for log files, and it has evolved over time. A couple of years ago, I did a course on analyzing machine data with Splunk, because Splunk was really interesting to me. Like I said, I was working on a couple of different projects using Splunk and some of its competitors, like Elasticsearch and the ELK Stack.

And so, I find that Splunk is an easy product to learn, and it’s a really fun product, too. There are also a lot of [INAUDIBLE 00:04:36.13], so you can actually move data from your Splunk environment into Hadoop. The tool for doing that is called Hunk. If you’re interested in learning about Splunk and you’re a Pluralsight member, go ahead and check out my course, Analyzing Machine Data with Splunk.

Splunk started out with machine-generated data, but it has blossomed beyond that. There’s been a real big movement toward solving data analytics problems with it, because, like I said, Splunk is the easy button. Depending on your technical background, you could probably set it up in less than an hour and start analyzing your data.

They’ve also gotten a real foothold in DevOps, and in machine learning. Beyond the machine data, they have algorithms you can start using. You can pull some of their machine learning algorithms, test them against your data sets, and say, “Hey, this would be a good candidate for some kind of automation.” Or you can set up a way to predict what’s going on in your network, or in pretty much any of the data you’re working with. So, it’s not just machine data anymore.

Also, in IoT. We’ve talked a ton here about IoT and the complexity of small sets of data being pushed out from devices. Think about having millions of devices that all need to report back in. Splunk is able to handle that, and with its built-in analytics tools, it’s a prime example of a tool to use there.

So, I’ve talked a little bit about Splunk, my course on Pluralsight, and where you can get more training. But the question was: is Splunk something a data engineer should learn? I think you should learn Splunk, or at least be familiar with it, especially if you’re on the data engineering side.

Now, how much should you learn if you’re a Hadoop administrator or a Hadoop engineer and you’re already set in that area? I wouldn’t spend a lot of time on it unless you have a project coming up.

But if you’re new and budding, and you really like the operational or administrative side of data engineering (we’ve talked about the different roles there), I think it’s definitely a tool you should look into, because there are a lot of career options out there. If you already have a little bit of Hadoop knowledge, there’s a big demand for those kinds of jobs. But there’s also a huge demand for Splunk jobs.

So, if it’s something you’re looking to get into and you like the data analytics aspect of it, then definitely go through and learn Splunk. Like I said, it’s easy to set up and get started with. Also, look at job specifications: if you’re learning this for a specific job and you see Splunk listed as a requirement, then definitely go learn it.

Also, should you use it in your projects? Well, it’s going to depend on the project. If you’re working with machine-generated data, you want to stand something up quickly, and you already have a Splunk license or are looking to purchase one, then there are definite benefits to using that technology in your project.

So, it’s all going to depend on what you’re working on. But know the ins and outs. Like I said, that’s why it’s important to get exposure to this kind of technology, so you know when to use it on certain projects.

So, that’s my Big Data Big Questions for today. Make sure you subscribe so that you never miss an episode. Like I was just talking about, you need to be exposed to a lot of these tools, so you know how to use them in projects. Or, if you’re trying to make a change in your career or further your career, these are definite educational things that you should be doing.

So, until next time. See you, folks.


Splunk For Data Engineers
