Thomas Henson


Do Data Scientists Code?

June 6, 2019 by Thomas Henson

Scientists Who Code

Data Science is a hot career field in Data Analytics. On data teams with Data Engineers, how much coding is expected from the Data Scientist?

The role of the Data Scientist includes finding features or correlations in data that might predict outcomes. Those predictions then become data models that are tested multiple times. After those data models reach a high confidence level, they are automated in applications. In this episode of Big Data Big Questions, let’s find out just how much coding a Data Scientist can be expected to do.

https://youtu.be/0bg1SAgKiU8

Transcript – Do Data Scientists Code?

Hi folks. Thomas Henson here with thomashenson.com. Today is another episode of Big Data Big Questions. Today, I’m still here in the gym answering your questions, recording between mail trucks coming by. Crazy.

Today’s question comes in from a viewer. Thanks for watching. If you have any questions, make sure you put them in the comments section here below, and also subscribe, make sure you hit that bell so you get notifications. Today’s question comes in around data scientists, and it’s specifically, “Do data scientists code?” We talked about the roles of a data engineer. We’ve talked about the roles of a machine learning engineer and even data scientist, but where is that fine line between how much data scientists code? We’re going to talk about that, and talk about some of the tools that they use, and then try not to say, “Depends.”

I know, I know. You’re like, “Man. He’s going to say, ‘depends.'” But, I’m going to try not to say that. Does a data scientist code? The answer is yes. Data scientists, for the most part, they’re able to code. The tools that they use, how much are they coding, that’s really going to be dependent on — didn’t say depends, dependent — the role that they’re in. If they have a data engineer or a machine learning engineer, that can help them put their code in production and finalize some of the things that they’re doing.

I will say, I’ve worked on a team before where, from a data scientist perspective, they were primarily using MATLAB. They were using MATLAB or Excel, and then when we pushed it to Hadoop, which you’ve heard me talk about many times on here, that was a little bit harder, because they were doing that in a solo environment on their machine, and then we went to distributed algorithms. At the time we were using Mahout. Going from Excel or MATLAB to Mahout, we really had to change and tune a lot of different things. That’s where my expertise as a data engineer and developer was able to help out and keep running the same job a hundred times.

Things have changed with TensorFlow and a lot of other tools since that time. Yes, the data scientists will code. It’s going to depend on what they’re going to use. Some of the common tools, like I said, MATLAB, even Excel, dependent on what you’re working on. If it’s a big data project, might not be using some of the other larger tools. Then, you also have R. You have Python, which we’ve talked about and we love here. We also have Scala or skuh-la. I always say it wrong. There are many different tools that data scientists use whenever they’re coding. That doesn’t mean that they’re doing all the coding. This is back to the dividing lines of the roles and where they are. Whenever we’re talking about standing up the environment, pushing things out to production, and even doing some of the heavier lifting, like I will say operationalizing of the code. Getting it ready, past the trendy phase on some of the other pieces that you’re really bringing it in to, “Hey, it’s going to support this dashboard.” It’s going to do this piece, or even some of the ETL jobs, still going to come to us as the machine learning engineers or the data engineers, just depending on the role. Still not saying depends. It’s depending on that role. For the most part, to answer the question simply, yes. Your data scientist is going to code. How much of that code is really in-depth, if you think about, are they doing the job of coding? I would say no. That would just be what I’ve seen. If you think about some of the reasons that we have the tools that we have now, like why was Pig created? Why do we have a Python API in Spark, when we could do Spark in Java, right?

Why do we have a Scala API in Spark? It’s because of the fact that we want to use a higher-level language, so that data analysts, data scientists, they can run their code, and they can do it without having to worry about Java and some of the other components there. Yes, data scientists code. How much do they code? It’s going to depend on their partner machine learning or data engineer. Your data scientist is not going to replace the data engineer or machine learning engineer. We’re all on the same team, here. We’re not trying to compete back and forth, and if I had to choose a side, I’d say machine learning engineers and data engineers. But, I’m very biased.
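As a rough illustration of the kind of high-level code a data scientist might write on their own machine, here is a minimal Python sketch that checks whether a feature is correlated with an outcome. The function name, the variable names, and the numbers are all made up for this example; it is plain Python, not Spark or Pig, and it is not taken from any project mentioned in the episode. Turning something like this into a distributed, production job is the part that tends to land with the data engineer or machine learning engineer.

```python
# Hypothetical exploratory script: is a feature correlated with an outcome?
# All names and data below are invented for illustration.
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances (unnormalized; the normalizers cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up sample: hours studied (feature) vs. exam score (outcome).
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 65, 71, 80]

r = pearson(hours_studied, exam_score)
print(f"correlation: {r:.3f}")
```

A value close to 1 or -1 suggests the feature is worth a closer look; a value near 0 suggests little linear relationship. On a real project the same idea would more likely be expressed through a library than written by hand.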

That’s all for today. Hope you enjoyed this episode. If you have any questions, throw them in the comments section here below, and make sure you hit subscribe, and ring that bell, and I will see you again on the next episode of Big Data Big Questions.

Filed Under: Data Science Tagged With: Code, Data, Data Science

Tableau For Data Science?

May 15, 2019 by Thomas Henson

Big Data Big Questions

Tableau is huge for interacting with data and empowering users to find insight in their data. So does this mean Tableau is the primary tool for Data Scientists? In this episode of Big Data Big Questions we tackle the question, “Is Tableau used for Data Science?”

Tableau For Data Science

What is Tableau?

Tableau is business intelligence software that allows users to visualize and drill down into data. Data users lean heavily on Tableau for the visualization portion of Data Science projects. The data can come from databases, CSVs, or almost any source with structured data. So if Tableau is for analyzing and visualizing data, is it a tool specific to Data Scientists? Watch the video below to find out Tableau’s role in the world of Data Science.

Transcript – Tableau For Data Science?

Hi folks! Thomas Henson here with thomashenson.com, and today is another episode of Big Data Big Questions. Today’s question comes in from a user, and it’s around data science, and Tableau, and how those go together. But, before we jump into the question, if you have a question that you want to know about data engineering, IT, data science, anything related to IT, or just want to throw a question at me, put it in the comments section here below or reach out to me on Twitter at #BigDataBigQuestions. Or, thomashenson.com/big-questions. Ton of ways to get your questions here, answered right on this show, all you have to do is type away and ask.

Now, let’s jump into today’s question. Today’s question comes in from a YouTube viewer, and it’s about, hey, in data science, do you use Tableau? You can see the question here as it pertains to this, and so this is a question we started up this show doing, around data engineering, but now we’re really jumping towards, hey, what’s going on from a data science and just encompassing all of it? Today’s question, we’re going to talk about where’s Tableau used, right? A lot of people use Tableau. It’s really, really popular. But, is that really a tool that a data scientist is going to use? Should you invest your time as a data engineer or a data scientist aspiring or not aspiring to get into data science? Should you spend time learning about that tool?

My thoughts on Tableau are that it’s really good for giving information out to users that could be not necessarily data scientists. They could be users of it. They could be analysts. They could be somebody who just has a stake in their business. I’ve used it at a lot of different corporations that I’ve worked at, and companies, and organizations, and really what I see is those tools are more for the end user, for visualization. They may fall more in the data visualization bucket. We’ve talked about the three tiers of work. You have your data scientist, you have your data engineer, and your data visualization specialist, the person who’s making sure that, hey, at the end of the day, it’s great that we have all these algorithms that are showing us and being able to predict whatever we’re trying to look at in our data, but if we can’t sell that and can’t convey that to the people that need the data to make a decision on, then it’s just an experiment, it’s just us having fun doing research.

When it comes to an end product or being able to really sell your point, data visualization, I think that’s the bucket that Tableau fits in more than just traditional data science. Could be wrong. Let me know if I am here in the comments section below, but let me talk a little bit about my use case and where I’ve seen it. Like I said, I’ve used it in a lot of different organizations that I’ve worked with or even contracted with. One of the main use cases, I’ll give you an example. Let’s say that you’re a YouTube viewer. I’m not saying YouTube uses Tableau, this is just an example. I don’t want to give away too much information, insider. If you have a YouTube channel, think about if you want to see the videos that are coming in. You’re a user. You’re a publisher, a creator. You want to know. Here is all the videos that you have. Here’s how long they’re watched. Here’s all the demographics from behind the scenes that you can pull. Maybe the times that they were watched. How long they were watched, so on this video here, if people drop out after 30 seconds, I did something wrong there. Versus, how many people go through the end of it. Same thing, too. What you would do is, you would have all this information and aggregate all this data, and you maybe even pull some insights. Like, hey, what’s your average? We can do some real simple things, or you can do some complex things, too. Tableau is where you’re going to give the end user the access.

At least what I’ve seen a lot. There’s a big need to be able to do that and be able to pull that data. It gives you a way to, I wouldn’t say that a data scientist wouldn’t, per se, use that as their tool. It wouldn’t be their only tool. Maybe that’s the way that they aggregate and look at large amounts of data before they go in and start to pick and choose. I’m sure there’s some modules out there that are incorporating machine learning and deep learning. I will say, if you’re really looking from an AI perspective to jump into, it’s not just going to be about Tableau. I’m not saying that you shouldn’t get up to speed on Tableau, but I wouldn’t say that, hey, I’m a brand-new person graduating high school, graduating college, or somebody that sees it in their career and looking to go into data science, my choice would not be to jump in and learn Tableau. I would start learning a little bit more about Python, and algorithms, and maybe R, or some of the other higher-level languages to talk around machine learning and deep learning, versus saying, “Hey, this is the tool that’s going to kind of take me there.” Now, if you’re a data visualization person, or you want to get into big data from that perspective, there’s a lot of things that you can use Tableau to do. You might add it to your bucket. As far as we talk about on this show, how to accelerate your career or how to break into the big data realm, this is not one of those tools that I’m going to say, hey, this is the only choice you have. Not really going to be the one that’s probably going to make the most sense. It’s not going to be the game changer, like hey, this person’s certified in Tableau or is a Tableau wizard. If you’re applying for a job that’s all around Tableau then, definitely. As far as, I really want to get down into data science, and I really want to get deep in it, Tableau’s one of those things. Definitely probably going to use or come across tools that are similar to that, but it’s not going to be your mainstay, probably, where you’re writing your algorithms and doing your analytics.
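To make the earlier YouTube example a bit more concrete, here is a small Python sketch of the kind of aggregation that would sit behind a dashboard like that: views per video, average watch time, how many viewers drop out early, and how many watch to the end. The record layout, the video names, the numbers, and the 30-second cutoff are assumptions chosen for illustration. This is not how YouTube or Tableau compute anything; it only shows the "simple things" a tool like Tableau would then put in front of the end user.

```python
# Hypothetical watch records: (video_id, seconds_watched, video_length_seconds).
# The data and field layout are invented for this sketch.
from collections import defaultdict

views = [
    ("intro", 25, 300),
    ("intro", 300, 300),
    ("intro", 12, 300),
    ("deep-dive", 600, 600),
    ("deep-dive", 45, 600),
]


def summarize(records, early_cutoff=30):
    """Aggregate per-video stats: views, average watch time, early drop-off, completion."""
    grouped = defaultdict(list)
    for video_id, watched, length in records:
        grouped[video_id].append((watched, length))

    summary = {}
    for video_id, rows in grouped.items():
        n = len(rows)
        summary[video_id] = {
            "views": n,
            "avg_seconds": sum(w for w, _ in rows) / n,
            # Share of views that ended within the first `early_cutoff` seconds.
            "early_dropoff_rate": sum(1 for w, _ in rows if w <= early_cutoff) / n,
            # Share of views that reached the end of the video.
            "completion_rate": sum(1 for w, length in rows if w >= length) / n,
        }
    return summary


report = summarize(views)
for video_id, stats in report.items():
    print(video_id, stats)
```

The interesting data science work starts after a summary like this, for example asking why one video loses viewers early; the summary itself is the visualization-friendly layer.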

That’s all for today. If you have any questions, make sure you put them in the comments section here below, and then make sure you click subscribe to follow this channel, so that you never miss an episode of Big Data Big Questions.


Filed Under: Data Engineers, Video Tagged With: Big Data Big Questions, Data, Data Science
