How Does GDPR Impact Data Engineers?
The General Data Protection Regulation (GDPR) goes into effect in May 2018, and many organizations are scrambling to understand how to comply with it. In this video, we discuss the big data impact of GDPR.
Transcript – Big Data Impact of GDPR
Hi, folks! Thomas Henson here with thomashenson.com, and today is another episode of Big Data Big Questions. It's a very special episode, because we're going to talk more about regulation than we probably ever have before.
We're going to tackle the GDPR: what it means for big data and big data analytics, and why data engineers, and even data scientists, should understand the regulation at least at a high level. Find out more right after this.
[Sound effects]
Welcome back. As I said, today is a special episode. We're going to talk about the GDPR, the General Data Protection Regulation: what it means for a data engineer, and why you should understand it.
To start with a high-level overview: this is one of those areas where understanding the regulation is really going to help you, because you're going to be in meetings about it. It's a huge change for our industry. To put it in perspective from an IT or big data standpoint, think of changes that have happened in other industries.
Think of what happened in the US with the SEC and accounting after Enron and the other financial accounting scandals of the early 2000s. Then think about healthcare: if you know anything about healthcare, you probably know at least the HIPAA requirements. GDPR is going to be similar. It's an EU regulation, but I believe the ramifications are going to be felt everywhere. First, data exists everywhere, and the regulation also covers companies outside the EU that collect data from people in the EU. Second, most companies are global, so however we capture and handle data, whether it comes from a user in the EU or a user anywhere else in the world, we're going to need the policies and systems in place to comply with it.
At a high level, if you're a data engineer, we tend to focus on the technology and the hardware. But remember, we've talked before about non-technical careers in big data, such as data governance. If you're interested in that path, dive headfirst into the General Data Protection Regulation and learn as much as you can. First, it will make you valuable in those meetings. Second, if you're looking for a career change, say you already work in compliance and want to get involved in big data, here's your opportunity. Become an expert in this, because we're all moving fast to comply.
So what is it? GDPR is the EU regulation governing how personal data is processed and stored. It replaces the Data Protection Directive 95/46/EC, and it's more stringent and more all-encompassing than the directive. You're probably asking, "Why are we going down this route? Why is this regulation coming out now?" Think about what has been happening over the last few years.
How often do we hear about a data breach? There was a huge one last year, affecting millions and millions of people and exposing credit card and Social Security numbers. Our data is constantly under attack. From a big data perspective, we hold onto data so that we can analyze it to build better and more efficient products, better websites, and better clickthrough rates on ads. There are so many things we do with this data, but there's also real danger in holding it.
We have to make sure we're protecting it. And from a privacy perspective, which is where GDPR is really going to hit, we have to let users opt in or opt out, tell them what's being collected and how long it will be kept, and give them the ability to say, "You know what? Get rid of my data. I don't want you holding onto it." That last one is what GDPR calls the right to erasure.
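For the technically minded, here's a minimal Python sketch of what honoring those three requests (opt-in, retention, erasure) might look like inside a pipeline. It assumes every record carries a user ID, a collection timestamp, and a consent flag; the `Record` type, the function names, and the 365-day retention window are hypothetical, and a real system would also have to purge backups, replicas, and downstream copies.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Record:
    """One event row in a hypothetical pipeline."""
    user_id: str
    collected_at: datetime
    consented: bool  # assumption: True if the user opted in to this processing
    payload: str


def erase_user(records: list[Record], user_id: str) -> list[Record]:
    """Honor a 'get rid of my data' request by dropping every row for user_id."""
    return [r for r in records if r.user_id != user_id]


def enforce_retention(records: list[Record], now: datetime, max_age: timedelta) -> list[Record]:
    """Keep only rows still inside the stated retention window."""
    return [r for r in records if now - r.collected_at <= max_age]


def consented_only(records: list[Record]) -> list[Record]:
    """Keep only rows from users who opted in."""
    return [r for r in records if r.consented]


if __name__ == "__main__":
    now = datetime(2018, 5, 25)
    records = [
        Record("alice", now - timedelta(days=10), True, "click"),
        Record("bob", now - timedelta(days=400), True, "click"),
        Record("carol", now - timedelta(days=5), False, "view"),
        Record("dave", now - timedelta(days=30), True, "view"),
    ]
    kept = erase_user(records, "alice")                       # erasure request
    kept = enforce_retention(kept, now, timedelta(days=365))  # hypothetical 1-year policy
    kept = consented_only(kept)                               # opt-in only
    print([r.user_id for r in kept])  # → ['dave']
```

Chaining the filters this way leaves only dave: alice asked to be erased, bob's record is past the hypothetical one-year retention window, and carol never opted in.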
Those are some of the things you're going to be tackling. Also, just as a note, the regulation was approved on April 14th, 2016, and organizations must comply by May 25th, 2018. There's still some time before that deadline, and that's why I'm really encouraging people, even if you're watching this video after that date: if you want to get into big data on the governance side, or you're weighing non-technical career options, learn this. I'm serious. Just learn this. This is going to be huge. If you follow Hortonworks, Cloudera, or anybody else involved in big data or even IT, you're getting bombarded with information about it, because it's such a big deal. Like I said, complying with it is industry-shifting, just like HIPAA was, and just like the SEC and accounting regulations that came out in the early 2000s. If you're looking for the official source, I've got the official site listed here, so you can see it straight from the EU.
Like I said, you're going to see a ton of blog posts, and there are a ton of resources out there. Now for some of the tools. If you're on the technical side, you might be thinking: "Okay, I've got to go into a meeting, and somebody's going to ask me what we're doing about data governance and the other pieces. Where should I focus, so that I can say, 'Give me a week or two. Let me look at what we might not be doing yet, and whether we need to change the way we're protecting the data'?"
Maybe it's the way you're tracking and holding onto the data, so that you can comply by deleting a user's data, by not tracking it in the first place, or by masking it so you're better protecting people's identities. Maybe those are some of the weak points that you are…
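Here's a minimal Python sketch of the masking idea, assuming identifiers such as email addresses are the sensitive fields. It uses a keyed hash (HMAC-SHA256 from the standard library) to pseudonymize an identifier, so rows for the same user can still be grouped and joined, plus a simple display mask. `SECRET_KEY`, `pseudonymize`, and `mask_email` are hypothetical names for illustration. Keep in mind that under GDPR, pseudonymized data is still personal data: this lowers the risk but doesn't take the data out of scope.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real deployment would load it from a
# key-management system and store it separately from the data it protects.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable HMAC-SHA256 token (same input, same token)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email: str) -> str:
    """Display mask: keep the first character of the local part and the domain."""
    local, _, domain = email.partition("@")
    if not local or not domain:
        return "***"  # not a usable address; hide it entirely
    return f"{local[0]}***@{domain}"


if __name__ == "__main__":
    row = {"email": "jane.doe@example.com", "clicks": 12}
    # Store the token instead of the raw email so analytics can still group by user.
    safe_row = {"user_key": pseudonymize(row["email"]), "clicks": row["clicks"]}
    print(mask_email(row["email"]))   # → j***@example.com
    print(len(safe_row["user_key"]))  # → 64
```

The keyed hash is the important design choice here: a plain SHA-256 of a common identifier like an email address can be matched by anyone who hashes a list of candidate addresses, whereas the HMAC tokens can only be reproduced by someone holding the key.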
Look into Apache Atlas, Apache Ranger, and Cloudera Navigator. Which ones apply depends on the Hadoop distribution you're running: if it's one of the two main ones, Atlas and Ranger ship with Hortonworks, and Navigator is Cloudera's. These three tools will give you a framework to start from. Otherwise, you walk into the meeting, somebody says, "Hey, we've got to look at how we're complying with GDPR, and we want to really focus on data governance. What are we doing?" and if you haven't done this homework, you're sitting there thinking, "I don't know how to tackle this."
So go learn these tools. Understand them at a high level. Actually implementing them is a whole different story, but you can start getting trained up and start implementing them, too. I hope this was helpful. I'm sure we'll make more videos on this, because we'll be talking about it constantly. Like I said, I predict this will be an industry-shifting regulation for IT, and especially for all of us in big data. I'm sure there's going to be follow-on regulation, too. I'm sure other countries and regions are starting to look at rules like this: here in the US, Russia, Japan, pretty much everywhere. It's not going to be just the EU, and even if it were, it would still affect us, because everything's global. If you have any questions, put them in the comments section below, and I'll answer them here on Big Data Big Questions. You can also go to my website, thomashenson.com, look for Big Questions, and send me a comment. Make sure you subscribe so that you never miss an episode, and I'll see you next time.













