Thomas Henson

Top 4 Places to Find Big Data

December 9, 2016 by Thomas Henson

 

Finding data for testing in your own Hadoop projects doesn’t have to be hard!

There are many places to find free data sets for running in your development environments. Check out this video to find out my Top 4 places to find Big Data. Spoiler alert: you can also find small data in these places.

YouTube Video

—

Transcript

Hi, and welcome back to thomashenson.com. Have you ever been working in your big data environment and thought, "It would be great if I could have more data to test out this new open source tool," or maybe just a new function that you want to run? Today I’m going to talk about my four favorite places to find big data.

Number four on the list is Yahoo, specifically the Yahoo Finance section. You can go in here, look up your favorite stock or even your favorite mutual fund, and find historical information. What I like to do is come in here and get the historical data, which gives you daily values for the stock. You can take that data and insert it into HDFS or a database, however you want. There are a lot of different options, and the data exports to CSV. It’s really accurate data, but it is a limited set because you’re only looking at stock values. If you need a quick fix to get some data, though, this is where I come first.

Coming in at number three is weather data from NOAA. This data is very accurate, but one of the drawbacks, and the reason it’s only number three on the list, is that you have to open an account and submit a request: "For this geographic area, I would like to compile the weather data." So if you’re looking for accurate data, this is a very good site that I would use, but if you’re looking for something quick, this is not what you want. Typically you’ll receive the data in less than 24 hours, but just know that it could take a lot longer. That’s why weather data is number three on the list.

Coming in at number two, and a really close favorite to number one, is Tableau’s public website and its sample data sets. This is relatively new to me, but they have a lot of different data sets in a lot of different categories, like government, lifestyle, health, and one of my favorites, sports. The data comes in Excel or CSV format, so it’s really easy to use. Another cool thing is that you don’t have to log in, so you can just come in, download these data sets, upload them into HDFS, and start playing away. That’s why Tableau’s public data sets are number two on my list.

And now for number one on my list of favorite places to find data: Kaggle’s website. Kaggle started off on the scene as just a contest site for data scientists or amateur data scientists to test out and solve problems. One of the famous examples was Netflix: there was a contest out there to see if you could beat Netflix’s data scientists at recommending better videos for people. It’s really cool; I think they gave out like a million dollars for the contest. But now this website is more than just a contest site. It actually has data sets, and it’s really a one-stop shop for data scientists, so it’s one of those websites you want to come in and check. For me, I really like the data sets. Now, you do have to log in to access the data, but you get a vast number of data sets, and if I were stuck on an island and could only have one of these, it would be the Kaggle website, because they’re always updating a lot of different data sets. They have some small ones and some large ones. You can see here that you can go through and search, see the latest data that’s been updated, and search by different features. Like I said, it’s community-driven, so there are always new data sets available. This is why it’s number one on my list.

So, just to recap my top four favorite places: number four was Yahoo’s finance section, number three was the weather data from NOAA, number two and a close favorite was Tableau’s public website with its sample data sets, and number one, the best place, was Kaggle’s data sets. Thanks for tuning in, and be sure to
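The "download a CSV, then load it into HDFS" workflow mentioned in the transcript can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the column names (Date, Open, High, Low, Close, Volume) are an assumed layout for a historical-prices export, the sample rows are made-up values, and the file name and HDFS directory are hypothetical. The script only builds the `hdfs dfs -put` command; it does not run it.

```python
import csv
import io

# Assumed header of a historical-prices CSV export; adjust to match your file.
EXPECTED_COLUMNS = ["Date", "Open", "High", "Low", "Close", "Volume"]


def parse_daily_prices(csv_text):
    """Return one dict per trading day, with numeric fields converted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for record in reader:
        rows.append({
            "date": record["Date"],
            "open": float(record["Open"]),
            "high": float(record["High"]),
            "low": float(record["Low"]),
            "close": float(record["Close"]),
            "volume": int(record["Volume"]),
        })
    return rows


def hdfs_put_command(local_path, hdfs_dir):
    """Build (but do not run) the `hdfs dfs -put` command for the file."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]


# Made-up sample rows, used only to show the parsing step.
sample = """Date,Open,High,Low,Close,Volume
2016-12-08,10.0,11.0,9.5,10.5,1000
2016-12-09,10.5,12.0,10.0,11.5,1500
"""

rows = parse_daily_prices(sample)
print(len(rows), rows[-1]["close"])
# Hypothetical local file and HDFS target directory.
print(" ".join(hdfs_put_command("prices.csv", "/data/stocks/")))
```

Checking the header before uploading is a cheap way to catch a changed export format before the file lands in HDFS.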

Filed Under: Big Data Tagged With: Big Data, Data Analytics, Database

Reverse Engineering with MySQL Workbench

October 1, 2012 by Thomas Henson

In this post we will learn about reverse engineering an existing MySQL database schema with MySQL Workbench. This is the second part of an ongoing series on using MySQL Workbench to manage databases. Part 1: Setting Up MySQL Workbench

What is a database schema?

Have you ever wanted to see how the tables in a database relate to each other, or had to update a project with an existing database and no documentation? In this second part of our tutorials on MySQL Workbench, we are going to create documentation from an existing WordPress database. This documentation is called an EER (Enhanced Entity Relationship) diagram, but we can just refer to it as our database schema. MySQL Workbench has a wizard that will help us get started with creating this documentation.

Begin Modeling

  1. First, open MySQL Workbench and click Create EER Model From Existing Database.
  2. Use the connection we already have stored in MySQL Workbench (Lesson 1).
  3. As you go through the steps in the wizard, be sure to select only the database you want to reverse engineer. In my example it is test.
  4. Go through the rest of the wizard with the default values selected, and you should end up on the EER diagram screen.
  5. Now we have to move our tables so we can see them all: select Arrange -> Autolayout (pretty awesome, isn’t it?).
  6. Congratulations, we now have an EER diagram (without relationships; we will have to add those manually).
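One likely reason the diagram comes out without relationships is that the schema declares no foreign key constraints, which is the case for WordPress core tables. A quick way to check is to ask MySQL's `information_schema.KEY_COLUMN_USAGE` view which foreign keys exist. The sketch below is an assumption-laden illustration, not part of Workbench: it expects any DB-API cursor that uses the `%s` parameter style (as common MySQL drivers for Python do), and it uses a small stub cursor with hypothetical table names so it runs without a server.

```python
# Lists declared foreign keys for one schema; an empty result means there is
# nothing for a diagram tool to draw as a relationship.
FOREIGN_KEY_QUERY = """
SELECT TABLE_NAME, COLUMN_NAME, REFERENCED_TABLE_NAME, REFERENCED_COLUMN_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE TABLE_SCHEMA = %s
  AND REFERENCED_TABLE_NAME IS NOT NULL
ORDER BY TABLE_NAME, COLUMN_NAME
"""


def list_foreign_keys(cursor, schema):
    """Return foreign keys as 'table.column -> ref_table.ref_column' strings."""
    cursor.execute(FOREIGN_KEY_QUERY, (schema,))
    return [f"{t}.{c} -> {rt}.{rc}" for t, c, rt, rc in cursor.fetchall()]


class StubCursor:
    """Stand-in for a real DB-API cursor so the sketch runs without MySQL."""

    def __init__(self, rows):
        self._rows = rows
        self.executed = None

    def execute(self, query, params=None):
        self.executed = (query, params)

    def fetchall(self):
        return list(self._rows)


# Hypothetical rows, shaped like what the query would return.
demo_cursor = StubCursor([("orders", "customer_id", "customers", "id")])
for line in list_foreign_keys(demo_cursor, "test"):
    print(line)
```

With a real connection, pass its cursor in place of `StubCursor`. If the list comes back empty, the relationships have to be drawn by hand in the diagram, as step 6 notes.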

Conclusion

We have just created our own EER diagram for an existing database, which counts toward documentation for our project. This process is a very simple but powerful tool in MySQL Workbench. Once the EER diagram is set up, we can also use it to forward engineer. Try adding a new column or table to the EER diagram, then save the changes and verify them from the SQL editor window.
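The verification step can also be scripted instead of checked by eye. The following is a minimal sketch, assuming a DB-API cursor with the `%s` parameter style; it asks `information_schema.COLUMNS` whether a given column now exists. The column name `reviewed_at` is hypothetical, and a `MagicMock` stands in for a real cursor so the example runs without a database.

```python
from unittest.mock import MagicMock

# Counts matching columns; 1 means the forward-engineered column is present.
COLUMN_CHECK_QUERY = """
SELECT COUNT(*)
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s AND COLUMN_NAME = %s
"""


def column_exists(cursor, schema, table, column):
    """Return True if the column is present in the live database."""
    cursor.execute(COLUMN_CHECK_QUERY, (schema, table, column))
    (count,) = cursor.fetchone()
    return count > 0


# Mock cursor that pretends the database found one matching column.
cursor = MagicMock()
cursor.fetchone.return_value = (1,)
print(column_exists(cursor, "test", "wp_posts", "reviewed_at"))
```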


Final Product

Filed Under: MySQL Tagged With: Database, MySQL, schema, Workbench

