Data Tracking Using the Cloud
Collecting and visualizing data is a powerful tool in the arsenal of a tech artist. Whether you're collecting telemetry from tools (to track usage, breakages, crashes, etc.), logging submitted bugs, or gathering data from a completely offline process, such as analyzing a build for texture usage, being able to collect, store, and visualize this information is phenomenally useful.
In this project we’ll cover:
- Determining a data source to track over time, and mining this data
- Configuring a cloud-based SQL database
- Injecting the collected data into the SQL database
- Configuring a visualization method
Determining a Data Source:
The focus of this project is collecting and presenting data, so we need a simple data source that will change over time. After some thought, a simple solution is to parse news websites for the frequency of keywords likely to stay in the news for the next couple of months.
So the goal is to pick a variety of news websites, fetch them with Python, and simply count the number of times a given keyword appears on each page.
We can use the Python standard-library module urllib (urllib.request in Python 3) to fetch each page, so the hard work is already done (thanks, Python!)
Here's the final code for what we want. It's very simple at its core!
import urllib.request  # Python 3; in Python 2 this module was plain urllib

urls = ['http://www.bbc.co.uk/news', 'http://www.cnn.com', 'http://www.lemonde.fr']
for url in urls:
    with urllib.request.urlopen(url) as f:
        # Decode the response and force it to lower case for string matching
        url_data = f.read().decode('utf-8', errors='ignore').lower()
    print(url_data.count("economy"), url)
# 25 http://www.bbc.co.uk/news
# 44 http://www.cnn.com
# 48 http://www.lemonde.fr
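Since the plan is to track these counts over time, it helps to capture each run as a timestamped record ready for database insertion. Here's a minimal sketch; the build_rows helper and its row layout (timestamp, keyword, url, count) are my own illustration, not part of the original script:

```python
from datetime import datetime, timezone

def build_rows(counts, keyword):
    """Turn {url: count} results from one run into timestamped rows."""
    timestamp = datetime.now(timezone.utc).isoformat()
    return [(timestamp, keyword, url, count) for url, count in counts.items()]

# Example using the counts from the run above
rows = build_rows({'http://www.bbc.co.uk/news': 25, 'http://www.cnn.com': 44}, 'economy')
for row in rows:
    print(row)
```

Each row is self-describing, so runs from different days (or different keywords) can all live in the same database table.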
For the remaining steps, we need to choose between these options:
- Database type (MS SQL / MySQL)
- Database platform (Azure / AWS / Google Cloud Platform)
- Visualization platform (Periscope / Google Data Studio)
I opted to explore two different paths – choose the one that fits your use case the closest:
- Microsoft Azure -> MS SQL -> Periscope
- More robust, with more options, but it costs money, so better suited to medium-sized studios or those with a budget to spend
- Google Cloud Platform -> MySQL -> Google Data Studio
- Free for 60 days and minimal costs after that, so better for students, hobbyists, and indies
- Whichever path you choose, set the script to run on a schedule on a server (an Azure virtual machine, for instance) so the data is collected over time
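To sketch the "inject into SQL" step without a live cloud database, here's a minimal example using Python's built-in sqlite3 module as a stand-in. In production you would swap the connection line for your MySQL or MS SQL client and keep the same INSERT pattern; the keyword_counts table and its column names are my own assumptions:

```python
import sqlite3
from datetime import datetime, timezone

# sqlite3 stands in for the cloud database here; in production, swap the
# connection line for your MySQL / MS SQL client.
conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE IF NOT EXISTS keyword_counts (
        recorded_at TEXT,
        keyword     TEXT,
        url         TEXT,
        count       INTEGER
    )
""")

# Rows shaped as (timestamp, keyword, url, count), using the counts from earlier
rows = [
    (datetime.now(timezone.utc).isoformat(), 'economy', 'http://www.bbc.co.uk/news', 25),
    (datetime.now(timezone.utc).isoformat(), 'economy', 'http://www.cnn.com', 44),
]
conn.executemany(
    "INSERT INTO keyword_counts (recorded_at, keyword, url, count) VALUES (?, ?, ?, ?)",
    rows,
)
conn.commit()

for row in conn.execute("SELECT url, count FROM keyword_counts ORDER BY count"):
    print(row)
```

Note that the parameter placeholder varies by driver: sqlite3 and pyodbc use `?`, while most MySQL clients use `%s`; the rest of the pattern carries over unchanged.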