Data Tracking Using the Cloud

Collecting and visualizing data is a powerful tool in the arsenal of a tech artist. Whether you’re collecting telemetry from tools (to track usage, breakages, crashes, etc.), logging submitted bugs, or gathering data from a completely offline process, such as analyzing a build for texture usage, being able to collect, store, and visualize this information is phenomenally useful.

In this project we’ll cover:

  • Determining a data source to track over time, and mining this data
  • Configuring a cloud-based SQL database
  • Injecting the collected data into the SQL database
  • Configuring a visualization method


Determining a Data Source:

The focus of this project is collecting and presenting data, so we need a simple data source that will change over time. After some thought, a simple solution would be to parse news websites for the frequency of some keywords likely to be in the news for the next couple of months.

So the goal is to collect a list of news websites, fetch them with Python, and simply count the number of times a chosen keyword appears.

We can use the Python module “urllib” to fetch each page, so the hard work is already done (thanks, Python!).

Here’s the final code for what we want. It’s very simple at its core!
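A minimal sketch of that script, assuming whole-word, case-insensitive matching; the URLs and keyword below are placeholders, so swap in whichever sites and terms you want to track:

```python
import re
import urllib.request

def fetch_page(url):
    """Download a page and return its text (assumes roughly UTF-8 content)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="ignore")

def count_keyword(text, keyword):
    """Count whole-word, case-insensitive occurrences of keyword in text."""
    return len(re.findall(r"\b" + re.escape(keyword) + r"\b", text, re.IGNORECASE))

# Placeholder sources -- swap in the news sites you actually want to track.
NEWS_URLS = [
    "https://www.bbc.com/news",
    "https://edition.cnn.com",
]

def keyword_total(keyword, urls=NEWS_URLS):
    """Sum keyword occurrences across every page that loads successfully."""
    total = 0
    for url in urls:
        try:
            total += count_keyword(fetch_page(url), keyword)
        except OSError:
            pass  # skip sites that are down or unreachable
    return total
```

Counting raw HTML will also match the keyword inside markup and scripts; for a tracking metric that drifts consistently over time, that’s usually good enough, but you could strip tags first if you want cleaner numbers.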


For the remaining steps, we need to choose among these solutions:

  • Database type (SQL Server / MySQL)
  • Database platform (Azure / AWS / Google Cloud Platform)
  • Visualization platform (Periscope / Google Data Studio)

I opted to explore two different paths. Choose the one that fits your use case most closely:
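Whichever path you pick, the injection step looks much the same: open a connection, then insert one timestamped row per data point with a parameterized query. The table name and columns below are assumptions, and sqlite3 (from the standard library) stands in for the cloud database so the sketch runs anywhere; with pyodbc (Azure SQL) or mysql-connector-python (MySQL), the SQL stays essentially the same and only the connection call and placeholder style change.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical table layout -- adjust the names to match your own schema.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS keyword_counts (
    recorded_at TEXT NOT NULL,
    keyword     TEXT NOT NULL,
    count       INTEGER NOT NULL
)
"""

INSERT_SQL = "INSERT INTO keyword_counts (recorded_at, keyword, count) VALUES (?, ?, ?)"

def record_count(conn, keyword, count):
    """Insert one timestamped data point using a parameterized query."""
    conn.execute(INSERT_SQL, (datetime.now(timezone.utc).isoformat(), keyword, count))
    conn.commit()

# In-memory stand-in for the cloud database.
conn = sqlite3.connect(":memory:")
conn.execute(CREATE_SQL)
record_count(conn, "election", 42)
```

Parameterized queries (the `?` placeholders) matter here: they keep scraped page content from breaking, or injecting into, your SQL.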

Improvements:

  • Set the script to run on a server (Azure virtual machine, for instance)
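On a Linux VM, one simple way to do that is a cron entry; the schedule, script path, and log path below are placeholders:

```shell
# Run the collection script once a day at 06:00 (add via `crontab -e` on the VM).
0 6 * * * /usr/bin/python3 /home/azureuser/keyword_tracker.py >> /home/azureuser/tracker.log 2>&1
```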
