Loading Monthly Summaries into MySQL

Paul Houle edited this page Jan 21, 2014 · 1 revision

I created the telepathSQL project to try loading monthly data into MySQL. I am loading one month's worth of data (2013-01), covering all languages and all URIs that got more than ten hits during the month.

I tried something really simple: loading the data into the schema at

https://github.com/paulhoule/telepath/blob/master/telepathSql/src/main/sql/monthlies.sql

with an index that is efficient for 'top N of language A' queries (and not for anything else). As a rough number, on CECILLE it takes about 30 minutes to load a 1/23 shard of this data; if it keeps going at this rate, it may be done tomorrow morning. Overall, we're dealing with data sets that are possible to work with in MySQL, but it's getting difficult. Note in particular that we have 60 months' worth of data, so the full set gets far bigger than this.
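
As a back-of-the-envelope check on those numbers, here is a small Python sketch that extrapolates the load time. It assumes the shards load one after another at a constant 30 minutes each, and that every month is about the size of 2013-01; neither assumption has been measured, and the function name is just for illustration.

```python
# Rough load-time estimate from the figures above.
# Assumptions (not measured): shards load sequentially at a constant rate,
# and every month is about as large as 2013-01.
SHARDS_PER_MONTH = 23     # one month of data is split into 23 shards
MINUTES_PER_SHARD = 30    # observed load time for one shard on CECILLE
MONTHS_OF_DATA = 60       # total months available


def load_hours(months: int,
               shards_per_month: int = SHARDS_PER_MONTH,
               minutes_per_shard: float = MINUTES_PER_SHARD) -> float:
    """Return the estimated load time in hours for the given number of months."""
    return months * shards_per_month * minutes_per_shard / 60


one_month = load_hours(1)
all_months = load_hours(MONTHS_OF_DATA)
print(f"one month: {one_month:.1f} hours")
print(f"{MONTHS_OF_DATA} months: {all_months:.0f} hours "
      f"(about {all_months / 24:.0f} days)")
```

Under those assumptions, one month takes about 11.5 hours and all 60 months would take roughly 690 hours, or about four weeks of continuous loading.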

We need to go back to using Hadoop.