Loading Monthly Summaries into MySQL
I created the telepathSQL project to try loading monthly data into MySQL. I am loading one month's worth of data (2013-01): all languages, and all URIs that received more than ten hits that month.
I tried something really simple: loading the data into the schema at
https://github.com/paulhoule/telepath/blob/master/telepathSql/src/main/sql/monthlies.sql
with an index that is efficient for "top N in language A" queries (and for nothing else). As a rough number, on CECILLE it takes about 30 minutes to load one 1/23 shard of this data; if it keeps going at this rate, the load may be done tomorrow morning. Overall, these data sets are still possible to work with in MySQL, but it's getting difficult. Note in particular that we have 60 months' worth of data, so the full set is far bigger than this one month.
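To make the "top N in language A" access pattern concrete, here is a minimal sketch. It assumes a hypothetical three-column table (`lang`, `uri`, `hits`); the real schema is the one in monthlies.sql and may differ. It uses Python's built-in `sqlite3` only so that it runs without a MySQL server. The point is the composite index on `(lang, hits)`: equality on `lang` followed by ordering on `hits` lets the query read the top rows for one language straight off the index. That is exactly why such an index helps this query and little else.

```python
import sqlite3

# Hypothetical stand-in for the monthly summary table (not the real
# monthlies.sql schema). sqlite3 is used so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE monthly (
    lang TEXT NOT NULL,     -- language code, e.g. 'en'
    uri  TEXT NOT NULL,     -- page URI
    hits INTEGER NOT NULL   -- hits for the month
);
-- Composite index: equality on lang, then hits in order, so
-- 'top N for one language' becomes an index range scan.
CREATE INDEX idx_lang_hits ON monthly (lang, hits);
"""

def load(conn, rows, min_hits=10):
    """Insert (lang, uri, hits) rows, keeping only URIs with more than min_hits."""
    conn.executemany(
        "INSERT INTO monthly (lang, uri, hits) VALUES (?, ?, ?)",
        (r for r in rows if r[2] > min_hits),
    )
    conn.commit()

def top_n(conn, lang, n):
    """Return the n most-hit (uri, hits) pairs for one language, highest first."""
    cur = conn.execute(
        "SELECT uri, hits FROM monthly WHERE lang = ? ORDER BY hits DESC LIMIT ?",
        (lang, n),
    )
    return cur.fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    sample = [
        ("en", "Main_Page", 5000),
        ("en", "Cat", 120),
        ("en", "Dog", 300),
        ("en", "Obscure_Page", 3),  # dropped: not more than ten hits
        ("de", "Hauptseite", 900),
    ]
    load(conn, sample)
    print(top_n(conn, "en", 2))  # [('Main_Page', 5000), ('Dog', 300)]
```

A query that filters on `uri` alone, or ranks across all languages at once, gets no help from this index. That limitation is the "and for nothing else" caveat above.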
We need to go back to using Hadoop.
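The scale argument behind that conclusion can be checked with a few lines of arithmetic. This sketch uses only the figures quoted above (30 minutes per shard, 23 shards per month, 60 months). It assumes every month is about the size of 2013-01 and that the load rate stays constant. If the load slows as the table and its index grow, these numbers are a lower bound.

```python
# Back-of-envelope extrapolation from the observed load rate.
# Assumption (ours): every month is the same size and the rate stays linear.
MINUTES_PER_SHARD = 30   # observed on CECILLE for one 1/23 shard
SHARDS_PER_MONTH = 23    # one month of data is split into 23 shards
MONTHS = 60              # months of data available

def load_hours(months=1):
    """Estimated wall-clock hours to load `months` months at the observed rate."""
    return MINUTES_PER_SHARD * SHARDS_PER_MONTH * months / 60

if __name__ == "__main__":
    print(f"one month: {load_hours():.1f} h")                    # one month: 11.5 h
    print(f"all months: {load_hours(MONTHS) / 24:.2f} days")     # all months: 28.75 days
```

Roughly half a day for one month matches "done tomorrow morning". Nearly a month of continuous loading for the full 60 months, before running a single query, is what pushes the work back toward Hadoop.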