Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
Timeseries with postaggregators and a picture!
  • Loading branch information
dganguli committed Mar 7, 2014
1 parent 32ae0f8 commit 5a19cc1
Showing 1 changed file with 36 additions and 0 deletions.
36 changes: 36 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,3 +6,39 @@ pydruid exposes a simple API to create, execute, and analyze [Druid](http://drui
#documentation

#examples

The following exampes show how to execute and analyze the results of three types of queries:timeseries, topN, and groupby. We analyze the twitter data set

## timeseries query

What was the average tweet length, per day, surrounding the 2014 Sochi olympics?

```python
from pydruid.client import *
from pylab import plt

query = PyDruid(bard_url_goes_here, 'druid/v2')

ts = query.timeseries(
datasource='twitterstream',
granularity='day',
intervals='2014-02-02/p4w',
aggregations={'length': doublesum('tweet_length'), 'count': doublesum('count')},
post_aggregations={'avg_tweet_length': (Field('length') / Field('count'))},
filter=Dimension('first_hashtag') == 'sochi2014'
)
df = query.export_pandas()
df['timestamp'] = df['timestamp'].map(lambda x: x.split('T')[0])
df.plot(x='timestamp', y='avg_tweet_length', ylim=(80, 140), rot=20,
title='Sochi 2014')
plt.ylabel('avg tweet length (chars)')
plt.show()
```

![alt text](https://github.com/metamx/pydruid/raw/docs/docs/figures/avg_tweet_length.png "Avg. tweet length")






0 comments on commit 5a19cc1

Please sign in to comment.