Merge pull request #6 from metamx/docs
Docs
Showing 43 changed files with 4,686 additions and 100 deletions.
@@ -1,4 +1,5 @@

```
*.pyc
test.py
pydruid/bard.py
.idea/
.DS_store
.idea/
```
@@ -0,0 +1,128 @@

# pydruid

pydruid exposes a simple API to create, execute, and analyze [Druid](http://druid.io/) queries. pydruid can parse query results into [Pandas](http://pandas.pydata.org/) DataFrame objects for subsequent data analysis; this offers a tight integration between [Druid](http://druid.io/), the [SciPy](http://www.scipy.org/stackspec.html) stack (for scientific computing), and [scikit-learn](http://scikit-learn.org/stable/) (for machine learning). Additionally, pydruid can export query results into TSV or JSON for further processing with your favorite tool, e.g., R, Julia, Matlab, or Excel.
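For instance, once any of the queries below has run, the same results can be pulled into pandas or dumped to disk. A minimal sketch, assuming `query` is a PyDruid client on which a query has already been executed and that an `export_tsv` helper is available alongside the `export_pandas` call used throughout the examples:

```python
# Sketch: assumes `query` is a PyDruid client that has already executed a query,
# and that an `export_tsv` helper exists alongside `export_pandas`.
df = query.export_pandas()      # results as a pandas DataFrame
query.export_tsv('result.tsv')  # the same results written to a TSV file
```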
# examples

The following examples show how to execute and analyze the results of three types of queries: timeseries, topN, and groupby. We will use these queries to ask simple questions about Twitter's public data set.

## timeseries

What was the average tweet length, per day, surrounding the 2014 Sochi Olympics?
```python
from pydruid.client import *
import matplotlib.pyplot as plt

# bard_url_goes_here is a placeholder for the URL of your Druid broker ("bard") node
query = PyDruid(bard_url_goes_here, 'druid/v2')

ts = query.timeseries(
    datasource='twitterstream',
    granularity='day',
    intervals='2014-02-02/p4w',  # ISO-8601 interval: four weeks starting 2014-02-02
    aggregations={'length': doublesum('tweet_length'), 'count': doublesum('count')},
    post_aggregations={'avg_tweet_length': (Field('length') / Field('count'))},
    filter=Dimension('first_hashtag') == 'sochi2014'
)
df = query.export_pandas()
df['timestamp'] = df['timestamp'].map(lambda x: x.split('T')[0])  # keep only the date part
df.plot(x='timestamp', y='avg_tweet_length', ylim=(80, 140), rot=20,
        title='Sochi 2014')
plt.ylabel('avg tweet length (chars)')
plt.show()
```
![alt text](https://github.com/metamx/pydruid/raw/docs/docs/figures/avg_tweet_length.png "Avg. tweet length")
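Because the result is an ordinary pandas DataFrame, the usual pandas tooling applies directly; for example, a quick summary of the computed metric (a sketch, assuming the `df` exported above):

```python
# Sketch: quick inspection of the exported DataFrame (assumes `df` from above).
print(df[['timestamp', 'avg_tweet_length']].head())
print(df['avg_tweet_length'].describe())  # count, mean, std, min/max, quartiles
```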
## topN

Who were the top ten mentions (@user_name) during the 2014 Oscars?
```python
top = query.topn(
    datasource='twitterstream',
    granularity='all',
    intervals='2014-03-03/p1d',  # UTC time of the 2014 Oscars
    aggregations={'count': doublesum('count')},
    dimension='user_mention_name',
    filter=(Dimension('user_lang') == 'en') & (Dimension('first_hashtag') == 'oscars') &
           (Dimension('user_time_zone') == 'Pacific Time (US & Canada)') &
           ~(Dimension('user_mention_name') == 'No Mention'),
    metric='count',
    threshold=10
)

df = query.export_pandas()
print(df)

   count                 timestamp user_mention_name
0   1303  2014-03-03T00:00:00.000Z      TheEllenShow
1     44  2014-03-03T00:00:00.000Z        TheAcademy
2     21  2014-03-03T00:00:00.000Z               MTV
3     21  2014-03-03T00:00:00.000Z         peoplemag
4     17  2014-03-03T00:00:00.000Z               THR
5     16  2014-03-03T00:00:00.000Z      ItsQueenElsa
6     16  2014-03-03T00:00:00.000Z           eonline
7     15  2014-03-03T00:00:00.000Z       PerezHilton
8     14  2014-03-03T00:00:00.000Z     realjohngreen
9     12  2014-03-03T00:00:00.000Z       KevinSpacey
```
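The same DataFrame can be handed straight to pandas' plotting helpers; a sketch of a quick bar chart of this result (assuming the `df` printed above):

```python
import matplotlib.pyplot as plt

# Sketch: horizontal bar chart of the topN result (assumes `df` from above).
df.set_index('user_mention_name')['count'].plot(kind='barh')
plt.xlabel('mention count')
plt.title('Top mentions during the 2014 Oscars')
plt.show()
```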
## groupby

What does the social network of users replying to other users look like?
```python
from igraph import *
from cairo import *
from pandas import concat

group = query.groupby(
    datasource='twitterstream',
    granularity='hour',
    intervals='2013-10-04/pt12h',
    dimensions=["user_name", "reply_to_name"],
    filter=(~(Dimension("reply_to_name") == "Not A Reply")) &
           (Dimension("user_location") == "California"),
    aggregations={"count": doublesum("count")}
)

df = query.export_pandas()

# map names to categorical variables with a lookup table
names = concat([df['user_name'], df['reply_to_name']]).unique()
nameLookup = dict([pair[::-1] for pair in enumerate(names)])
df['user_name_lookup'] = df['user_name'].map(nameLookup.get)
df['reply_to_name_lookup'] = df['reply_to_name'].map(nameLookup.get)

# create the graph with igraph
g = Graph(len(names), directed=False)
vertices = zip(df['user_name_lookup'], df['reply_to_name_lookup'])
g.vs["name"] = names
g.add_edges(vertices)
layout = g.layout_fruchterman_reingold()
plot(g, "tweets.png", layout=layout, vertex_size=2, bbox=(400, 400), margin=25,
     edge_width=1, vertex_color="blue")
```
![alt text](https://github.com/metamx/pydruid/raw/docs/docs/figures/twitter_graph.png "Social Network")
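Beyond drawing the network, the same igraph object supports basic analysis; for example, a sketch that lists the most connected users by vertex degree (assuming `g` from the snippet above):

```python
# Sketch: the ten most connected users in the reply graph (assumes `g` from above).
ranked = sorted(zip(g.vs['name'], g.degree()), key=lambda pair: pair[1], reverse=True)
for name, degree in ranked[:10]:
    print(name, degree)
```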
# documentation

pydruid is a [Sphinx](http://sphinx-doc.org/) project. You can view the documentation locally by opening the following file in a web browser:

```
pydruid/docs/build/html/index.html
```
The docstrings are written in [reStructuredText](http://docutils.sourceforge.net/rst.html). If you edit them, the documentation can be regenerated by running:

```
make html
```

from within the docs directory, assuming Sphinx is installed on your machine.
@@ -0,0 +1,177 @@

```make
# Makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS    =
SPHINXBUILD   = sphinx-build
PAPER         =
BUILDDIR      = build

# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif

# Internal variables.
PAPEROPT_a4     = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS  = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source

.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext

help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html       to make standalone HTML files"
	@echo "  dirhtml    to make HTML files named index.html in directories"
	@echo "  singlehtml to make a single large HTML file"
	@echo "  pickle     to make pickle files"
	@echo "  json       to make JSON files"
	@echo "  htmlhelp   to make HTML files and a HTML help project"
	@echo "  qthelp     to make HTML files and a qthelp project"
	@echo "  devhelp    to make HTML files and a Devhelp project"
	@echo "  epub       to make an epub"
	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
	@echo "  latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
	@echo "  text       to make text files"
	@echo "  man        to make manual pages"
	@echo "  texinfo    to make Texinfo files"
	@echo "  info       to make Texinfo files and run them through makeinfo"
	@echo "  gettext    to make PO message catalogs"
	@echo "  changes    to make an overview of all changed/added/deprecated items"
	@echo "  xml        to make Docutils-native XML files"
	@echo "  pseudoxml  to make pseudoxml-XML files for display purposes"
	@echo "  linkcheck  to check all external links for integrity"
	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"

clean:
	rm -rf $(BUILDDIR)/*

html:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."

singlehtml:
	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
	@echo
	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in $(BUILDDIR)/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/PyDruid.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/PyDruid.qhc"

devhelp:
	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
	@echo
	@echo "Build finished."
	@echo "To view the help file:"
	@echo "# mkdir -p $$HOME/.local/share/devhelp/PyDruid"
	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/PyDruid"
	@echo "# devhelp"

epub:
	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
	@echo
	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."

latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo
	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
	@echo "Run \`make' in that directory to run these through (pdf)latex" \
	      "(use \`make latexpdf' here to do that automatically)."

latexpdf:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through pdflatex..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

latexpdfja:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through platex and dvipdfmx..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

text:
	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
	@echo
	@echo "Build finished. The text files are in $(BUILDDIR)/text."

man:
	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
	@echo
	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."

texinfo:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo
	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
	@echo "Run \`make' in that directory to run these through makeinfo" \
	      "(use \`make info' here to do that automatically)."

info:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo "Running Texinfo files through makeinfo..."
	make -C $(BUILDDIR)/texinfo info
	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."

gettext:
	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
	@echo
	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."

changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
	@echo
	@echo "The overview file is in $(BUILDDIR)/changes."

linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in $(BUILDDIR)/linkcheck/output.txt."

doctest:
	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
	@echo "Testing of doctests in the sources finished, look at the " \
	      "results in $(BUILDDIR)/doctest/output.txt."

xml:
	$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
	@echo
	@echo "Build finished. The XML files are in $(BUILDDIR)/xml."

pseudoxml:
	$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
	@echo
	@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
```
@@ -0,0 +1,4 @@

```
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: e0a9069122c14226639caf781d513013
tags: 645f666f9bcd5a90fca523b33c5a78b7
```
@@ -0,0 +1,25 @@

```rst
.. PyDruid documentation master file, created by
   sphinx-quickstart on Mon Mar 3 16:38:17 2014.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to PyDruid's documentation!
===================================

Contents:

.. toctree::
   :maxdepth: 2

.. automodule:: pydruid

.. autoclass:: client.PyDruid
   :members:

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
```