diff --git a/docs/build/doctrees/environment.pickle b/docs/build/doctrees/environment.pickle
index 4d724ef5..4ac6efce 100644
Binary files a/docs/build/doctrees/environment.pickle and b/docs/build/doctrees/environment.pickle differ
diff --git a/docs/build/doctrees/index.doctree b/docs/build/doctrees/index.doctree
index b392c085..d4bf9fd9 100644
Binary files a/docs/build/doctrees/index.doctree and b/docs/build/doctrees/index.doctree differ
diff --git a/docs/build/html/genindex.html b/docs/build/html/genindex.html
index 2dc6fb07..75022c83 100644
--- a/docs/build/html/genindex.html
+++ b/docs/build/html/genindex.html
@@ -49,10 +49,22 @@

Navigation

Index

- P + G + | P + | S | T
+

G

+ + +
+ +
groupby() (client.PyDruid method) +
+ +
+

P

@@ -69,10 +81,24 @@

P

+

S

+ + +
+ +
segment_metadata() (client.PyDruid method) +
+ +
+

T

- @@ -123,23 +290,38 @@

Welcome to PyDruid’s documentation!

+
time_boundary() (client.PyDruid method) +
+ +
timeseries() (client.PyDruid method)
diff --git a/docs/build/html/index.html b/docs/build/html/index.html index abd82735..0dd85a6e 100644 --- a/docs/build/html/index.html +++ b/docs/build/html/index.html @@ -55,35 +55,202 @@

Welcome to PyDruid’s documentation!
class client.PyDruid(url, endpoint)
+
+groupby(**kwargs)
+

A group-by query groups a result set (the requested aggregate metrics) by the specified dimension(s).

+

Required key/value pairs:

Parameters:
    +
  • datasource (str) – Data source to query
  • +
  • granularity (str) – Time bucket to aggregate data by: hour, day, minute, etc.
  • +
  • intervals (str or list) – ISO-8601 interval(s) over which to run the query
  • +
  • aggregations (dict) – A map from aggregator name to one of the pydruid.utils.aggregators, e.g., doublesum
  • +
  • dimensions (list) – The dimensions to group by
  • +
+
Returns:

The query result

+
Return type:

list[dict]

+
+

Optional key/value pairs:

Parameters:
    +
  • filter (pydruid.utils.filters.Filter) – Indicates which rows of data to include in the query
  • +
  • post_aggregations (dict) – A map from post-aggregator name to a pydruid.utils.PostAggregator
  • +
+
+

Example:

+
    >>> group = query.groupby(
+            datasource='twitterstream',
+            granularity='hour',
+            intervals='2013-10-04/pt1h',
+            dimensions=["user_name", "reply_to_name"],
+            filter=~(Dimension("reply_to_name") == "Not A Reply"),
+            aggregations={"count": doublesum("count")}
+        )
+    >>> for k in range(2):
+        ...     print group[k]
+    >>> {'timestamp': '2013-10-04T00:00:00.000Z', 'version': 'v1', 'event': {'count': 1.0, 'user_name': 'user_1', 'reply_to_name': 'user_2'}}
+    >>> {'timestamp': '2013-10-04T00:00:00.000Z', 'version': 'v1', 'event': {'count': 1.0, 'user_name': 'user_2', 'reply_to_name': 'user_3'}}
+
+
+
+ +
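The keyword arguments above are ultimately serialized into a Druid groupBy JSON query. As a rough, hypothetical sketch of the body such a call might POST (field names follow the Druid native groupBy query spec; this is illustrative, not pydruid's actual serialization code):

```python
import json

# Hypothetical sketch: the JSON body a groupby() call like the example
# above would roughly produce. Field names follow the Druid groupBy
# query spec; this is NOT pydruid's actual implementation.
def build_groupby_query(datasource, granularity, intervals, dimensions, aggregations):
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "granularity": granularity,
        # Druid accepts a list of ISO-8601 intervals; normalize a bare string
        "intervals": intervals if isinstance(intervals, list) else [intervals],
        "dimensions": dimensions,
        "aggregations": [
            {"type": "doubleSum", "name": name, "fieldName": field}
            for name, field in aggregations.items()
        ],
    }

query_body = build_groupby_query(
    datasource="twitterstream",
    granularity="hour",
    intervals="2013-10-04/pt1h",
    dimensions=["user_name", "reply_to_name"],
    aggregations={"count": "count"},  # aggregator name -> field name
)
print(json.dumps(query_body, indent=2))
```

The client then POSTs this body to the Druid endpoint given in the `PyDruid(url, endpoint)` constructor.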
+
+segment_metadata(**kwargs)
+

A segment metadata query returns per-segment information about:

+
    +
  • Cardinality of all the columns present
  • +
  • Column type
  • +
  • Estimated size in bytes
  • +
  • Estimated size in bytes of each column
  • +
  • Interval the segment covers
  • +
  • Segment ID
  • +
+

Required key/value pairs:

Parameters:
    +
  • datasource (str) – Data source to query
  • +
  • intervals (str or list) – ISO-8601 interval(s) over which to run the query
  • +
+
Returns:

The query result

+
Return type:

list[dict]

+
+

Example:

+
    >>> meta = query.segment_metadata(datasource='twitterstream', intervals='2013-10-04/pt1h')
+    >>> print meta[0].keys()
+    >>> ['intervals', 'id', 'columns', 'size']
+    >>> print meta[0]['columns']['tweet_length']
+    >>> {'errorMessage': None, 'cardinality': None, 'type': 'FLOAT', 'size': 30908008}
+
+
+
+ +
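A result shaped like the sample output above is plain Python data and can be post-processed directly; for instance, summing the reported per-column byte sizes. The segment dict below is hypothetical sample data mimicking the keys (`id`, `intervals`, `size`, `columns`) shown in the example:

```python
# Hypothetical segment_metadata result, shaped like the sample output
# above (one dict per segment; 'columns' maps column name to stats).
meta = [{
    "id": "sample_segment",
    "intervals": ["2013-10-04T00:00:00.000Z/2013-10-04T01:00:00.000Z"],
    "size": 35000000,
    "columns": {
        "tweet_length": {"type": "FLOAT", "cardinality": None, "size": 30908008},
        "user_name": {"type": "STRING", "cardinality": 12345, "size": 2091992},
    },
}]

# Total estimated bytes across all columns of the first segment
total_column_bytes = sum(col["size"] for col in meta[0]["columns"].values())
print(total_column_bytes)  # 33000000
```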
+
+time_boundary(**kwargs)
+

A time boundary query returns the min and max timestamps present in a data source.

+

Required key/value pairs:

Parameters:datasource (str) – Data source to query
Returns:The query result
Return type:list[dict]
+

Example:

+
    >>> bound = query.time_boundary(datasource='twitterstream')
+    >>> print bound
+    >>> [{'timestamp': '2011-09-14T15:00:00.000Z', 'result': {'minTime': '2011-09-14T15:00:00.000Z', 'maxTime': '2014-03-04T23:44:00.000Z'}}]
+
+
+
+ +
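The `minTime`/`maxTime` strings in a time-boundary result (shape taken from the sample output above) are ISO-8601 timestamps, so the data source's total time span can be computed with the standard library:

```python
from datetime import datetime

# Sample time_boundary result, shaped like the example output above.
bound = [{
    "timestamp": "2011-09-14T15:00:00.000Z",
    "result": {"minTime": "2011-09-14T15:00:00.000Z",
               "maxTime": "2014-03-04T23:44:00.000Z"},
}]

# Parse the ISO-8601 timestamps ('Z' handled as a literal suffix here)
fmt = "%Y-%m-%dT%H:%M:%S.%fZ"
min_time = datetime.strptime(bound[0]["result"]["minTime"], fmt)
max_time = datetime.strptime(bound[0]["result"]["maxTime"], fmt)

span = max_time - min_time
print(span.days)  # 902
```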
timeseries(**kwargs)
-

LOL I’m a timeseries!

+

A timeseries query returns the values of the requested metrics (in aggregate) for each timestamp.

+

Required key/value pairs:

- - - +
Parameters:
    -
  • sender (str) – The person sending the message
  • -
  • recipient (str) – The recipient of the message
  • -
  • message_body (str) – The body of the message
  • -
  • priority (integer or None) – The priority of the message, can be a number 1-5
  • +
  • datasource (str) – Data source to query
  • +
  • granularity (str) – Time bucket to aggregate data by: hour, day, minute, etc.
  • +
  • intervals (str or list) – ISO-8601 interval(s) over which to run the query
  • +
  • aggregations (dict) – A map from aggregator name to one of the pydruid.utils.aggregators, e.g., doublesum
Returns:

the message id

+
Returns:

The query result

Return type:

int

+
Return type:

list[dict]

Raises:
    -
  • ValueError – if the message_body exceeds 160 characters
  • -
  • TypeError – if the message_body is not a basestring
  • +
+

Optional key/value pairs:

Parameters:
    +
  • filter (pydruid.utils.filters.Filter) – Indicates which rows of data to include in the query
  • +
  • post_aggregations (dict) – A map from post-aggregator name to a pydruid.utils.PostAggregator
+

Example:

+
    >>> counts = query.timeseries(
+            datasource='twitterstream',
+            granularity='hour',
+            intervals='2013-06-14/pt1h',
+            aggregations={"count": doublesum("count"), "rows": count("rows")},
+            post_aggregations={'percent': (Field('count') / Field('rows')) * Const(100)}
+        )
+    >>> print counts
+    >>> [{'timestamp': '2013-06-14T00:00:00.000Z', 'result': {'count': 9619.0, 'rows': 8007, 'percent': 120.13238416385663}}]
+
+
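The `percent` post-aggregation in the example is just arithmetic over the per-timestamp aggregated values; Druid evaluates it server-side, but the same computation in plain Python (using the count/rows values from the sample output) makes the semantics clear:

```python
# The aggregated values for one timestamp bucket, taken from the
# sample timeseries output above.
result = {"count": 9619.0, "rows": 8007}

# percent = (count / rows) * 100, exactly what the Field/Const
# post-aggregation expression expresses.
percent = result["count"] / result["rows"] * 100
print(percent)  # 120.13238416385663
```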
@@ -98,10 +265,10 @@

Welcome to PyDruid’s documentation!

Parameters:
    -
  • dataSource (str) – Data source to query
  • -
  • granularity (str) – Time bucket to aggregate data by hour, day, minute, etc.,
  • -
  • intervals (str or list) – ISO-8601 intervals for which to run the query on
  • -
  • aggregations (dict) – Key is ‘aggregator_name’, and value is one of the pydruid.utils.aggregators
  • +
  • datasource (str) – Data source to query
  • +
  • granularity (str) – Aggregate data by hour, day, minute, etc.
  • +
  • intervals (str or list) – ISO-8601 intervals of data to query
  • +
  • aggregations (dict) – A map from aggregator name to one of the pydruid.utils.aggregators, e.g., doublesum
  • dimension (str) – Dimension to run the query against
  • metric (str) – Metric over which to sort the specified dimension by
  • threshold (int) – How many of the top items to return
  • @@ -111,7 +278,7 @@

    Welcome to PyDruid’s documentation!

Returns:

The query result

Return type:

dict

+
Return type:

list[dict]

Parameters:
  • filter (pydruid.utils.filters.Filter) – Indicates which rows of data to include in the query
  • -
  • postAggregations – A dict with string key = ‘post_aggregator_name’, and value pydruid.utils.PostAggregator
  • +
  • post_aggregations (dict) – A map from post-aggregator name to a pydruid.utils.PostAggregator

Example:

-
-
>> top = query.topn(dataSource=’my_data’,
-
granularity=’hour’, -intervals=’[“2013-06-14/pt2h”]’, -aggregations={“count”: doubleSum(“count”)}, -dimension=’my_dimension’, -metric=’count’, -threshold= 5 -)
-
+
    >>> top = query.topn(
+                datasource='twitterstream',
+                granularity='all',
+                intervals='2013-06-14/pt1h',
+                aggregations={"count": doublesum("count")},
+                dimension='user',
+                metric='count',
+                filter=Dimension('language') == 'en',
+                threshold=1
+            )
+    >>> print top
+    >>> [{'timestamp': '2013-06-14T00:00:00.000Z', 'result': [{'count': 22.0, 'user': 'cool_user'}]}]
+
+
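Conceptually, a topN query is "filter rows, aggregate a metric per value of one dimension, sort descending, keep the top `threshold` entries" (Druid does this server-side, approximately and far more efficiently). A plain-Python sketch of that semantics over hypothetical rows:

```python
from collections import Counter

# Hypothetical raw rows; Druid never ships these to the client, this
# only illustrates what the example topn() call computes.
rows = [
    {"user": "cool_user", "language": "en"},
    {"user": "cool_user", "language": "en"},
    {"user": "other_user", "language": "en"},
    {"user": "cool_user", "language": "fr"},
]

threshold = 1
# filter=Dimension('language') == 'en', then count per 'user' dimension value
counts = Counter(r["user"] for r in rows if r["language"] == "en")
# sort by the metric descending and keep the top `threshold` entries
top = [{"user": u, "count": float(c)} for u, c in counts.most_common(threshold)]
print(top)  # [{'user': 'cool_user', 'count': 2.0}]
```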
diff --git a/docs/build/html/objects.inv b/docs/build/html/objects.inv index 8fd6a2ed..6811cbfa 100644 Binary files a/docs/build/html/objects.inv and b/docs/build/html/objects.inv differ diff --git a/docs/build/html/searchindex.js b/docs/build/html/searchindex.js index df7048d0..67476864 100644 --- a/docs/build/html/searchindex.js +++ b/docs/build/html/searchindex.js @@ -1 +1 @@ -Search.setIndex({envversion:42,terms:{recipi:0,queri:0,over:0,threshold:0,paramet:0,current:[],typeerror:0,param:[],granular:0,dict:0,my_dimens:0,spec:0,"case":0,build_queri:[],sourc:0,"return":0,string:0,get:[],dai:0,util:0,datasourc:0,discuss:[],requir:0,list:0,integ:0,item:0,page:0,dimens:0,set:0,interv:0,message_bodi:0,result:0,arg:[],index:0,conceptu:0,content:0,state:[],approxim:0,row:0,run:0,kei:0,timeseri:0,bodi:0,valu:0,search:0,lol:0,post_aggregator_nam:0,sender:0,rtype:[],aggregator_nam:0,groupbi:0,against:0,filter:0,etc:0,iso:0,pt2h:0,basestr:0,mani:0,my_data:0,effici:0,doublesum:0,modul:0,number:0,send:0,topn:0,prioriti:0,given:0,top:0,messag:0,includ:0,type:0,valueerror:0,more:0,sort:0,option:0,specifi:0,argument:[],exce:0,than:0,count:0,none:0,endpoint:0,word:[],hour:0,bucket:0,charact:0,defin:[],below:[],can:0,str:0,minut:0,aggreg:0,"int":0,metric:0,fuck:[],repres:[],kwarg:0,end:[],how:0,client:0,bool:[],which:0,postaggreg:0,singl:0,resourc:0,groupbyqueri:0,object:[],fart:[],rais:0,pair:0,data:0,"class":0,faster:0,url:0,thought:0,person:0,exampl:0,thi:0,time:0,order:0},objtypes:{"0":"py:module","1":"py:method","2":"py:class"},objnames:{"0":["py","module","Python module"],"1":["py","method","Python method"],"2":["py","class","Python class"]},filenames:["index"],titles:["Welcome to PyDruid’s documentation!"],objects:{"":{pydruid:[0,0,0,"-"]},client:{PyDruid:[0,2,1,""]},"client.PyDruid":{topn:[0,1,1,""],timeseries:[0,1,1,""]}},titleterms:{pydruid:0,document:0,welcom:0,indic:0,tabl:0}}) \ No newline at end of file 
+Search.setIndex({envversion:42,terms:{recipi:[],all:0,code:[],queri:0,over:0,threshold:0,languag:0,paramet:0,"04t00":0,current:[],typeerror:[],group:0,how:0,"000z_2013":[],send:[],granular:0,estim:0,dict:0,my_dimens:[],pt1h:0,spec:0,url_domain:[],build_queri:[],sourc:0,"return":0,string:0,first_hashtag:[],get:[],python:[],timestamp:0,dai:0,number:[],util:0,fart:[],post_aggreg:0,datasourc:0,"04t23":0,discuss:[],requir:0,name:0,list:0,integ:[],item:0,each:0,user_nam:0,page:0,user_mention_nam:[],dimens:0,set:0,hour:0,count:0,twitter:[],message_bodi:[],user_total_tweet:[],meta:0,result:0,arg:[],user_time_zon:[],event:0,num_ment:[],index:0,time_boundari:0,defin:[],cool_us:0,boundari:0,per:0,content:0,state:[],version:0,lineno:[],print:0,"import":[],row:0,math:[],has_geo:[],has_link:[],run:0,kei:0,timeseri:0,bodi:[],"05t04":[],fucker:[],"05t00":[],mintim:0,"const":0,"byte":0,valu:0,search:0,lol:[],post_aggregator_nam:0,sender:[],rtype:[],column:0,twitterstream:0,aggregator_nam:[],groupbi:0,against:0,user_2:0,etc:0,user_1:0,iso:0,lineo:[],pt2h:[],basestr:[],mani:0,my_data:[],block:[],user:0,user_lang:[],rang:0,effici:0,doublesum:0,modul:0,"float":0,bound:0,"000z":0,done:[],param:[],rt_name:[],repli:0,size:0,prioriti:[],given:0,from:0,messag:[],data:0,top:0,"14t00":0,dude:[],"long":[],errormessag:0,num_link:[],is_retweet:[],type:0,interv:0,includ:0,sort:0,option:0,about:0,specifi:0,fuck:[],"280z":[],"__time":[],butt:[],num_hashtag:[],than:0,present:0,"case":0,none:0,endpoint:0,word:[],"14t15":0,bucket:0,maxtim:0,charact:[],conceptu:0,below:[],exce:[],can:0,str:0,has_ment:[],minut:0,more:0,aggreg:0,person:[],"int":0,request:0,metric:0,argument:[],deep:[],repres:[],twitterstream_2013:[],kwarg:0,user_loc:[],reply_to_nam:0,cardin:0,filter:0,num_follow:[],end:[],user_3:0,percent:0,num_favorit:[],field:0,exampl:0,bool:[],which:0,is_vir:[],tweet_length:0,postaggreg:0,simpl:[],singl:0,map:0,resourc:0,valueerror:[],usernam:[],max:0,groupbyqueri:0,object:[],approxim:0,topn:0,rais:[]
,segment_metadata:0,pair:0,segment:0,"class":0,faster:0,url:0,min:0,cover:0,thought:0,inform:0,client:0,thi:0,time:0,order:0},objtypes:{"0":"py:module","1":"py:method","2":"py:class"},objnames:{"0":["py","module","Python module"],"1":["py","method","Python method"],"2":["py","class","Python class"]},filenames:["index"],titles:["Welcome to PyDruid’s documentation!"],objects:{"":{pydruid:[0,0,0,"-"]},client:{PyDruid:[0,2,1,""]},"client.PyDruid":{topn:[0,1,1,""],groupby:[0,1,1,""],time_boundary:[0,1,1,""],timeseries:[0,1,1,""],segment_metadata:[0,1,1,""]}},titleterms:{pydruid:0,document:0,welcom:0,indic:0,tabl:0}}) \ No newline at end of file