+ - time_boundary() (client.PyDruid method)
- timeseries() (client.PyDruid method)
diff --git a/docs/build/html/index.html b/docs/build/html/index.html
index abd82735..0dd85a6e 100644
--- a/docs/build/html/index.html
+++ b/docs/build/html/index.html
@@ -55,35 +55,202 @@ Welcome to PyDruid’s documentation!
+groupby(**kwargs)
+A group-by query groups a result set (the requested aggregate metrics) by the specified dimension(s).
+Required key/value pairs:
+Parameters:
+- datasource (str) – Data source to query
+- granularity (str) – Time bucket to aggregate data by: hour, day, minute, etc.
+- intervals (str or list) – ISO-8601 interval(s) over which to run the query
+- aggregations (dict) – A map from aggregator name to one of the pydruid.utils.aggregators, e.g. doublesum
+- dimensions (list) – The dimensions to group by
+Returns: The query result
+Return type: list[dict]
+Optional key/value pairs:
+Parameters:
+- filter (pydruid.utils.filters.Filter) – Indicates which rows of data to include in the query
+- post_aggregations (dict) – A dict mapping each post-aggregator name (str) to a pydruid.utils.PostAggregator
+Example:
+>>> group = query.groupby(
+        datasource='twitterstream',
+        granularity='hour',
+        intervals='2013-10-04/pt1h',
+        dimensions=["user_name", "reply_to_name"],
+        filter=~(Dimension("reply_to_name") == "Not A Reply"),
+        aggregations={"count": doublesum("count")}
+    )
+>>> for k in range(2):
+...     print group[k]
+{'timestamp': '2013-10-04T00:00:00.000Z', 'version': 'v1', 'event': {'count': 1.0, 'user_name': 'user_1', 'reply_to_name': 'user_2'}}
+{'timestamp': '2013-10-04T00:00:00.000Z', 'version': 'v1', 'event': {'count': 1.0, 'user_name': 'user_2', 'reply_to_name': 'user_3'}}
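+All examples on this page assume an existing client named query and the
+relevant helper imports. A minimal construction sketch (the broker URL and
+endpoint below are placeholders, not part of the original docs):
+>>> from pydruid.client import PyDruid
+>>> from pydruid.utils.aggregators import doublesum
+>>> from pydruid.utils.filters import Dimension
+>>> query = PyDruid('http://localhost:8083', 'druid/v2/')  # placeholder URL/endpoint
+Filter objects compose with the usual operators, so the groupby filter above
+could be combined with another condition, e.g. (illustrative only):
+>>> flt = (Dimension("user_name") == "user_1") & ~(Dimension("reply_to_name") == "Not A Reply")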
+segment_metadata(**kwargs)
+A segment metadata query returns per-segment information about:
+
+- Cardinality of all the columns present
+- Column type
+- Estimated size in bytes
+- Estimated size in bytes of each column
+- Interval the segment covers
+- Segment ID
+
+Required key/value pairs:
+Parameters:
+- datasource (str) – Data source to query
+- intervals (str or list) – ISO-8601 interval(s) over which to run the query
+Returns: The query result
+Return type: list[dict]
+Example:
+>>> meta = query.segment_metadata(datasource='twitterstream', intervals='2013-10-04/pt1h')
+>>> print meta[0].keys()
+['intervals', 'id', 'columns', 'size']
+>>> print meta[0]['columns']['tweet_length']
+{'errorMessage': None, 'cardinality': None, 'type': 'FLOAT', 'size': 30908008}
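+Because the result is a list of plain dicts, per-column metadata can be
+scanned directly; a short sketch (the loop is illustrative, not from the
+original docs):
+>>> for name, col in meta[0]['columns'].items():
+...     print name, col['type'], col['size']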
+
+time_boundary(**kwargs)
+A time boundary query returns the min and max timestamps present in a data source.
+Required key/value pairs:
+Parameters:
+- datasource (str) – Data source to query
+Returns: The query result
+Return type: list[dict]
+Example:
+>>> bound = query.time_boundary(datasource='twitterstream')
+>>> print bound
+[{'timestamp': '2011-09-14T15:00:00.000Z', 'result': {'minTime': '2011-09-14T15:00:00.000Z', 'maxTime': '2014-03-04T23:44:00.000Z'}}]
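+The bounds can seed a follow-up query over the data source's full extent; a
+sketch (not part of the original docs):
+>>> extent = bound[0]['result']
+>>> full_interval = extent['minTime'] + '/' + extent['maxTime']
+>>> # yields '2011-09-14T15:00:00.000Z/2014-03-04T23:44:00.000Z', usable as `intervals`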
+
-
 timeseries(**kwargs)
-LOL I’m a timeseries!
+A timeseries query returns the values of the requested metrics (in aggregate) for each timestamp.
+Required key/value pairs:
 Parameters:
-- sender (str) – The person sending the message
-- recipient (str) – The recipient of the message
-- message_body (str) – The body of the message
-- priority (integer or None) – The priority of the message, can be a number 1-5
+- datasource (str) – Data source to query
+- granularity (str) – Time bucket to aggregate data by: hour, day, minute, etc.
+- intervals (str or list) – ISO-8601 interval(s) over which to run the query
+- aggregations (dict) – A map from aggregator name to one of the pydruid.utils.aggregators, e.g. doublesum
-Returns: the message id
+Returns: The query result
-Return type: int
+Return type: list[dict]
-Raises:
-- ValueError – if the message_body exceeds 160 characters
-- TypeError – if the message_body is not a basestring
+Optional key/value pairs:
+Parameters:
+- filter (pydruid.utils.filters.Filter) – Indicates which rows of data to include in the query
+- post_aggregations (dict) – A dict mapping each post-aggregator name (str) to a pydruid.utils.PostAggregator
+Example:
+>>> counts = query.timeseries(
+        datasource='twitterstream',
+        granularity='hour',
+        intervals='2013-06-14/pt1h',
+        aggregations={"count": doublesum("count"), "rows": count("rows")},
+        post_aggregations={'percent': (Field('count') / Field('rows')) * Const(100)}
+    )
+>>> print counts
+[{'timestamp': '2013-06-14T00:00:00.000Z', 'result': {'count': 9619.0, 'rows': 8007, 'percent': 120.13238416385663}}]
@@ -98,10 +265,10 @@ Welcome to PyDruid’s documentation!
 Parameters:
-- dataSource (str) – Data source to query
-- granularity (str) – Time bucket to aggregate data by hour, day, minute, etc.,
-- intervals (str or list) – ISO-8601 intervals for which to run the query on
-- aggregations (dict) – Key is ‘aggregator_name’, and value is one of the pydruid.utils.aggregators
+- datasource (str) – Data source to query
+- granularity (str) – Time bucket to aggregate data by: hour, day, minute, etc.
+- intervals (str or list) – ISO-8601 interval(s) over which to run the query
+- aggregations (dict) – A map from aggregator name to one of the pydruid.utils.aggregators, e.g. doublesum
- dimension (str) – Dimension to run the query against
- metric (str) – Metric over which to sort the specified dimension by
- threshold (int) – How many of the top items to return
@@ -111,7 +278,7 @@ Welcome to PyDruid’s documentation!
 Returns: The query result
-Return type: dict
+Return type: list[dict]
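+The hunks above document the top-N query; a hedged usage sketch (the method
+name topn, and the dimension/metric/threshold values here, are assumptions
+not shown in this diff):
+>>> top = query.topn(
+        datasource='twitterstream',
+        granularity='all',
+        intervals='2013-10-04/pt1h',
+        aggregations={"count": doublesum("count")},
+        dimension='user_name',
+        metric='count',
+        threshold=5
+    )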
@@ -123,23 +290,38 @@ Welcome to PyDruid’s documentation!