scrapy-datadog-extension is a Scrapy extension that sends metrics from your spiders' executions (Scrapy stats) to Datadog.
There is no public pre-packaged version yet. If you want to use it, you will have to clone the project and make it installable easily from your `requirements.txt`.
First, you will need to add the extension to the `EXTENSIONS` dict located in your `settings.py` file. For example:
```python
EXTENSIONS = {
    'scrapy-datadog-extension': 1,
}
```
Then you need to provide the following variables, directly in the Scrapinghub settings of your jobs:
- `DATADOG_API_KEY`: Your Datadog API key.
- `DATADOG_APP_KEY`: Your Datadog APP key.
- `DATADOG_CUSTOM_TAGS`: List of tags to bind to metrics.
- `DATADOG_CUSTOM_METRICS`: Subset of metrics to send to Datadog.
- `DATADOG_METRICS_PREFIX`: The prefix to apply to all of your metrics, e.g. `kp.`
- `DATADOG_HOST_NAME`: The hostname you want your metrics to be associated with, e.g. `app.scrapinghub.com`.
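For a plain Scrapy project (outside of the Scrapinghub settings UI), these variables can also be declared directly in `settings.py`. A minimal sketch, where every value is a placeholder rather than a real credential:

```python
# settings.py -- placeholder values, not real credentials or required settings.
DATADOG_API_KEY = "your-api-key"
DATADOG_APP_KEY = "your-app-key"
DATADOG_CUSTOM_TAGS = ["env:production"]
DATADOG_CUSTOM_METRICS = ["item_scraped_count", "elapsed_time"]
DATADOG_METRICS_PREFIX = "kp."
DATADOG_HOST_NAME = "app.scrapinghub.com"
```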
Sometimes one might need to set tags at runtime, for example to compute them out of the spider arguments. To allow such a scenario, just set a `tags` attribute on your spider with a list of statsd-compatible keys (i.e. `["foo", ...]` or `["foo:bar", ...]`). Note that all metrics will then be tagged as well.
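A minimal sketch of computing tags from spider arguments (the spider name and `country` argument are hypothetical; in a real project the class would inherit from `scrapy.Spider`):

```python
# Hypothetical spider illustrating a runtime-computed `tags` attribute.
# In a real project this class would subclass scrapy.Spider; a plain
# class is used here so the sketch stays self-contained.
class ProductsSpider:
    name = "products"

    def __init__(self, country=None):
        # Tags must be statsd-compatible strings: "key" or "key:value".
        self.tags = [f"country:{country}"] if country else []

spider = ProductsSpider(country="fr")
# spider.tags == ["country:fr"]
```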
Basically, this extension will, on the `spider_closed` signal execution, collect the Scrapy stats associated with a given project/spider/job and extract the variables listed in a `stats_to_collect` list. Custom variables are also added:

- `elapsed_time`: a simple computation of `finish_time - start_time`.
- `done`: a simple counter, acting like a ping to indicate that a job ran regularly.
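The `elapsed_time` computation can be sketched as follows, assuming the standard Scrapy stats keys `start_time` and `finish_time`, which hold `datetime` objects:

```python
from datetime import datetime, timezone

# Example stats as Scrapy's stats collector would expose them
# (the actual values here are made up for illustration).
stats = {
    "start_time": datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    "finish_time": datetime(2024, 1, 1, 12, 5, 30, tzinfo=timezone.utc),
}

# elapsed_time = finish_time - start_time, expressed in seconds.
elapsed_time = (stats["finish_time"] - stats["start_time"]).total_seconds()
# elapsed_time == 330.0
```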
At the end, we have a list of metrics, with associated tags (to enable better filtering in Datadog):

- `project`: The Scrapinghub project ID.
- `spider_name`: The Scrapinghub spider name as defined in the spider class.
Then, everything is sent to Datadog, using the Datadog API.
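Conceptually, each metric ends up as a name/value pair carrying the prefix, host, and tags described above. A sketch of assembling such a payload (the `build_payload` helper and the exact payload shape are assumptions for illustration, not the extension's actual code or the Datadog API's wire format):

```python
# Hypothetical helper: combine the configured prefix, host and tags
# with the collected stats into one record per metric.
def build_payload(prefix, host, project, spider_name, metrics):
    tags = [f"project:{project}", f"spider_name:{spider_name}"]
    return [
        {"metric": f"{prefix}{name}", "value": value, "host": host, "tags": tags}
        for name, value in metrics.items()
    ]

payload = build_payload(
    "kp.", "app.scrapinghub.com", "12345", "products",
    {"elapsed_time": 330.0, "done": 1},
)
# payload[0]["metric"] == "kp.elapsed_time"
```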
- Sometimes, when `spider_closed` is executed right after the job completes, some Scrapy stats are not yet available, so we send an incomplete list of metrics, which prevents us from relying 100% on this extension.
By the way, we're hiring across the world 👇

Join our engineering team to help us build data-intensive projects! We are looking for people who love their craft and are the best at it.
- Data Engineers in Singapore and Paris
- Data Support Engineers in Singapore
- Data Engineer interns in Singapore and Paris
This code is MIT licensed.
Designed & built by Kpler engineers with a 💻 and some 🍣.