Support telemetry using Scarf (#250)
Export telemetry related to DAG Factory usage to [Scarf](https://about.scarf.sh/). This data helps the project maintainers understand how DAG Factory is used. Insights from this telemetry are critical for prioritizing patches, minor releases, and security fixes. Additionally, this information supports critical decisions related to the development roadmap.

Deployments and individual users can opt out of analytics by setting the configuration:

```
[dag_factory]
enable_telemetry = False
```

As described in the [official documentation](https://docs.scarf.sh/gateway/#do-not-track), it is also possible to opt out by setting one of the following environment variables:

```commandline
AIRFLOW__DAG_FACTORY__ENABLE_TELEMETRY=False
DO_NOT_TRACK=True
SCARF_NO_ANALYTICS=True
```

In addition to Scarf's default data collection, DAG Factory collects the following information:

- DAG Factory version
- Airflow version
- Python version
- Operating system & machine architecture
- Event type
- Number of DAGs
- Number of TaskGroups
- Number of Tasks

No user-identifiable information (IP included) is stored in Scarf, although Scarf infers information from the IP, such as location, and stores that. The data collection is GDPR compliant.

The data is not currently being emitted for pre-releases except from integration tests.

The Apache Software Foundation supports this same strategy in many of its open-source projects, including Airflow ([#39510](apache/airflow#39510)).
Example of visualisation of the data via the Scarf UI:

<img width="1624" alt="Screenshot 2024-10-17 at 01 56 09" src="https://github.com/user-attachments/assets/d4191834-1e02-4192-811b-125d3fa735fe">
<img width="1624" alt="Screenshot 2024-10-17 at 01 55 59" src="https://github.com/user-attachments/assets/cd814e11-7f77-45c8-95a0-56e29d9f9f12">
<img width="1624" alt="Screenshot 2024-10-17 at 01 55 47" src="https://github.com/user-attachments/assets/2950ddbb-ea25-415f-b61c-3fbdcf4fc739">
<img width="1624" alt="Screenshot 2024-10-17 at 01 55 42" src="https://github.com/user-attachments/assets/a56ecefd-0cd0-486c-9faf-026b1e9a4ceb">

Closes: #214
Showing 13 changed files with 301 additions and 8 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,36 @@ | ||
# Privacy Notice | ||
|
||
This project follows the [Privacy Policy of Astronomer](https://www.astronomer.io/privacy/) | ||
This project follows the [Privacy Policy of Astronomer](https://www.astronomer.io/privacy/). | ||
|
||
## Collection of Data | ||
|
||
DAG Factory integrates [Scarf](https://about.scarf.sh/) to collect basic telemetry data during operation. | ||
This data assists the project maintainers in better understanding how DAG Factory is used. | ||
Insights gained from this telemetry are critical for prioritizing patches, minor releases, and | ||
security fixes. Additionally, this information supports key decisions related to the development road map. | ||
|
||
Deployments and individual users can opt-out of analytics by setting the configuration: | ||
|
||
``` | ||
[dag_factory] enable_telemetry False | ||
``` | ||
|
||
As described in the [official documentation](https://docs.scarf.sh/gateway/#do-not-track), it is also possible to opt out by setting one of the following environment variables: | ||
|
||
```commandline | ||
DO_NOT_TRACK=True | ||
SCARF_NO_ANALYTICS=True | ||
``` | ||
|
||
In addition to Scarf's default data collection, DAG Factory collects the following information: | ||
|
||
- DAG Factory version | ||
- Airflow version | ||
- Python version | ||
- Operating system & machine architecture | ||
- Event type | ||
- Number of DAGs | ||
- Number of TaskGroups | ||
- Number of Tasks | ||
|
||
No user-identifiable information (IP included) is stored in Scarf. |
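Because the opt-out lives in the `[dag_factory]` section of the Airflow configuration, it can also be supplied through Airflow's `AIRFLOW__{SECTION}__{KEY}` environment-variable convention, which is where the `AIRFLOW__DAG_FACTORY__ENABLE_TELEMETRY` name in the commit message comes from. The sketch below only illustrates how that name is derived; the helper `airflow_env_var_name` is hypothetical and is not part of DAG Factory or Airflow.

```python
import os


def airflow_env_var_name(section: str, key: str) -> str:
    """Hypothetical helper: build the environment variable name that Airflow
    maps to the `key` option in the `[section]` block of its configuration."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


name = airflow_env_var_name("dag_factory", "enable_telemetry")

# Build an environment for a child process with telemetry switched off,
# leaving the current process environment untouched.
child_env = dict(os.environ)
child_env[name] = "False"
```

The resulting mapping could then be passed as `env=child_env` to `subprocess.run` when launching an Airflow component.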
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
TELEMETRY_URL = "https://astronomer.gateway.scarf.sh/dag-factory/{telemetry_version}/{dagfactory_version}/{airflow_version}/{python_version}/{platform_system}/{platform_machine}?{query_string}" | ||
TELEMETRY_VERSION = "v1" | ||
TELEMETRY_TIMEOUT = 5.0 |
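To make the URL template above more concrete, here is a minimal sketch of how it could be filled in using only the standard library. The constants are copied from the diff; the function name `build_telemetry_url` and every value in `example_metrics` are placeholders chosen for illustration, not real versions or real project code.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Copied from the constants in the diff above.
TELEMETRY_URL = "https://astronomer.gateway.scarf.sh/dag-factory/{telemetry_version}/{dagfactory_version}/{airflow_version}/{python_version}/{platform_system}/{platform_machine}?{query_string}"
TELEMETRY_VERSION = "v1"


def build_telemetry_url(metrics: dict[str, object]) -> str:
    """Hypothetical helper: fill the path placeholders from `metrics` and
    repeat all metrics as a URL-encoded query string."""
    query_string = urlencode(metrics)
    return TELEMETRY_URL.format(
        **metrics, telemetry_version=TELEMETRY_VERSION, query_string=query_string
    )


# Placeholder values, for illustration only.
example_metrics = {
    "dagfactory_version": "0.0.0",
    "airflow_version": "2.10.0",
    "python_version": "3.11.0",
    "platform_system": "Linux",
    "platform_machine": "x86_64",
}
example_url = build_telemetry_url(example_metrics)
```

Each path segment maps to one placeholder in the template, so the same values appear both in the path and in the query string.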
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
from __future__ import annotations | ||
|
||
import os | ||
|
||
from airflow.configuration import conf | ||
|
||
|
||
def convert_to_boolean(value: str | None) -> bool: | ||
    """ | ||
    Convert a string that represents a boolean to a Python boolean. | ||
    """ | ||
    value = str(value).lower().strip() | ||
    if value in ("f", "false", "0", "", "none"): | ||
        return False | ||
    return True | ||
|
||
|
||
enable_telemetry = conf.getboolean("dag_factory", "enable_telemetry", fallback=True) | ||
do_not_track = convert_to_boolean(os.getenv("DO_NOT_TRACK")) | ||
no_analytics = convert_to_boolean(os.getenv("SCARF_NO_ANALYTICS")) |
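The sketch below shows how the three switches in this file could combine and how `convert_to_boolean` treats a few edge cases. `convert_to_boolean` repeats the logic from the diff; `telemetry_allowed` and its `env` parameter are hypothetical stand-ins (a plain dict replaces `os.environ` and the Airflow `conf` lookup) so the example runs without Airflow installed.

```python
from __future__ import annotations


def convert_to_boolean(value: str | None) -> bool:
    """Same logic as the function in the diff above."""
    value = str(value).lower().strip()
    if value in ("f", "false", "0", "", "none"):
        return False
    return True


def telemetry_allowed(enable_telemetry: bool, env: dict[str, str]) -> bool:
    """Hypothetical helper: telemetry is allowed only when the config flag is
    on and neither opt-out environment variable is set to a truthy value."""
    do_not_track = convert_to_boolean(env.get("DO_NOT_TRACK"))
    no_analytics = convert_to_boolean(env.get("SCARF_NO_ANALYTICS"))
    return enable_telemetry and not do_not_track and not no_analytics


# An unset variable arrives as None, becomes the string "none", and is False.
unset_is_false = convert_to_boolean(None) is False
# Any string outside the short false list is True, including "no".
no_is_true = convert_to_boolean("no") is True
```

Only the listed strings count as false, so a value such as `no` is treated as true; for `DO_NOT_TRACK` that errs on the side of disabling telemetry.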
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
from __future__ import annotations | ||
|
||
import logging | ||
import platform | ||
from urllib.parse import urlencode | ||
|
||
import httpx | ||
from airflow import __version__ as airflow_version | ||
|
||
import dagfactory | ||
from dagfactory import constants, settings | ||
|
||
|
||
def should_emit() -> bool: | ||
    """ | ||
    Identify if telemetry metrics should be emitted or not. | ||
    """ | ||
    return settings.enable_telemetry and not settings.do_not_track and not settings.no_analytics | ||
|
||
|
||
def collect_standard_usage_metrics() -> dict[str, object]: | ||
    """ | ||
    Return standard telemetry metrics. | ||
    """ | ||
    metrics = { | ||
        "dagfactory_version": dagfactory.__version__, | ||
        "airflow_version": airflow_version, | ||
        "python_version": platform.python_version(), | ||
        "platform_system": platform.system(), | ||
        "platform_machine": platform.machine(), | ||
        "variables": {}, | ||
    } | ||
    return metrics | ||
|
||
|
||
def emit_usage_metrics(metrics: dict[str, object]) -> bool: | ||
    """ | ||
    Emit desired telemetry metrics to remote telemetry endpoint. | ||
    The metrics must contain the necessary fields to build the TELEMETRY_URL. | ||
    """ | ||
    query_string = urlencode(metrics) | ||
    telemetry_url = constants.TELEMETRY_URL.format( | ||
        **metrics, telemetry_version=constants.TELEMETRY_VERSION, query_string=query_string | ||
    ) | ||
    logging.debug("Telemetry is enabled. Emitting the following usage metrics to %s: %s", telemetry_url, metrics) | ||
    response = httpx.get(telemetry_url, timeout=constants.TELEMETRY_TIMEOUT) | ||
    if not response.is_success: | ||
        logging.warning( | ||
            "Unable to emit usage metrics to %s. Status code: %s. Message: %s", | ||
            telemetry_url, | ||
            response.status_code, | ||
            response.text, | ||
        ) | ||
    return response.is_success | ||
|
||
|
||
def emit_usage_metrics_if_enabled(event_type: str, additional_metrics: dict[str, object]) -> bool: | ||
    """ | ||
    Checks if telemetry should be emitted, fetch standard metrics, complement with custom metrics | ||
    and emit them to remote telemetry endpoint. | ||
    :returns: If the event was successfully sent to the telemetry backend or not. | ||
    """ | ||
    if should_emit(): | ||
        metrics = collect_standard_usage_metrics() | ||
        metrics["type"] = event_type | ||
        metrics["variables"].update(additional_metrics) | ||
        is_success = emit_usage_metrics(metrics) | ||
        return is_success | ||
    else: | ||
        logging.debug("Telemetry is disabled. To enable it, export AIRFLOW__DAG_FACTORY__ENABLE_TELEMETRY=True.") | ||
        return False |
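The flow in `emit_usage_metrics_if_enabled` can be sketched without Airflow, httpx, or network access by passing the on/off decision and the sender in as parameters. Everything below is an illustration under stated assumptions: `emit_if_enabled`, `fake_send`, the `"dag_run"` event type, the `dags_count` key, and the `0.0.0` version strings are all hypothetical placeholders, not names taken from the project.

```python
from __future__ import annotations

import platform
from typing import Callable


def collect_standard_usage_metrics() -> dict[str, object]:
    """Stand-in for the function in the diff; the two version strings are
    placeholders because dagfactory and airflow are not imported here."""
    return {
        "dagfactory_version": "0.0.0",
        "airflow_version": "0.0.0",
        "python_version": platform.python_version(),
        "platform_system": platform.system(),
        "platform_machine": platform.machine(),
        "variables": {},
    }


def emit_if_enabled(
    event_type: str,
    additional_metrics: dict[str, object],
    *,
    enabled: bool,
    send: Callable[[dict[str, object]], bool],
) -> bool:
    """Hypothetical variant: `enabled` replaces should_emit() and `send`
    replaces the HTTP call, so the payload assembly can be run in isolation."""
    if not enabled:
        return False
    metrics = collect_standard_usage_metrics()
    metrics["type"] = event_type
    metrics["variables"].update(additional_metrics)
    return send(metrics)


sent: list[dict[str, object]] = []


def fake_send(metrics: dict[str, object]) -> bool:
    """Record the payload instead of sending it anywhere."""
    sent.append(metrics)
    return True


ok = emit_if_enabled("dag_run", {"dags_count": 2}, enabled=True, send=fake_send)
skipped = emit_if_enabled("dag_run", {}, enabled=False, send=fake_send)
```

Injecting the sender is a design choice made here for testability; the committed code calls `httpx.get` directly.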
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
import pytest | ||
|
||
from dagfactory import settings | ||
|
||
|
||
@pytest.mark.parametrize( | ||
    "value,expected_response", | ||
    [ | ||
        ("f", False), | ||
        ("false", False), | ||
        ("0", False), | ||
        ("", False), | ||
        ("none", False), | ||
        ("True", True), | ||
        ("true", True), | ||
        ("1", True), | ||
    ], | ||
) | ||
def test_convert_to_boolean(value, expected_response): | ||
    assert settings.convert_to_boolean(value) == expected_response |