This custom Airflow provider package allows you to create Airflow tasks that interact with the DeepLynx data warehouse. Utilizing operators specifically designed for DeepLynx, this package enables seamless integration and communication between Airflow and DeepLynx, facilitating data management and processing workflows.
To install the provider package from PyPI, simply run:
pip install airflow-provider-deeplynx
- Clone the repository to your local machine.
- Navigate to the cloned repository directory:
cd airflow-provider-deeplynx
- Install the package using pip:
pip install .
- For development, you can install in editable mode with:
pip install -e .
This package uses a few environment variables for configuration. They must be set and available wherever your Airflow instance is running.
`SSL_CERT_FILE`
: The Airflow-accessible path to the file containing the INL SSL certificate authority. This may be needed depending on your DeepLynx instance's setup; see DeepLynx Authentication Methods.

`DEEPLYNX_DATA_TEMP_FOLDER`
: The Airflow environment path where data is downloaded. If no value is set, this defaults to `AIRFLOW_HOME/logs/data`.
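For example, these could be exported in the environment that starts your Airflow components (the paths below are placeholders):

```bash
export SSL_CERT_FILE=/etc/ssl/certs/inl-ca.pem      # placeholder path to the CA file
export DEEPLYNX_DATA_TEMP_FOLDER=/opt/airflow/data  # placeholder; defaults to AIRFLOW_HOME/logs/data if unset
```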
Typical communication with DeepLynx requires a bearer token, so the first task of a DeepLynx DAG is usually to generate a token, which can be done with `GetOauthTokenOperator`. This token should be passed to downstream tasks using XComs, via the token generation `task_id` and the key `token`. `GetOauthTokenOperator` requires either a `conn_id` of an Airflow Connection of type DeepLynx, or the parameters `host`, `api_key`, and `api_secret`. It is recommended to create a new Airflow connection of type DeepLynx through the Airflow UI and enter values for DeepLynx URL, API Key, and API Secret; navigate to the Connections page via Admin -> Connections. You can then use this DeepLynx connection's id as the `conn_id` for any Airflow operators in this package (alternatively, you can supply the `host` parameter).
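A minimal sketch of this token pattern (the operator's import path is an assumption based on the package layout; `conn_id` and the XCom key `token` are from the description above):

```python
from datetime import datetime

from airflow import DAG
# Import path is an assumption; check the package's module layout.
from deeplynx_provider.operators.get_oauth_token_operator import GetOauthTokenOperator

with DAG(dag_id="deeplynx_token_example", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    get_token = GetOauthTokenOperator(
        task_id="get_token",
        conn_id="deeplynx_default",  # your Airflow Connection of type DeepLynx
    )
    # Downstream tasks pull the token from XCom using this task's id and the key "token",
    # e.g. in a templated field: "{{ ti.xcom_pull(task_ids='get_token', key='token') }}"
```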
Most functionality can be understood by looking at the provided example DAGs. Class-level documentation is also provided.
Example DAGs are provided in `deeplynx_provider/example_dags`. Copy the full directory into your Airflow `DAG_FOLDER` to have them loaded into your Airflow environment.
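For example, assuming the default `DAG_FOLDER` of `$AIRFLOW_HOME/dags`:

```bash
cp -r deeplynx_provider/example_dags $AIRFLOW_HOME/dags/
```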
A functional test DAG for the `airflow-provider-deeplynx` package. Users should create a DeepLynx connection in Airflow with URL, API Key, and API Secret. To run the DAG, supply the DeepLynx `connection_id`, optionally create a new `container_name`, and keep `data_source_name` as `TC-201`.
This DAG will:
- check if the supplied `container_name` exists and retrieve its `container_id` if so; if that container name does not exist, create a new container with the supplied name
- import the container ontology and type mappings from `Container_Export.json`
- set the data source (named `TC-201`) active
- import timeseries data
- query the timeseries data using two different methods
- upload the timeseries data result
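One way to kick this off from the CLI, assuming the DAG id is `deeplynx_provider_test` and that these values are supplied as DAG run conf (both the id and the conf keys are assumptions; adjust to match the example DAG):

```bash
airflow dags trigger deeplynx_provider_test \
  --conf '{"connection_id": "deeplynx_default", "container_name": "my_container", "data_source_name": "TC-201"}'
```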
This DAG shows how you can use the `DeepLynxConfigurationOperator` to create a custom configuration for DeepLynx communication. It requires that you already have a DeepLynx container and data source created, and that you input your `connection_id`, `container_id`, and `data_source_id`.
This DAG shows all of the package-supported ways to query metatypes and relationships, and how to perform a graph query. This example requires users to create a graph in DeepLynx and then edit the DAG file itself so that the query bodies, parameters, and properties match your given graph data.
This DAG shows how you can get a DeepLynx token using `GetOauthTokenOperator` by directly specifying `host`, `api_key`, and `api_secret` (instead of using `conn_id`).
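A minimal sketch of this form (the import path is an assumption, as before; the credential values are placeholders):

```python
from deeplynx_provider.operators.get_oauth_token_operator import GetOauthTokenOperator  # assumed path

get_token = GetOauthTokenOperator(
    task_id="get_token",
    host="https://deeplynx.example.com",  # placeholder DeepLynx URL
    api_key="my_api_key",                 # placeholder
    api_secret="my_api_secret",           # placeholder
)
```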
Class documentation is available here. It was generated using pdoc and the command `pdoc --output-dir=docs deeplynx_provider`, run from the root of this project.
Communication with DeepLynx through this package can be configured with various options, such as the SSL certificate and local file writing locations. Most of the time the default DeepLynx config will work just fine, but to learn more, continue reading.
The operators in this provider package use the Deep Lynx Python SDK to communicate with DeepLynx. The `DeepLynxConfigurationOperator` can be used to set your `Configuration` exactly how you want it; this configuration is then pushed to the task instance's XCom so that downstream tasks derived from `DeepLynxBaseOperator` can use it.
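A rough sketch of that hand-off (the import path and the configuration parameter names here are assumptions; only the operator names and the XCom mechanism come from the description above):

```python
from deeplynx_provider.operators.configuration_operator import DeepLynxConfigurationOperator  # assumed path

create_config = DeepLynxConfigurationOperator(
    task_id="create_config",
    ssl_ca_cert="/etc/ssl/certs/inl-ca.pem",  # assumed parameter name: SSL certificate path
    temp_folder="/opt/airflow/data",          # assumed parameter name: local file writing location
)
# Downstream tasks derived from DeepLynxBaseOperator read this configuration
# from the task instance's XCom.
```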
This package is set up to use token authentication with DeepLynx, but other authentication methods are supported by setting the DeepLynx Config.
- If using this Airflow package in a Docker environment to talk to a Dockerized DeepLynx, you should likely set your DeepLynx host/URL to `http://host.docker.internal:8090`.