docs

rickyschools · May 5, 2024 · 558f029
1 parent c4c54a0
commit 558f029
Showing 2 changed files with 80 additions and 2 deletions.
diff --git a/docs/_static/examples.md b/docs/_static/examples.md
@@ -9,14 +9,16 @@ Examples are stratified by increasing levels of complexity.
 - [Moderate, Multi-Method Decoration](examples/medium.md)
   - A moderate example multiple methods decorated with dlt.
 - [Streaming, Append example](examples/streaming_append.md)
-  - An simple example showcasing how to write streaming append dlt pipelines.
-- Streaming, CDC example
+  - A simple example showcasing how to write streaming append dlt pipelines.
+- [Streaming, CDC example](examples/streaming_cdc.md)
+  - A simple example showcasing how to write streaming apply changes (CDC) dlt pipelines.
 - Complex, Multi-Step example
 
 ::: {toctree}
 :maxdepth: 1
 examples/simple.md
 examples/medium.md
 examples/streaming_append.md
+examples/streaming_cdc.md
 :hidden:
 :::
diff --git a/docs/_static/examples/streaming_cdc.md b/docs/_static/examples/streaming_cdc.md
@@ -0,0 +1,76 @@
+# Streaming Apply Changes Example
+
+This article shows how to get started with `dltflow` when authoring DLT streaming apply changes pipelines. In
+this sample, we will go through the following steps:
+
+- [Code](#code)
+- [Configuration](#configuration)
+- [Workflow Spec](#workflow-spec)
+- [Deployment](#deployment)
+
+:::{include} ./base.md
+:::
+
+### Example Pipeline Code
+
+For this example, we use a simple streaming reader backed by a Python queue.
+
+- We import `DLTMetaMixin` from `dltflow.quality` and have our sample pipeline inherit from it.
+- We generate the example data on the fly and put it into a Python queue.
+- We transform the data by coercing its data types.
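As a rough illustration of the last two steps, the stdlib-only sketch below generates string-typed records on the fly, puts them into a `queue.Queue`, and coerces their types as they are drained. It is not the contents of the included `streaming_cdc.py`; the record fields (`customer_id`, `balance`, `sequence_num`) and all function names are hypothetical.

```python
import queue
import random
from typing import Dict, List


def generate_events(n_keys: int, n_events: int, seed: int = 42) -> List[Dict[str, str]]:
    # Hypothetical raw records: every field arrives as a string, as raw
    # events often do before a pipeline coerces their types.
    rng = random.Random(seed)
    return [
        {
            "customer_id": str(rng.randint(1, n_keys)),
            "balance": f"{rng.uniform(0, 1000):.2f}",
            "sequence_num": str(seq),
        }
        for seq in range(n_events)
    ]


def fill_queue(events: List[Dict[str, str]]) -> queue.Queue:
    # Put every generated record into an in-memory FIFO queue.
    q = queue.Queue()
    for event in events:
        q.put(event)
    return q


def coerce_types(record: Dict[str, str]) -> dict:
    # Cast each string field to the type downstream code expects.
    return {
        "customer_id": int(record["customer_id"]),
        "balance": float(record["balance"]),
        "sequence_num": int(record["sequence_num"]),
    }


def drain_and_coerce(q: queue.Queue) -> list:
    # Pull everything currently in the queue (in FIFO order) and coerce it.
    rows = []
    while True:
        try:
            record = q.get_nowait()
        except queue.Empty:
            return rows
        rows.append(coerce_types(record))


rows = drain_and_coerce(fill_queue(generate_events(n_keys=3, n_events=5)))
for row in rows:
    print(row)
```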
+
+Notice that there are no direct calls to `dlt`. This is the beauty and intentional simplicity of `dltflow`: it does
+not want to get in your way. Rather, it wants you to focus on your transformation logic, which helps keep your code
+simple and easy to share with other team members.
+
+:::{literalinclude} ../../../examples/pipelines/streaming_cdc.py
+:::
+
+## Configuration
+
+Now that we have our example code, we need to write our configuration to tell the `DLTMetaMixin` how to wrap our codebase.
+
+Under the hood, `dltflow` uses `pydantic` to validate configuration, so your configuration must adhere to a specific
+structure. Namely, the file should have the following sections:
+
+- `reader`: This is helpful for telling your pipeline where to read data from.
+- `writer`: Used to define where your data is written to after being processed.
+- `dlt`: Defines how `dlt` will be used in the project. We use this to dynamically wrap your code with `dlt` commands.
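The required layout can be pictured with a small check like the one below. It is a hypothetical, stdlib-only stand-in for `dltflow`'s `pydantic` models, not their actual definition: it verifies that `reader`, `writer`, and `dlt` are present, accepts `dlt` as either a single mapping or a list of mappings, and (as an assumption) treats `func_name` as required in each entry. The section contents in the usage lines are made up.

```python
from typing import Any, Dict, List

# The three top-level sections the guide says a dltflow config must have.
REQUIRED_SECTIONS = ("reader", "writer", "dlt")


def check_config_sections(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Validate the top-level layout and return the `dlt` entries as a list."""
    missing = [name for name in REQUIRED_SECTIONS if name not in config]
    if missing:
        raise ValueError(f"missing required section(s): {', '.join(missing)}")

    # `dlt` may be a single mapping or a list of mappings.
    dlt_section = config["dlt"]
    entries = dlt_section if isinstance(dlt_section, list) else [dlt_section]
    for entry in entries:
        if not isinstance(entry, dict):
            raise TypeError("each `dlt` entry must be a mapping")
        # Assumption for this sketch: every entry names the function to decorate.
        if "func_name" not in entry:
            raise ValueError("each `dlt` entry needs a `func_name`")
    return entries


# Hypothetical section contents, for illustration only.
config = {
    "reader": {"source": "queue"},
    "writer": {"target": "customers"},
    "dlt": {"func_name": "transform", "kind": "table", "is_streaming": True},
}
entries = check_config_sections(config)
print([entry["func_name"] for entry in entries])
```

Normalizing `dlt` to a list up front lets the rest of the code treat one config and many configs the same way.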
+
+With this brief overview out of the way, let's review our configuration for this sample.
+
+:::{literalinclude} ../../../examples/conf/streaming_apply_changes_dlt.yml
+:::
+
+The `dlt` section has the following keys; the section can also be a list of such `dlt` configs.
+
+- `func_name`: The name of the function/method we want `dlt` to decorate.
+- `kind`: Tells `dlt` whether this query should be materialized as a `table` or a `view`.
+- `expectation_action`: Tells `dlt` how to handle the expectations. `drop`, `fail`, and `allow` are all supported.
+- `expectations`: These are a list of constraints we want to apply to our data.
+- `is_streaming`: This tells `dltflow` this is a streaming query.
+- `apply_chg_config`: This tells `dltflow` we're in a streaming apply changes pipeline and fills out the necessary `dlt` params.
+    - `target`: Tells `dltflow` what table data will be written to. This should be a streaming table definition created
+      ahead of time.
+    - `source`: Tells `dltflow` where to read and get data from.
+    - `keys`: The primary key(s) of the dataset.
+    - `sequence_by`: The column(s) to use when ordering the dataset.
+    - `stored_as_scd_type`: Tells `dltflow` how to materialize the table: `1` (default) for SCD Type 1, or `2` for SCD Type 2.
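To make `keys`, `sequence_by`, and `stored_as_scd_type` concrete, here is an in-memory sketch of apply changes semantics in plain Python. It is not how `dlt` or `dltflow` implement them, and it ignores deletes and column-level history tracking. Events are ordered by the `sequence_by` column, so a late-arriving event still lands in the right place; SCD Type 1 keeps only the latest row per key, while SCD Type 2 keeps every version with `__START_AT`/`__END_AT` columns, named after the ones Databricks adds to SCD Type 2 targets. The event fields are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def apply_changes(
    events: Iterable[Row],
    keys: List[str],
    sequence_by: str,
    stored_as_scd_type: int = 1,
) -> List[Row]:
    """Simplified, in-memory apply changes (illustration only)."""
    if stored_as_scd_type not in (1, 2):
        raise ValueError("stored_as_scd_type must be 1 or 2")

    # Order by the sequencing column so out-of-order events are applied correctly.
    ordered = sorted(events, key=lambda event: event[sequence_by])

    def key_of(event: Row) -> Tuple[Any, ...]:
        return tuple(event[k] for k in keys)

    if stored_as_scd_type == 1:
        # SCD Type 1: each newer event overwrites the row for its key.
        latest: Dict[Tuple[Any, ...], Row] = {}
        for event in ordered:
            latest[key_of(event)] = dict(event)
        return list(latest.values())

    # SCD Type 2: keep every version; a newer event closes the open version.
    history: List[Row] = []
    open_versions: Dict[Tuple[Any, ...], Row] = {}
    for event in ordered:
        key = key_of(event)
        if key in open_versions:
            open_versions[key]["__END_AT"] = event[sequence_by]
        version = {**event, "__START_AT": event[sequence_by], "__END_AT": None}
        history.append(version)
        open_versions[key] = version
    return history


events = [
    {"customer_id": 1, "balance": 10.0, "sequence_num": 1},
    {"customer_id": 1, "balance": 25.0, "sequence_num": 3},
    {"customer_id": 2, "balance": 5.0, "sequence_num": 2},
    {"customer_id": 1, "balance": 15.0, "sequence_num": 2},  # arrives late
]
scd1 = apply_changes(events, keys=["customer_id"], sequence_by="sequence_num")
scd2 = apply_changes(
    events, keys=["customer_id"], sequence_by="sequence_num", stored_as_scd_type=2
)
```

With these events, SCD Type 1 leaves one row per customer (customer `1` ends at a balance of `25.0`), while SCD Type 2 keeps three versions for customer `1`, of which only the newest has `__END_AT` set to `None`.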
+
+## Workflow Spec
+
+Now that we've gone through the code and configuration, we need to define the workflow that we want to deploy
+to Databricks so that our pipeline can be registered as a DLT pipeline. This structure largely follows the [Databricks
+Pipeline API]() with the addition of a `tasks` key. This key is used during deployment to convert your Python
+module into a notebook that can be deployed as a DLT pipeline.
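Databricks recognizes Python source files whose first line is `# Databricks notebook source` as notebooks, with cells separated by `# COMMAND ----------` lines. The sketch below shows one way a module could be wrapped in that format; it is an illustration only, not `dltflow`'s deployment code, and the extra cell in the usage lines is hypothetical.

```python
from typing import List

# Header line and cell separator of the Databricks notebook source format.
NOTEBOOK_HEADER = "# Databricks notebook source"
CELL_SEPARATOR = "\n\n# COMMAND ----------\n\n"


def module_to_notebook_source(module_source: str, extra_cells: List[str]) -> str:
    """Wrap a module's source, plus any extra cells, in notebook source format."""
    cells = [module_source.strip()] + [cell.strip() for cell in extra_cells]
    return NOTEBOOK_HEADER + "\n" + CELL_SEPARATOR.join(cells) + "\n"


module_source = '''
def transform(rows):
    return rows
'''
# Hypothetical extra cell that calls into the module.
notebook = module_to_notebook_source(module_source, ["rows = transform([1, 2, 3])"])
print(notebook)
```

Because the header and separators are ordinary Python comments, the generated notebook source is still a valid Python file.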
+
+:::{literalinclude} ../../../examples/workflows/streaming_append_changes_wrkflw.yml
+:::
+
+## Deployment
+
+We're at the final step of this simple example. The last piece of the puzzle is deploying our assets
+to a Databricks workspace. To do so, we'll use the `dltflow` CLI.
+
+:::{literalinclude} ../../../examples/deployment/deploy_streaming_apply_pipeline.sh
+:::