From 06adeff16523ccdf84541d53758b3017ffe23961 Mon Sep 17 00:00:00 2001
From: Scott Lyons
Date: Mon, 18 Nov 2024 13:58:31 -0800
Subject: [PATCH] Standardizing and enhancing Debezium documentation (#606)

---
 .../components/data-connectors/debezium.md    | 84 ++++++++++++-------
 1 file changed, 56 insertions(+), 28 deletions(-)

diff --git a/spiceaidocs/docs/components/data-connectors/debezium.md b/spiceaidocs/docs/components/data-connectors/debezium.md
index 2fde4b5a..b25c67c0 100644
--- a/spiceaidocs/docs/components/data-connectors/debezium.md
+++ b/spiceaidocs/docs/components/data-connectors/debezium.md
@@ -31,41 +31,69 @@ datasets:

 ## Configuration

-### Parameters
-
-- `debezium_transport`: Optional. The message broker transport to use. The default is `kafka`. Possible values:
-  - `kafka`: Use Kafka as the message broker transport. Spice may support additional transports in the future.
-- `debezium_message_format`: Optional. The message format to use. The default is `json`. Possible values:
-  - `json`: Use JSON as the message format. Spice is expected to support additional message formats in the future, like `arvo`.
-- `kafka_bootstrap_servers`: Required. A list of host/port pairs for establishing the initial Kafka cluster connection. The client will use all servers, regardless of the bootstrapping servers specified here. This list only affects the initial hosts used to discover the full server set and should be formatted as `host1:port1,host2:port2,...`.
-- `kafka_security_protocol`: Security protocol for Kafka connections. Default: `sasl_ssl`. Options:
-  - `PLAINTEXT`: Plaintext communication; no encryption or authentication.
-  - `SSL`: Encrypted communication via TLS; no authentication.
-  - `SASL_PLAINTEXT`: Plaintext communication; SASL (Simple Authentication and Security Layer) authentication.
-  - `SASL_SSL`: Encrypted communication via TLS; SASL authentication.
-- `kafka_sasl_mechanism`: SASL authentication mechanism. Default: `SCRAM-SHA-512`. Options:
-  - `PLAIN`: Usernames and passwords transmitted in plaintext.
-  - `SCRAM-SHA-256`: Salted Challenge Response Authentication Mechanism (SCRAM) using SHA-256 hashing.
-  - `SCRAM-SHA-512`: Salted Challenge Response Authentication Mechanism (SCRAM) using SHA-512 hashing.
-- `kafka_sasl_username`: SASL username.
-- `kafka_sasl_password`: SASL password.
-- `kafka_ssl_ca_location`: Path to the SSL/TLS CA certificate file for server verification.
-- `kafka_enable_ssl_certificate_verification`: Enable SSL/TLS certificate verification. Default: `true`.
+### `from`
+
+The `from` field takes the form of `debezium:kafka_topic` where `kafka_topic` is the name of the Kafka topic where Debezium is notifying consumers about any upstream changes. In the example above it would listen to the `my_kafka_topic_with_debezium_changes` topic.
+
+### `name`
+
+The dataset name. This will be used as the table name within Spice.
+
+```yaml
+datasets:
+  - from: debezium:my_kafka_topic_with_debezium_changes
+    name: cool_dataset
+```
+
+```sql
+SELECT COUNT(*) FROM cool_dataset;
+```
+
+```shell
++----------+
+| count(*) |
++----------+
+| 6001215  |
++----------+
+```
+
+### `params`
+
+| Parameter Name | Description |
+| ------------------------- | ----------- |
+| `debezium_transport` | Optional. The message broker transport to use. The default is `kafka`. Possible values: `kafka` (use Kafka as the message broker transport). Spice may support additional transports in the future. |
+| `debezium_message_format` | Optional. The message format to use. The default is `json`. Possible values: `json` (use JSON as the message format). |
+| `kafka_bootstrap_servers` | **Required**. A list of host/port pairs for establishing the initial Kafka cluster connection. The client will use all servers, regardless of the bootstrapping servers specified here. This list only affects the initial hosts used to discover the full server set and should be formatted as `host1:port1,host2:port2,...`. |
+| `kafka_security_protocol` | Security protocol for Kafka connections. Default: `SASL_SSL`. Options: `PLAINTEXT` (plaintext communication; no encryption or authentication), `SSL` (encrypted communication via TLS; no authentication), `SASL_PLAINTEXT` (plaintext communication; SASL authentication), `SASL_SSL` (encrypted communication via TLS; SASL authentication). |
+| `kafka_sasl_mechanism` | SASL (Simple Authentication and Security Layer) authentication mechanism. Default: `SCRAM-SHA-512`. Options: `PLAIN` (usernames and passwords transmitted in plaintext), `SCRAM-SHA-256` (Salted Challenge Response Authentication Mechanism using SHA-256 hashing), `SCRAM-SHA-512` (Salted Challenge Response Authentication Mechanism using SHA-512 hashing). |
+| `kafka_sasl_username` | SASL username. |
+| `kafka_sasl_password` | SASL password. |
+| `kafka_ssl_ca_location` | Path to the SSL/TLS CA certificate file for server verification. |
+| `kafka_enable_ssl_certificate_verification` | Enable SSL/TLS certificate verification. Default: `true`. |

 ### Acceleration Settings

-Using the Debezium connector requires acceleration to be enabled. The following settings are required:
+:::warning
+
+Using the Debezium connector **requires** [acceleration](/components/data-accelerators/index.md) to be enabled.

-- `enabled`: Required. Must be set to `true` to enable acceleration.
-- `engine`: Required. The acceleration engine to use. Possible valid values:
-  - `duckdb`: Use [DuckDB](/components/data-accelerators/duckdb.md) as the acceleration engine.
-  - `sqlite`: Use [SQLite](/components/data-accelerators/sqlite.md) as the acceleration engine.
-  - `postgres`: Use [PostgreSQL](/components/data-accelerators/postgres/index.md) as the acceleration engine.
-- `refresh_mode`: Optional. The refresh mode to use. If specified, this must be set to `changes`. Any other value is an error.
-- `mode`: Optional. The persistence mode to use. When using the `duckdb` and `sqlite` engines, it is recommended to set this to `file` to persist the data across restarts. Spice also persists metadata about the dataset, so it can resume from the last known state of the dataset instead of re-fetching the entire dataset.
+:::
+
+The following settings are required:
+
+| Parameter Name | Description |
+| -------------- | ----------- |
+| `enabled` | Required. Must be set to `true` to enable acceleration. |
+| `engine` | Required. The acceleration engine to use. Possible valid values: `duckdb` ([DuckDB](/components/data-accelerators/duckdb.md)), `sqlite` ([SQLite](/components/data-accelerators/sqlite.md)), `postgres` ([PostgreSQL](/components/data-accelerators/postgres/index.md)). |
+| `refresh_mode` | Optional. The refresh mode to use. If specified, this must be set to `changes`. Any other value is an error. |
+| `mode` | Optional. The persistence mode to use. When using the `duckdb` and `sqlite` engines, it is recommended to set this to `file` to persist the data across restarts. Spice also persists metadata about the dataset, so it can resume from the last known state of the dataset instead of re-fetching the entire dataset. |

 ### Example

 See an example of configuring a dataset to use CDC with Debezium by following the sample [Streaming changes in real-time with Debezium CDC](https://github.com/spiceai/samples/tree/trunk/cdc-debezium). An example of configuring [SASL authentication over SSL](https://github.com/spiceai/samples/tree/trunk/cdc-debezium/sasl-scram) is available as well.
+
+## Secrets
+
+Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
\ No newline at end of file
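
Note: the `from`, `name`, `params`, and acceleration settings documented in this patch could plausibly be combined into one dataset entry along the lines of the sketch below. This is an illustration under stated assumptions, not a configuration taken from the patch. The broker addresses, username, and secret name are hypothetical placeholders. Nesting the acceleration settings under an `acceleration` key and the `${secrets:...}` reference syntax are also assumptions, so verify both against the Spice documentation before relying on them.

```yaml
datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: cool_dataset
    params:
      debezium_transport: kafka          # optional; documented default
      debezium_message_format: json      # optional; documented default
      # Hypothetical broker addresses, in host1:port1,host2:port2 form
      kafka_bootstrap_servers: broker1:9092,broker2:9092
      kafka_security_protocol: SASL_SSL  # documented default
      kafka_sasl_mechanism: SCRAM-SHA-512  # documented default
      kafka_sasl_username: my_kafka_user   # hypothetical username
      # Assumed secret-reference syntax; the secret name is hypothetical
      kafka_sasl_password: ${secrets:kafka_sasl_password}
    acceleration:                        # assumed key for the acceleration settings
      enabled: true                      # required for the Debezium connector
      engine: duckdb                     # one of duckdb, sqlite, postgres
      mode: file                         # recommended for duckdb and sqlite
      refresh_mode: changes              # the only accepted value if specified
```

Setting `mode: file` follows the patch's recommendation for the `duckdb` and `sqlite` engines, so the dataset can resume from its last known state after a restart instead of being re-fetched in full.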