Standardizing and enhancing Debezium documentation (#606)
slyons authored Nov 18, 2024
1 parent 5626e9e commit 06adeff
Showing 1 changed file with 56 additions and 28 deletions.
84 changes: 56 additions & 28 deletions spiceaidocs/docs/components/data-connectors/debezium.md
@@ -31,41 +31,69 @@ datasets:
## Configuration
### Parameters
- `debezium_transport`: Optional. The message broker transport to use. The default is `kafka`. Possible values:
- `kafka`: Use Kafka as the message broker transport. Spice may support additional transports in the future.
- `debezium_message_format`: Optional. The message format to use. The default is `json`. Possible values:
- `json`: Use JSON as the message format. Spice is expected to support additional message formats in the future, like `arvo`.
- `kafka_bootstrap_servers`: Required. A list of host/port pairs for establishing the initial Kafka cluster connection. The client will use all servers, regardless of the bootstrapping servers specified here. This list only affects the initial hosts used to discover the full server set and should be formatted as `host1:port1,host2:port2,...`.
- `kafka_security_protocol`: Security protocol for Kafka connections. Default: `sasl_ssl`. Options:
- `PLAINTEXT`: Plaintext communication; no encryption or authentication.
- `SSL`: Encrypted communication via TLS; no authentication.
- `SASL_PLAINTEXT`: Plaintext communication; SASL (Simple Authentication and Security Layer) authentication.
- `SASL_SSL`: Encrypted communication via TLS; SASL authentication.
- `kafka_sasl_mechanism`: SASL authentication mechanism. Default: `SCRAM-SHA-512`. Options:
- `PLAIN`: Usernames and passwords transmitted in plaintext.
- `SCRAM-SHA-256`: Salted Challenge Response Authentication Mechanism (SCRAM) using SHA-256 hashing.
- `SCRAM-SHA-512`: Salted Challenge Response Authentication Mechanism (SCRAM) using SHA-512 hashing.
- `kafka_sasl_username`: SASL username.
- `kafka_sasl_password`: SASL password.
- `kafka_ssl_ca_location`: Path to the SSL/TLS CA certificate file for server verification.
- `kafka_enable_ssl_certificate_verification`: Enable SSL/TLS certificate verification. Default: `true`.
### `from`

The `from` field takes the form `debezium:kafka_topic`, where `kafka_topic` is the name of the Kafka topic to which Debezium publishes upstream change events. In the example above, Spice listens to the `my_kafka_topic_with_debezium_changes` topic.

### `name`

The dataset name. This will be used as the table name within Spice.

```yaml
datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: cool_dataset
```

```sql
SELECT COUNT(*) FROM cool_dataset;
```

```shell
+----------+
| count(*) |
+----------+
| 6001215  |
+----------+
```

### `params`

| Parameter Name | Description |
| ------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `debezium_transport` | Optional. The message broker transport to use. The default is `kafka`. Possible values:<ul><li>`kafka`: Use Kafka as the message broker transport. Spice may support additional transports in the future.</li></ul> |
| `debezium_message_format` | Optional. The message format to use. The default is `json`. Possible values: <ul><li>`json`: Use JSON as the message format. Spice is expected to support additional message formats in the future, like `avro`.</li></ul> |
| `kafka_bootstrap_servers` | **Required**. A list of host/port pairs for establishing the initial Kafka cluster connection. The client will use all servers, regardless of the bootstrapping servers specified here. This list only affects the initial hosts used to discover the full server set and should be formatted as `host1:port1,host2:port2,...`. |
| `kafka_security_protocol` | Security protocol for Kafka connections. Default: `SASL_SSL`. Options: <ul><li>`PLAINTEXT`</li><li>`SSL`</li><li>`SASL_PLAINTEXT`</li><li>`SASL_SSL`</li></ul> |
| `kafka_sasl_mechanism` | SASL (Simple Authentication and Security Layer) authentication mechanism. Default: `SCRAM-SHA-512`. Options: <ul><li>`PLAIN`</li><li>`SCRAM-SHA-256`</li><li>`SCRAM-SHA-512`</li></ul> |
| `kafka_sasl_username` | SASL username. |
| `kafka_sasl_password` | SASL password. |
| `kafka_ssl_ca_location` | Path to the SSL/TLS CA certificate file for server verification. |
| `kafka_enable_ssl_certificate_verification` | Enable SSL/TLS certificate verification. Default: `true`. |

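As an illustration of how these parameters fit together, the following is a minimal sketch of a dataset that connects to a local Kafka broker without TLS or SASL. The broker address `localhost:9092` is a hypothetical value, and nesting the parameters under a `params` key is inferred from the section heading rather than stated explicitly. Because `kafka_security_protocol` defaults to `SASL_SSL`, such a broker needs the protocol set explicitly. The acceleration settings that the connector requires are left out of this fragment.

```yaml
datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: cool_dataset
    params:
      # Both values below are the documented defaults, shown only for clarity.
      debezium_transport: kafka
      debezium_message_format: json
      # Hypothetical local broker address; replace with the real host:port list.
      kafka_bootstrap_servers: localhost:9092
      # Override the SASL_SSL default for an unencrypted, unauthenticated broker.
      kafka_security_protocol: PLAINTEXT
```
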
### Acceleration Settings

Using the Debezium connector requires acceleration to be enabled. The following settings are required:
:::warning

Using the Debezium connector **requires** [acceleration](/components/data-accelerators/index.md) to be enabled.

- `enabled`: Required. Must be set to `true` to enable acceleration.
- `engine`: Required. The acceleration engine to use. Possible valid values:
- `duckdb`: Use [DuckDB](/components/data-accelerators/duckdb.md) as the acceleration engine.
- `sqlite`: Use [SQLite](/components/data-accelerators/sqlite.md) as the acceleration engine.
- `postgres`: Use [PostgreSQL](/components/data-accelerators/postgres/index.md) as the acceleration engine.
- `refresh_mode`: Optional. The refresh mode to use. If specified, this must be set to `changes`. Any other value is an error.
- `mode`: Optional. The persistence mode to use. When using the `duckdb` and `sqlite` engines, it is recommended to set this to `file` to persist the data across restarts. Spice also persists metadata about the dataset, so it can resume from the last known state of the dataset instead of re-fetching the entire dataset.
:::

The following acceleration settings apply. `enabled` and `engine` are required; the others are optional:

| Parameter Name | Description |
| -------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `enabled` | Required. Must be set to `true` to enable acceleration. |
| `engine` | Required. The acceleration engine to use. Possible valid values: <ul><li>`duckdb`: Use [DuckDB](/components/data-accelerators/duckdb.md) as the acceleration engine.</li><li>`sqlite`: Use [SQLite](/components/data-accelerators/sqlite.md) as the acceleration engine.</li><li>`postgres`: Use [PostgreSQL](/components/data-accelerators/postgres/index.md) as the acceleration engine.</li></ul> |
| `refresh_mode` | Optional. The refresh mode to use. If specified, this must be set to `changes`. Any other value is an error. |
| `mode` | Optional. The persistence mode to use. When using the `duckdb` and `sqlite` engines, it is recommended to set this to `file` to persist the data across restarts. Spice also persists metadata about the dataset, so it can resume from the last known state of the dataset instead of re-fetching the entire dataset. |

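The settings in the table can be sketched as follows. This is an illustrative fragment, not a verified configuration: the `acceleration` key and its placement under the dataset are assumptions based on the linked accelerator documentation, and the broker address is hypothetical.

```yaml
datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: cool_dataset
    params:
      # Hypothetical broker address.
      kafka_bootstrap_servers: localhost:9092
    acceleration:
      # Required: the Debezium connector does not work without acceleration.
      enabled: true
      # Required: one of duckdb, sqlite, or postgres.
      engine: duckdb
      # Optional: file mode persists data across restarts (recommended for duckdb and sqlite).
      mode: file
      # Optional: if set, it must be `changes`; any other value is an error.
      refresh_mode: changes
```

With `mode: file`, Spice can resume from the last known state of the dataset after a restart instead of re-fetching everything.
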
### Example

For a complete walkthrough of configuring a dataset to use change data capture (CDC) with Debezium, see the sample [Streaming changes in real-time with Debezium CDC](https://github.com/spiceai/samples/tree/trunk/cdc-debezium).

An example of configuring [SASL authentication over SSL](https://github.com/spiceai/samples/tree/trunk/cdc-debezium/sasl-scram) is available as well.

## Secrets

Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).

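As a rough sketch of how referenced secrets might be used with the SASL parameters, the fragment below keeps the credentials out of the configuration file. The `${secrets:...}` reference syntax, the secret names, the broker address, and the certificate path are all assumptions made for illustration; the linked guide is the authority on the exact syntax.

```yaml
datasets:
  - from: debezium:my_kafka_topic_with_debezium_changes
    name: cool_dataset
    params:
      # Hypothetical broker address.
      kafka_bootstrap_servers: broker.example.com:9093
      # Both values below are the documented defaults, shown only for clarity.
      kafka_security_protocol: SASL_SSL
      kafka_sasl_mechanism: SCRAM-SHA-512
      # Assumed secret reference syntax and hypothetical secret names.
      kafka_sasl_username: ${secrets:kafka_sasl_username}
      kafka_sasl_password: ${secrets:kafka_sasl_password}
      # Hypothetical path to the CA certificate used to verify the broker.
      kafka_ssl_ca_location: /path/to/ca.pem
```
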