diff --git a/spiceaidocs/docs/components/data-accelerators/arrow.md b/spiceaidocs/docs/components/data-accelerators/arrow.md
index 0d96ecd0..6c9fdecc 100644
--- a/spiceaidocs/docs/components/data-accelerators/arrow.md
+++ b/spiceaidocs/docs/components/data-accelerators/arrow.md
@@ -47,3 +47,7 @@ When accelerating a dataset using the In-Memory Arrow Data Accelerator, some or
In-memory limitations can be mitigated by storing acceleration data on disk, which is supported by [`duckdb`](./duckdb.md) and [`sqlite`](./sqlite.md) accelerators by specifying `mode: file`.
:::
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure In-Memory Arrow as a data accelerator in Spice. [Arrow Accelerator quickstart](https://github.com/spiceai/quickstarts/tree/trunk/arrow)
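+
+For orientation, a minimal acceleration block using the default in-memory Arrow engine might look like the sketch below; the dataset source and name are illustrative and not taken from the quickstart:
+
+```yaml
+datasets:
+  - from: s3://my_bucket/path/to/table/ # illustrative source
+    name: my_table
+    acceleration:
+      enabled: true
+      engine: arrow # the default engine; may be omitted
+```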
diff --git a/spiceaidocs/docs/components/data-accelerators/data-refresh.md b/spiceaidocs/docs/components/data-accelerators/data-refresh.md
index cb6d6104..211f9028 100644
--- a/spiceaidocs/docs/components/data-accelerators/data-refresh.md
+++ b/spiceaidocs/docs/components/data-accelerators/data-refresh.md
@@ -85,11 +85,11 @@ Typically only a working subset of an entire dataset is used in an application o
### Refresh SQL
-| | |
-| --------------------------- | --------- |
-| Supported in `refresh_mode` | Any |
-| Required | No |
-| Default Value | Unset |
+| | |
+| --------------------------- | ----- |
+| Supported in `refresh_mode` | Any |
+| Required | No |
+| Default Value | Unset |
Refresh SQL supports specifying filters for data accelerated from the connected source using arbitrary SQL.
@@ -158,7 +158,7 @@ In this example, `refresh_data_window` is converted into an effective Refresh SQ
This parameter relies on the `time_column` dataset parameter specifying a column that is a timestamp type. Optionally, the `time_format` can be specified to instruct the Spice runtime on how to interpret timestamps in the `time_column`.
-*Example with `refresh_sql`:*
+_Example with `refresh_sql`:_
```yaml
datasets:
@@ -176,7 +176,7 @@ datasets:
This example will only accelerate data from the federated source that matches the filter `city = 'Seattle'` and is less than 1 day old.
-*Example with `on_zero_results`:*
+_Example with `on_zero_results`:_
```yaml
datasets:
@@ -446,7 +446,13 @@ This acceleration configuration applies a number of different behaviors:
1. A `refresh_data_window` was specified. When Spice starts, it will apply this `refresh_data_window` to the `refresh_sql`, and retrieve only the last day's worth of logs with an `asset = 'asset_id'`.
2. Because a `refresh_sql` is specified, every refresh (including initial load) will have the filter applied to the refresh query.
3. 10 minutes after loading, as specified by the `refresh_check_interval`, the first refresh will occur - retrieving new rows where `asset = 'asset_id'`.
-4. Running a query to retrieve logs with an `asset` that is *not* `asset_id` will fall back to the source, because of the `on_zero_results: use_source` parameter.
+4. Running a query to retrieve logs with an `asset` that is _not_ `asset_id` will fall back to the source, because of the `on_zero_results: use_source` parameter.
5. Running a query to retrieve a log longer than 1 day ago will fall back to the source, because of the `on_zero_results: use_source` parameter.
6. Running a query to retrieve logs within a range of now to longer than 1 day ago will only return logs from the last day. This is due to the `refresh_data_window` only accelerating the last day's worth of logs, which will return some results. Because results are returned, Spice will not fall back to the source even though `on_zero_results: use_source` is specified.
7. Spice will retain newly appended log rows for 7 days before discarding them, as specified by the `retention_*` parameters.
+
+## Quickstarts and Samples
+
+- Configure an accelerated dataset retention policy. [Accelerated Dataset Retention Policy Quickstart](https://github.com/spiceai/quickstarts/blob/trunk/retention/README.md)
+- Dynamically refresh specific data at runtime by programmatically updating `refresh_sql` and triggering data refreshes. [Advanced Data Refresh Quickstart](https://github.com/spiceai/quickstarts/blob/trunk/acceleration/data-refresh/README.md)
+- Configure `refresh_data_window` to filter refreshed data to recent data. [Refresh Data Window Quickstart](https://github.com/spiceai/quickstarts/blob/trunk/refresh-data-window/README.md)
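+
+As a rough sketch of a retention configuration like the one covered in the retention quickstart (source, name, and values are illustrative; a `time_column` must also be configured on the dataset for retention to apply):
+
+```yaml
+datasets:
+  - from: s3://my_bucket/logs/ # illustrative source
+    name: logs
+    acceleration:
+      enabled: true
+      refresh_check_interval: 10m
+      retention_check_enabled: true
+      retention_period: 7d # keep the last 7 days of accelerated rows
+      retention_check_interval: 1h
+```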
diff --git a/spiceaidocs/docs/components/data-accelerators/duckdb.md b/spiceaidocs/docs/components/data-accelerators/duckdb.md
index db11e1b8..b8ba4354 100644
--- a/spiceaidocs/docs/components/data-accelerators/duckdb.md
+++ b/spiceaidocs/docs/components/data-accelerators/duckdb.md
@@ -57,3 +57,7 @@ When accelerating a dataset using `mode: memory` (the default), some or all of t
In-memory limitations can be mitigated by storing acceleration data on disk, which is supported by [`duckdb`](./duckdb.md) and [`sqlite`](./sqlite.md) accelerators by specifying `mode: file`.
:::
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure DuckDB as a data accelerator in Spice. [DuckDB Accelerator quickstart](https://github.com/spiceai/quickstarts/tree/trunk/duckdb/accelerator)
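+
+A minimal file-mode DuckDB acceleration might look like the sketch below; the dataset source and name are illustrative:
+
+```yaml
+datasets:
+  - from: s3://my_bucket/path/to/table/ # illustrative source
+    name: my_table
+    acceleration:
+      enabled: true
+      engine: duckdb
+      mode: file # persist acceleration data on disk instead of in memory
+```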
diff --git a/spiceaidocs/docs/components/data-accelerators/postgres/index.md b/spiceaidocs/docs/components/data-accelerators/postgres/index.md
index e1a33031..9f96231a 100644
--- a/spiceaidocs/docs/components/data-accelerators/postgres/index.md
+++ b/spiceaidocs/docs/components/data-accelerators/postgres/index.md
@@ -110,3 +110,7 @@ The table below lists the supported [Apache Arrow data types](https://arrow.apac
| `Duration` | `BigInteger` | `bigint` |
| `List` / `LargeList` / `FixedSizeList` | `Array` | `array` |
| `Struct` | `N/A` | `Composite` (Custom type) |
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure PostgreSQL as a data accelerator in Spice. [PostgreSQL Accelerator quickstart](https://github.com/spiceai/quickstarts/tree/trunk/postgres/accelerator)
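+
+For orientation, a PostgreSQL-backed acceleration pairs `engine: postgres` with connection parameters; the sketch below is illustrative, and the `pg_*` values shown are assumptions rather than values from the quickstart:
+
+```yaml
+datasets:
+  - from: s3://my_bucket/path/to/table/ # illustrative source
+    name: my_table
+    acceleration:
+      enabled: true
+      engine: postgres
+      params:
+        pg_host: localhost # assumed connection parameters
+        pg_port: 5432
+        pg_db: acceleration_db
+        pg_user: postgres
+        pg_pass: ${secrets:pg_pass}
+```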
diff --git a/spiceaidocs/docs/components/data-accelerators/sqlite.md b/spiceaidocs/docs/components/data-accelerators/sqlite.md
index f9206b02..6cda72c4 100644
--- a/spiceaidocs/docs/components/data-accelerators/sqlite.md
+++ b/spiceaidocs/docs/components/data-accelerators/sqlite.md
@@ -42,7 +42,7 @@ datasets:
- The SQLite accelerator only supports arrow `List` types of primitive data types; lists with structs are not supported.
- The SQLite accelerator doesn't support advanced grouping features such as `ROLLUP` and `GROUPING`.
- In SQLite, `CAST(value AS DECIMAL)` doesn't convert an integer to a floating-point value if the casted value is an integer. Operations like `CAST(1 AS DECIMAL) / CAST(2 AS DECIMAL)` will be treated as integer division, resulting in 0 instead of the expected 0.5.
-Use `FLOAT` to ensure conversion to a floating-point value: `CAST(1 AS FLOAT) / CAST(2 AS FLOAT)`.
+ Use `FLOAT` to ensure conversion to a floating-point value: `CAST(1 AS FLOAT) / CAST(2 AS FLOAT)`.
- Updating a dataset with SQLite acceleration while the Spice Runtime is running (hot-reload) will cause SQLite accelerator query federation to disable until the Runtime is restarted.
:::
@@ -54,3 +54,7 @@ When accelerating a dataset using `mode: memory` (the default), some or all of t
In-memory limitations can be mitigated by storing acceleration data on disk, which is supported by [`duckdb`](./duckdb.md) and [`sqlite`](./sqlite.md) accelerators by specifying `mode: file`.
:::
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure SQLite as a data accelerator in Spice. [SQLite Accelerator quickstart](https://github.com/spiceai/quickstarts/tree/trunk/sqlite/accelerator)
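+
+A minimal file-mode SQLite acceleration might look like the sketch below; the dataset source and name are illustrative:
+
+```yaml
+datasets:
+  - from: s3://my_bucket/path/to/table/ # illustrative source
+    name: my_table
+    acceleration:
+      enabled: true
+      engine: sqlite
+      mode: file # persist acceleration data on disk instead of in memory
+```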
diff --git a/spiceaidocs/docs/components/data-connectors/abfs.md b/spiceaidocs/docs/components/data-connectors/abfs.md
index 5f95297e..412f5968 100644
--- a/spiceaidocs/docs/components/data-connectors/abfs.md
+++ b/spiceaidocs/docs/components/data-connectors/abfs.md
@@ -4,7 +4,7 @@ sidebar_label: 'Azure BlobFS Data Connector'
description: 'Azure BlobFS Data Connector Documentation'
---
-The Azure BlobFS (ABFS) Data Connector enables federated/accelerated SQL queries on files stored in Azure Blob-compatible endpoints. This includes Azure BlobFS (`abfss://`) and Azure Data Lake (`adl://`) endpoints.
+The Azure BlobFS (ABFS) Data Connector enables federated SQL queries on files stored in Azure Blob-compatible endpoints. This includes Azure BlobFS (`abfss://`) and Azure Data Lake (`adl://`) endpoints.
When a folder path is provided, all the contained files will be loaded.
@@ -58,20 +58,20 @@ SELECT COUNT(*) FROM cool_dataset;
#### Basic parameters
-| Parameter name | Description |
-| --------------------------- | ------------------------------------------------------------------------------------------------ |
-| `file_format` | Specifies the data format. Required if not inferrable from `from`. Options: `parquet`, `csv`. |
-| `abfs_account` | Azure storage account name |
-| `abfs_sas_string` | SAS (Shared Access Signature) Token to use for authorization |
-| `abfs_endpoint` | Storage endpoint, default: `https://{account}.blob.core.windows.net` |
-| `abfs_use_emulator` | Use `true` or `false` to connect to a local emulator |
-| `abfs_allow_http` | Allow insecure HTTP connections |
-| `abfs_authority_host` | Alternative authority host, default: `https://login.microsoftonline.com` |
-| `abfs_proxy_url` | Proxy URL |
-| `abfs_proxy_ca_certificate` | CA certificate for the proxy |
-| `abfs_proxy_exludes` | A list of hosts to exclude from proxy connections |
-| `abfs_disable_tagging` | Disable tagging objects. Use this if your backing store doesn't support tags |
-| `hive_partitioning_enabled` | Enable partitioning using hive-style partitioning from the folder structure. Defaults to `false` |
+| Parameter name | Description |
+| --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `file_format` | Specifies the data format. Required if not inferrable from `from`. Options: `parquet`, `csv`. Refer to [Object Store File Formats](/components/data-connectors/index.md#object-store-file-formats) for details. |
+| `abfs_account` | Azure storage account name |
+| `abfs_sas_string` | SAS (Shared Access Signature) Token to use for authorization |
+| `abfs_endpoint` | Storage endpoint, default: `https://{account}.blob.core.windows.net` |
+| `abfs_use_emulator` | Use `true` or `false` to connect to a local emulator |
+| `abfs_allow_http` | Allow insecure HTTP connections |
+| `abfs_authority_host` | Alternative authority host, default: `https://login.microsoftonline.com` |
+| `abfs_proxy_url` | Proxy URL |
+| `abfs_proxy_ca_certificate` | CA certificate for the proxy |
+| `abfs_proxy_exludes` | A list of hosts to exclude from proxy connections |
+| `abfs_disable_tagging` | Disable tagging objects. Use this if your backing store doesn't support tags |
+| `hive_partitioning_enabled` | Enable partitioning using hive-style partitioning from the folder structure. Defaults to `false` |
#### Authentication parameters
diff --git a/spiceaidocs/docs/components/data-connectors/clickhouse.md b/spiceaidocs/docs/components/data-connectors/clickhouse.md
index 371ee906..96eb10a9 100644
--- a/spiceaidocs/docs/components/data-connectors/clickhouse.md
+++ b/spiceaidocs/docs/components/data-connectors/clickhouse.md
@@ -1,7 +1,7 @@
---
-title: 'Clickhouse Data Connector'
-sidebar_label: 'Clickhouse Data Connector'
-description: 'Clickhouse Data Connector Documentation'
+title: 'ClickHouse Data Connector'
+sidebar_label: 'ClickHouse Data Connector'
+description: 'ClickHouse Data Connector Documentation'
---
ClickHouse is a fast, open-source columnar database management system designed for online analytical processing (OLAP) and real-time analytics. This connector enables federated SQL queries from a ClickHouse server.
@@ -46,16 +46,16 @@ SELECT COUNT(*) FROM cool_dataset;
The ClickHouse data connector can be configured by providing the following `params`:
-| Parameter Name | Definition |
-| ------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `clickhouse_connection_string` | The connection string to use to connect to the ClickHouse server. This can be used instead of providing individual connection parameters. |
-| `clickhouse_host` | The hostname of the ClickHouse server. |
-| `clickhouse_tcp_port` | The port of the ClickHouse server. |
-| `clickhouse_db` | The name of the database to connect to. |
-| `clickhouse_user` | The username to connect with. |
-| `clickhouse_pass` | The password to connect with. |
+| Parameter Name | Definition |
+| ------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `clickhouse_connection_string` | The connection string to use to connect to the ClickHouse server. This can be used instead of providing individual connection parameters. |
+| `clickhouse_host` | The hostname of the ClickHouse server. |
+| `clickhouse_tcp_port` | The port of the ClickHouse server. |
+| `clickhouse_db` | The name of the database to connect to. |
+| `clickhouse_user` | The username to connect with. |
+| `clickhouse_pass` | The password to connect with. |
| `clickhouse_secure`            | Optional. Specifies the SSL/TLS behavior for the connection, supported values:<br /> - `true`: (default) This mode requires an SSL connection. If a secure connection cannot be established, server will not connect.<br /> - `false`: This mode will not attempt to use an SSL connection, even if the server supports it. |
-| `connection_timeout` | Optional. Specifies the connection timeout in milliseconds. |
+| `connection_timeout` | Optional. Specifies the connection timeout in milliseconds. |
## Examples
@@ -100,3 +100,7 @@ datasets:
## Secrets
Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure ClickHouse as a data connector in Spice. [ClickHouse Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/clickhouse)
diff --git a/spiceaidocs/docs/components/data-connectors/databricks.md b/spiceaidocs/docs/components/data-connectors/databricks.md
index b6a82736..b3f01b74 100644
--- a/spiceaidocs/docs/components/data-connectors/databricks.md
+++ b/spiceaidocs/docs/components/data-connectors/databricks.md
@@ -191,3 +191,7 @@ The table below shows the Databricks (mode: delta_lake) data types supported, al
## Secrets
Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure Databricks as a data connector in Spice using `delta_lake` mode. [Databricks Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/databricks)
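+
+For orientation, a `delta_lake`-mode Databricks dataset generally looks like the sketch below; the table reference and endpoint are illustrative, and any object-store credentials required by the workspace storage must be added per the parameter table above:
+
+```yaml
+datasets:
+  - from: databricks:my_catalog.my_schema.my_table # illustrative three-part table name
+    name: my_table
+    params:
+      mode: delta_lake
+      databricks_endpoint: dbc-a1b2345c-d6e7.cloud.databricks.com # assumed workspace endpoint
+      databricks_token: ${secrets:databricks_token}
+```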
diff --git a/spiceaidocs/docs/components/data-connectors/debezium.md b/spiceaidocs/docs/components/data-connectors/debezium.md
index 141582f6..02a12326 100644
--- a/spiceaidocs/docs/components/data-connectors/debezium.md
+++ b/spiceaidocs/docs/components/data-connectors/debezium.md
@@ -60,18 +60,18 @@ SELECT COUNT(*) FROM cool_dataset;
### `params`
-| Parameter Name | Description |
-| ------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `debezium_transport`      | Optional. The message broker transport to use. The default is `kafka`. Possible values:<br /> - `kafka`: Use Kafka as the message broker transport. Spice may support additional transports in the future. |
-| `debezium_message_format` | Optional. The message format to use. The default is `json`. Possible values:<br /> - `json`: Use JSON as the message format. Spice is expected to support additional message formats in the future, like `avro`. |
-| `kafka_bootstrap_servers` | **Required**. A list of host/port pairs for establishing the initial Kafka cluster connection. The client will use all servers, regardless of the bootstrapping servers specified here. This list only affects the initial hosts used to discover the full server set and should be formatted as `host1:port1,host2:port2,...`. |
-| `kafka_security_protocol` | Security protocol for Kafka connections. Default: `SASL_SSL`. Options:<br /> - `PLAINTEXT`<br /> - `SSL`<br /> - `SASL_PLAINTEXT`<br /> - `SASL_SSL` |
-| `kafka_sasl_mechanism`    | SASL (Simple Authentication and Security Layer) authentication mechanism. Default: `SCRAM-SHA-512`. Options:<br /> - `PLAIN`<br /> - `SCRAM-SHA-256`<br /> - `SCRAM-SHA-512` |
-| `kafka_sasl_username` | SASL username. |
-| `kafka_sasl_password` | SASL password. |
-| `kafka_ssl_ca_location` | Path to the SSL/TLS CA certificate file for server verification. |
-| `kafka_enable_ssl_certificate_verification` | Enable SSL/TLS certificate verification. Default: `true`. |
-| `kafka_ssl_endpoint_identification_algorithm` | SSL/TLS endpoint identification algorithm. Default: `https`. Options: |
+| Parameter Name | Description |
+| --------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `debezium_transport`                          | Optional. The message broker transport to use. The default is `kafka`. Possible values:<br /> - `kafka`: Use Kafka as the message broker transport. Spice may support additional transports in the future. |
+| `debezium_message_format`                     | Optional. The message format to use. The default is `json`. Possible values:<br /> - `json`: Use JSON as the message format. Spice is expected to support additional message formats in the future, like `avro`. |
+| `kafka_bootstrap_servers` | **Required**. A list of host/port pairs for establishing the initial Kafka cluster connection. The client will use all servers, regardless of the bootstrapping servers specified here. This list only affects the initial hosts used to discover the full server set and should be formatted as `host1:port1,host2:port2,...`. |
+| `kafka_security_protocol`                     | Security protocol for Kafka connections. Default: `SASL_SSL`. Options:<br /> - `PLAINTEXT`<br /> - `SSL`<br /> - `SASL_PLAINTEXT`<br /> - `SASL_SSL` |
+| `kafka_sasl_mechanism`                        | SASL (Simple Authentication and Security Layer) authentication mechanism. Default: `SCRAM-SHA-512`. Options:<br /> - `PLAIN`<br /> - `SCRAM-SHA-256`<br /> - `SCRAM-SHA-512` |
+| `kafka_sasl_username` | SASL username. |
+| `kafka_sasl_password` | SASL password. |
+| `kafka_ssl_ca_location` | Path to the SSL/TLS CA certificate file for server verification. |
+| `kafka_enable_ssl_certificate_verification` | Enable SSL/TLS certificate verification. Default: `true`. |
+| `kafka_ssl_endpoint_identification_algorithm` | SSL/TLS endpoint identification algorithm. Default: `https`. Options: |
### Acceleration Settings
@@ -90,12 +90,12 @@ The following settings are required:
| `refresh_mode` | Optional. The refresh mode to use. If specified, this must be set to `changes`. Any other value is an error. |
| `mode` | Optional. The persistence mode to use. When using the `duckdb` and `sqlite` engines, it is recommended to set this to `file` to persist the data across restarts. Spice also persists metadata about the dataset, so it can resume from the last known state of the dataset instead of re-fetching the entire dataset. |
-### Example
+## Secrets
-See an example of configuring a dataset to use CDC with Debezium by following the sample [Streaming changes in real-time with Debezium CDC](https://github.com/spiceai/samples/tree/trunk/cdc-debezium).
+Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
-An example of configuring [SASL authentication over SSL](https://github.com/spiceai/samples/tree/trunk/cdc-debezium/sasl-scram) is available as well.
+## Quickstarts and Samples
-## Secrets
+- See an example of configuring a dataset to use CDC with Debezium by following the sample [Streaming changes in real-time with Debezium CDC](https://github.com/spiceai/samples/tree/trunk/cdc-debezium).
-Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
\ No newline at end of file
+- An example of configuring [SASL authentication over SSL](https://github.com/spiceai/samples/tree/trunk/cdc-debezium/sasl-scram) is available as well.
diff --git a/spiceaidocs/docs/components/data-connectors/delta-lake.md b/spiceaidocs/docs/components/data-connectors/delta-lake.md
index 9e1efe13..fa88bcbc 100644
--- a/spiceaidocs/docs/components/data-connectors/delta-lake.md
+++ b/spiceaidocs/docs/components/data-connectors/delta-lake.md
@@ -5,7 +5,7 @@ description: 'Delta Lake Data Connector Documentation'
pagination_prev: null
---
-Query/accelerate [Delta Lake](https://delta.io/) tables in Spice.
+The Delta Lake Data Connector enables federated SQL queries on [Delta Lake](https://delta.io/) tables.
```yaml
datasets:
@@ -27,12 +27,12 @@ The `from` field for the Delta Lake connector takes the form of `delta_lake:path
The dataset name. This will be used as the table name within Spice.
Example:
+
```yaml
datasets:
- from: delta_lake:s3://my_bucket/path/to/s3/delta/table/
name: cool_dataset
- params:
- ...
+ params: ...
```
```sql
@@ -71,10 +71,10 @@ Use the [secret replacement syntax](../secret-stores/index.md) to reference a se
:::info Note
**One** of the following auth values must be provided for Azure Blob:
-- `delta_lake_azure_storage_account_key`,
-- `delta_lake_azure_storage_client_id` and `azure_storage_client_secret`, or
+- `delta_lake_azure_storage_account_key`,
+- `delta_lake_azure_storage_client_id` and `delta_lake_azure_storage_client_secret`, or
- `delta_lake_azure_storage_sas_key`.
-:::
+ :::
| Parameter Name | Description |
| ---------------------------------------- | ---------------------------------------------------------------------- |
@@ -96,47 +96,47 @@ Use the [secret replacement syntax](../secret-stores/index.md) to reference a se
### Delta Lake + Local
```yaml
- - from: delta_lake:/path/to/local/delta/table # A local filesystem path to a Delta Lake table
- name: my_delta_lake_table
+- from: delta_lake:/path/to/local/delta/table # A local filesystem path to a Delta Lake table
+ name: my_delta_lake_table
```
### Delta Lake + S3
```yaml
- - from: delta_lake:s3://my_bucket/path/to/s3/delta/table/ # A reference to a table in S3
- name: my_delta_lake_table
- params:
- delta_lake_aws_region: us-west-2 # Optional
- delta_lake_aws_access_key_id: ${secrets:aws_access_key_id}
- delta_lake_aws_secret_access_key: ${secrets:aws_secret_access_key}
- delta_lake_aws_endpoint: s3.us-west-2.amazonaws.com # Optional
+- from: delta_lake:s3://my_bucket/path/to/s3/delta/table/ # A reference to a table in S3
+ name: my_delta_lake_table
+ params:
+ delta_lake_aws_region: us-west-2 # Optional
+ delta_lake_aws_access_key_id: ${secrets:aws_access_key_id}
+ delta_lake_aws_secret_access_key: ${secrets:aws_secret_access_key}
+ delta_lake_aws_endpoint: s3.us-west-2.amazonaws.com # Optional
```
### Delta Lake + Azure Blob
```yaml
- - from: delta_lake:abfss://my_container@my_account.dfs.core.windows.net/path/to/azure/delta/table/ # A reference to a table in Azure Blob
- name: my_delta_lake_table
- params:
- # Account Name + Key
- delta_lake_azure_storage_account_name: my_account
- delta_lake_azure_storage_account_key: ${secrets:my_key}
-
- # OR Service Principal + Secret
- delta_lake_azure_storage_client_id: my_client_id
- delta_lake_azure_storage_client_secret: ${secrets:my_secret}
-
- # OR SAS Key
- delta_lake_azure_storage_sas_key: my_sas_key
+- from: delta_lake:abfss://my_container@my_account.dfs.core.windows.net/path/to/azure/delta/table/ # A reference to a table in Azure Blob
+ name: my_delta_lake_table
+ params:
+ # Account Name + Key
+ delta_lake_azure_storage_account_name: my_account
+ delta_lake_azure_storage_account_key: ${secrets:my_key}
+
+ # OR Service Principal + Secret
+ delta_lake_azure_storage_client_id: my_client_id
+ delta_lake_azure_storage_client_secret: ${secrets:my_secret}
+
+ # OR SAS Key
+ delta_lake_azure_storage_sas_key: my_sas_key
```
### Delta Lake + Google Storage
```yaml
- params:
- delta_lake_google_service_account_path: /path/to/service-account.json
+params:
+ delta_lake_google_service_account_path: /path/to/service-account.json
```
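+
+The snippet above shows only the `params` block; a complete dataset entry pairs it with a `from` path. The `gs://` URI scheme below is an assumption for illustration and is not confirmed by this page:
+
+```yaml
+- from: delta_lake:gs://my_bucket/path/to/gcs/delta/table/ # assumed GCS URI scheme
+  name: my_delta_lake_table
+  params:
+    delta_lake_google_service_account_path: /path/to/service-account.json
+```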
## Secrets
-Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
\ No newline at end of file
+Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
diff --git a/spiceaidocs/docs/components/data-connectors/dremio.md b/spiceaidocs/docs/components/data-connectors/dremio.md
index 5a00569d..b80daf7e 100644
--- a/spiceaidocs/docs/components/data-connectors/dremio.md
+++ b/spiceaidocs/docs/components/data-connectors/dremio.md
@@ -4,9 +4,9 @@ sidebar_label: 'Dremio Data Connector'
description: 'Dremio Data Connector Documentation'
---
-[Dremio](https://www.dremio.com/) is a data lake engine that enables high-performance SQL queries directly on data lake storage. It provides a unified interface for querying and analyzing data from various sources without the need for complex data movement or transformation.
+[Dremio](https://www.dremio.com/) is a data lake engine that enables high-performance SQL queries directly on data lake storage. It provides a unified interface for querying and analyzing data from various sources without the need for complex data movement or transformation.
-This connector enables using Dremio as a data source for federated/accelerated SQL queries.
+This connector enables using Dremio as a data source for federated SQL queries.
```yaml
- from: dremio:datasets.dremio_dataset
@@ -34,12 +34,12 @@ Currently, only up to three levels of nesting are supported for dataset names (e
The dataset name. This will be used as the table name within Spice.
Example:
+
```yaml
datasets:
- from: dremio:datasets.dremio_dataset
name: cool_dataset
- params:
- ...
+ params: ...
```
```sql
@@ -56,11 +56,11 @@ SELECT COUNT(*) FROM cool_dataset;
### `params`
-| Parameter Name | Description |
-| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `dremio_endpoint` | The endpoint used to connect to the Dremio server. |
-| `dremio_username` | The username to connect with. |
-| `dremio_password` | The password to connect with. Use the [secret replacement syntax](#secrets) to load the password from a secret store, e.g. `${secrets:my_dremio_pass}`. |
+| Parameter Name | Description |
+| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `dremio_endpoint` | The endpoint used to connect to the Dremio server. |
+| `dremio_username` | The username used to connect to the Dremio endpoint. |
+| `dremio_password` | The password used to connect to the Dremio endpoint. Use the [secret replacement syntax](#secrets) to load the password from a secret store, e.g. `${secrets:my_dremio_pass}`. |
## Examples
@@ -78,3 +78,7 @@ SELECT COUNT(*) FROM cool_dataset;
## Secrets
Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure Dremio as a data connector in Spice. [Dremio Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/dremio)
diff --git a/spiceaidocs/docs/components/data-connectors/duckdb.md b/spiceaidocs/docs/components/data-connectors/duckdb.md
index 46127d93..c4934f06 100644
--- a/spiceaidocs/docs/components/data-connectors/duckdb.md
+++ b/spiceaidocs/docs/components/data-connectors/duckdb.md
@@ -22,9 +22,9 @@ datasets:
The `from` field supports one of two forms:
-| `from` | Description |
-| ------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `duckdb:database.schema.table` | Read data from a table named `database.schema.table` in the DuckDB file |
+| `from` | Description |
+| ------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `duckdb:database.schema.table` | Read data from a table named `database.schema.table` in the DuckDB file |
| `duckdb:*` | Read data using any DuckDB function that produces a table. For example one of the [data import](https://duckdb.org/docs/data/overview) functions such as `read_json`, `read_parquet` or `read_csv`. |
### `name`
@@ -32,12 +32,12 @@ The `from` field supports one of two forms:
The dataset name. This will be used as the table name within Spice.
Example:
+
```yaml
datasets:
- from: duckdb:database.schema.table
name: cool_dataset
- params:
- ...
+ params: ...
```
```sql
@@ -56,8 +56,8 @@ SELECT COUNT(*) FROM cool_dataset;
The DuckDB data connector can be configured by providing the following `params`:
-| Parameter Name | Description |
-| -------------- | -------------------------------------------------- |
+| Parameter Name | Description |
+| -------------- | ---------------------------------------- |
| `duckdb_open` | The name of the DuckDB database to open. |
Configuration `params` are provided either in the top level `dataset` for a dataset source, or in the `acceleration` section for a data store.
@@ -138,3 +138,7 @@ SELECT * FROM read_json('todos.json');
- The DuckDB connector does not support `Decimal256` (76 digits), as it exceeds DuckDB's maximum Decimal width of 38 digits.
:::
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure DuckDB as a data connector in Spice. [DuckDB Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/duckdb/connector)
diff --git a/spiceaidocs/docs/components/data-connectors/file.md b/spiceaidocs/docs/components/data-connectors/file.md
index 2c295e85..38e44ff8 100644
--- a/spiceaidocs/docs/components/data-connectors/file.md
+++ b/spiceaidocs/docs/components/data-connectors/file.md
@@ -4,8 +4,7 @@ sidebar_label: 'File Data Connector'
description: 'File Data Connector Documentation'
---
-
-The File Data Connector enables federated/accelerated SQL queries on files stored by locally accessible filesystems. It supports querying individual files or entire directories, where all child files within the directory will be loaded and queried.
+The File Data Connector enables federated SQL queries on files stored by locally accessible filesystems. It supports querying individual files or entire directories, where all child files within the directory will be loaded and queried.
File formats are specified using the `file_format` parameter, as described in [Object Store File Formats](/components/data-connectors/index.md#object-store-file-formats).
@@ -35,8 +34,7 @@ Example:
datasets:
- from: file://path/to/customer.parquet
name: cool_dataset
- params:
- ...
+ params: ...
```
```sql
@@ -53,10 +51,10 @@ SELECT COUNT(*) FROM cool_dataset;
### `params`
-| Parameter name | Description |
-| --------------------------- | ------------------------------------------------------------------------------------------------ |
-| `file_format` | Specifies the data file format. Required if the format cannot be inferred from the `from` path. |
-| `hive_partitioning_enabled` | Enable partitioning using hive-style partitioning from the folder structure. Defaults to `false` |
+| Parameter name | Description |
+| --------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `file_format` | Specifies the data file format. Required if the format cannot be inferred from the `from` path. Refer to [Object Store File Formats](/components/data-connectors/index.md#object-store-file-formats) for details. |
+| `hive_partitioning_enabled` | Enable partitioning using hive-style partitioning from the folder structure. Defaults to `false` |
For CSV-specific parameters, see [CSV Parameters](/reference/file_format.md#csv).
@@ -111,4 +109,4 @@ datasets:
## Quickstarts and Samples
-Refer to the [File quickstart](https://github.com/spiceai/quickstarts/tree/trunk/file) to see an example of the File connector in use.
+- A quickstart tutorial to configure File as a data connector in Spice. [File Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/file)
diff --git a/spiceaidocs/docs/components/data-connectors/flightsql.md b/spiceaidocs/docs/components/data-connectors/flightsql.md
index fa5f8bfe..a4e9d3f6 100644
--- a/spiceaidocs/docs/components/data-connectors/flightsql.md
+++ b/spiceaidocs/docs/components/data-connectors/flightsql.md
@@ -18,118 +18,24 @@ Connect to any Flight SQL compatible server (e.g. Influx 3.0, CnosDB, other Spic
flightsql_password: ${secrets:my_flightsql_pass}
```
-## `params`
+## Configuration
-- `flightsql_endpoint`: The Apache Flight endpoint used to connect to the Flight SQL server.
-- `flightsql_username`: Optional. The username to use in the underlying Apache flight Handshake Request to authenticate to the server (see [reference](https://arrow.apache.org/docs/format/Flight.html#authentication)).
-- `flightsql_password` (optional): The password to use in the underlying Apache flight Handshake Request to authenticate to the server. Use the [secret replacement syntax](../secret-stores/index.md) to load the password from a secret store, e.g. `${secrets:my_flightsql_pass}`.
+### `from`
-## Auth Example
+The `from` field takes the form `flightsql:dataset` where `dataset` is the fully qualified name of the dataset to read from.
-Check [Secrets Stores](/components/secret-stores) for more details.
+### `name`
-
-
+The dataset name. This will be used as the table name within Spice.
- ```bash
- MY_USERNAME= \
- MY_PASSWORD= \
- spice run
- ```
+### `params`
- `.env`
- ```bash
- MY_USERNAME=
- MY_PASSWORD=
- ```
+| Parameter name | Description |
+| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `flightsql_endpoint` | The Apache Flight endpoint used to connect to the Flight SQL server. |
+| `flightsql_username` | Optional. The username to use in the underlying Apache flight Handshake Request to authenticate to the server (see [reference](https://arrow.apache.org/docs/format/Flight.html#authentication)). |
+| `flightsql_password` | Optional. The password to use in the underlying Apache flight Handshake Request to authenticate to the server. Use the [secret replacement syntax](../secret-stores/index.md) to load the password from a secret store, e.g. `${secrets:my_flightsql_pass}`. |
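+
+A minimal dataset entry using these parameters might look like the following sketch (endpoint and dataset names are illustrative):
+
+```yaml
+datasets:
+  - from: flightsql:my_catalog.good_schemas.cool_dataset
+    name: cool_dataset
+    params:
+      flightsql_endpoint: http://1.2.3.4:50051
+      flightsql_username: ${secrets:my_flightsql_user}
+      flightsql_password: ${secrets:my_flightsql_pass}
+```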
- `spicepod.yaml`
- ```yaml
- version: v1beta1
- kind: Spicepod
- name: spice-app
+## Secrets
- secrets:
- - from: env
- name: env
-
- datasets:
- - from: flightsql:my_catalog.good_schemas.cool_dataset
- name: cool_dataset
- params:
- flightsql_endpoint: http://1.2.3.4:50051
- flightsql_username: ${env:MY_USERNAME}
- flightsql_password: ${env:MY_PASSWORD}
- ```
-
- Learn more about [Env Secret Store](/components/secret-stores/env).
-
-
-
-
- ```bash
- kubectl create secret generic flightsql \
- --from-literal=username='' \
- --from-literal=password=''
- ```
-
- `spicepod.yaml`
- ```yaml
- version: v1beta1
- kind: Spicepod
- name: spice-app
-
- secrets:
- - from: kubernetes:flightsql
- name: flightsql
-
- datasets:
- - from: flightsql:my_catalog.good_schemas.cool_dataset
- name: cool_dataset
- params:
- flightsql_endpoint: http://1.2.3.4:50051
- flightsql_username: ${flightsql:username}
- flightsql_password: ${flightsql:password}
- ```
-
- Learn more about [Kubernetes Secret Store](/components/secret-stores/kubernetes).
-
-
-
- Add new keychain entries (macOS) for the user and password:
-
- ```bash
- # Add Username to keychain
- security add-generic-password -l "FlightSQL Username" \
- -a spiced -s spice_flightsql_username \
- -w
- # Add Password to keychain
- security add-generic-password -l "FlightSQL Password" \
- -a spiced -s spice_flightsql_password \
- -w
- ```
-
-
- `spicepod.yaml`
- ```yaml
- version: v1beta1
- kind: Spicepod
- name: spice-app
-
- secrets:
- - from: keyring
- name: keyring
-
- datasets:
- - from: flightsql:my_catalog.good_schemas.cool_dataset
- name: cool_dataset
- params:
- flightsql_endpoint: http://1.2.3.4:50051
- flightsql_username: ${keyring:spice_flightsql_username}
- flightsql_password: ${keyring:spice_flightsql_password}
- ```
-
- Learn more about [Keyring Secret Store](/components/secret-stores/keyring).
-
-
-
+Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
diff --git a/spiceaidocs/docs/components/data-connectors/ftp.md b/spiceaidocs/docs/components/data-connectors/ftp.md
index 7d9a55d5..a569ef05 100644
--- a/spiceaidocs/docs/components/data-connectors/ftp.md
+++ b/spiceaidocs/docs/components/data-connectors/ftp.md
@@ -32,12 +32,12 @@ If a folder is provided, all child files will be loaded.
The dataset name. This will be used as the table name within Spice.
Example:
+
```yaml
datasets:
- from: sftp://remote-sftp-server.com/path/to/folder/
name: cool_dataset
- params:
- ...
+ params: ...
```
```sql
@@ -66,6 +66,7 @@ SELECT COUNT(*) FROM cool_dataset;
| `hive_partitioning_enabled` | Optional. Enable partitioning using hive-style partitioning from the folder structure. Defaults to `false` |
#### SFTP
+
| Parameter Name | Description |
| --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `file_format` | Specifies the data file format. Required if the format cannot be inferred by from the `from` path. See [Object Store File Formats](/components/data-connectors/index.md#object-store-file-formats). |
@@ -80,32 +81,32 @@ SELECT COUNT(*) FROM cool_dataset;
### Connecting to FTP
```yaml
- - from: ftp://remote-ftp-server.com/path/to/folder/
- name: my_dataset
- params:
- file_format: csv
- ftp_user: my-ftp-user
- ftp_pass: ${secrets:my_ftp_password}
- hive_partitioning_enabled: false
+- from: ftp://remote-ftp-server.com/path/to/folder/
+ name: my_dataset
+ params:
+ file_format: csv
+ ftp_user: my-ftp-user
+ ftp_pass: ${secrets:my_ftp_password}
+ hive_partitioning_enabled: false
```
### Connecting to SFTP
```yaml
- - from: sftp://remote-sftp-server.com/path/to/folder/
- name: my_dataset
- params:
- file_format: csv
- sftp_port: 22
- sftp_user: my-sftp-user
- sftp_pass: ${secrets:my_sftp_password}
- hive_partitioning_enabled: false
+- from: sftp://remote-sftp-server.com/path/to/folder/
+ name: my_dataset
+ params:
+ file_format: csv
+ sftp_port: 22
+ sftp_user: my-sftp-user
+ sftp_pass: ${secrets:my_sftp_password}
+ hive_partitioning_enabled: false
```
-## Quickstarts and Samples
+## Secrets
-Refer to the [FTP quickstart](https://github.com/spiceai/quickstarts/tree/trunk/ftp) to see an example of the FTP connector in use.
+Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
-## Secrets
+## Quickstarts and Samples
-Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
\ No newline at end of file
+- A quickstart tutorial to configure FTP as a data connector in Spice. [FTP Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/ftp)
diff --git a/spiceaidocs/docs/components/data-connectors/github.md b/spiceaidocs/docs/components/data-connectors/github.md
index 56890296..991553d2 100644
--- a/spiceaidocs/docs/components/data-connectors/github.md
+++ b/spiceaidocs/docs/components/data-connectors/github.md
@@ -8,21 +8,33 @@ The GitHub Data Connector enables federated SQL queries on various GitHub resour
-## Common Configuration
-The GitHub data connector can be configured by providing the following `params`. Use the [secret replacement syntax](../secret-stores/index.md) to load the access token from a secret store, e.g. `${secrets:GITHUB_TOKEN}`.
+## Configuration
-The GitHub data connector supports two authentication methods: using a personal access token or GitHub App Installation credentials. Use the [secret replacement syntax](../secret-stores/index.md) to load the access token or other secrets from a secret store.
+### `from`
-### Personal Access Token
+The `from` field takes the form of `github:github.com/[owner]/[repo]/[content]`, where `content` is one of `files`, `issues`, `pulls`, `commits`, or `stargazers`. See [examples](#examples) for more configuration details.
-- `github_token`: Required. GitHub personal access token to use to connect to the GitHub API. [Learn more](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens).
+### `name`
-### GitHub App Installation
+The dataset name. This will be used as the table name within Spice.
+
+### `params`
+
+#### Personal Access Token
+
+| Parameter Name | Description |
+| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `github_token` | Required. GitHub personal access token to use to connect to the GitHub API. [Learn more](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens). |
+
+#### GitHub App Installation
GitHub Apps provide a secure and scalable way to integrate with GitHub's API. [Learn more](https://docs.github.com/en/apps).
-- `github_client_id`: Required. Specifies the client ID for GitHub App Installation auth mode.
-- `github_private_key`: Required. Specifies the private key for GitHub App Installation auth mode.
-- `github_installation_id`: Required. Specifies the installation ID for GitHub App Installation auth mode.
+| Parameter Name | Description |
+| ------------------------ | ------------------------------------------------------------------------------ |
+| `github_client_id` | Required. Specifies the client ID for GitHub App Installation auth mode. |
+| `github_private_key` | Required. Specifies the private key for GitHub App Installation auth mode. |
+| `github_installation_id` | Required. Specifies the installation ID for GitHub App Installation auth mode. |
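+
+A dataset using GitHub App Installation credentials might look like the following sketch (repository and dataset names are illustrative):
+
+```yaml
+datasets:
+  - from: github:github.com/spiceai/spiceai/issues
+    name: spiceai.issues
+    params:
+      github_client_id: ${secrets:github_client_id}
+      github_private_key: ${secrets:github_private_key}
+      github_installation_id: ${secrets:github_installation_id}
+```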
:::note[Limitations]
@@ -30,13 +42,15 @@ With GitHub App Installation authentication, the connector's functionality depen
:::
-### Common Parameters
+#### Common Parameters
-- `github_query_mode`: Optional. Specifies whether the connector should use the GitHub [search API](https://docs.github.com/en/graphql/reference/queries#search) for improved filter performance. Defaults to `auto`, possible values of `auto` or `search`.
-- `owner` - Required. Specifies the owner of the GitHub repository.
-- `repo` - Required. Specifies the name of the GitHub repository.
+| Parameter Name | Description |
+| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `github_query_mode` | Optional. Specifies whether the connector should use the GitHub [search API](https://docs.github.com/en/graphql/reference/queries#search) for improved filter performance. Defaults to `auto`, possible values of `auto` or `search`. |
+| `owner` | Required. Specifies the owner of the GitHub repository. |
+| `repo` | Required. Specifies the name of the GitHub repository. |
-### Filter Push Down
+## Filter Push Down
GitHub queries support a `github_query_mode` parameter, which can be set to either `auto` or `search` for the following types:
@@ -60,6 +74,8 @@ All other filters are supported when `github_query_mode` is set to `search`, but
:::
+## Examples
+
### Querying GitHub Files
:::warning[Limitations]
@@ -70,7 +86,7 @@ All other filters are supported when `github_query_mode` is set to `search`, but
:::
-- `ref` - Required. Specifies the GitHub branch or tag to fetch files from.
+- `ref` - Required. Specifies the GitHub branch or tag to fetch files from.
- `include` - Optional. Specifies a pattern to include specific files. Supports glob patterns. If not specified, all files are included by default.
```yaml
@@ -79,7 +95,7 @@ datasets:
name: spiceai.files
params:
github_token: ${secrets:GITHUB_TOKEN}
- include: "**/*.json; **/*.yaml"
+ include: '**/*.json; **/*.yaml'
acceleration:
enabled: true
```
@@ -87,7 +103,7 @@ datasets:
#### Schema
| Column Name | Data Type | Is Nullable |
-|--------------|-----------|-------------|
+| ------------ | --------- | ----------- |
| name | Utf8 | YES |
| path | Utf8 | YES |
| size | Int64 | YES |
@@ -105,7 +121,7 @@ datasets:
name: spiceai.files
params:
github_token: ${secrets:GITHUB_TOKEN}
- include: "**/*.txt" # include txt files only
+ include: '**/*.txt' # include txt files only
acceleration:
enabled: true
```
@@ -143,7 +159,7 @@ datasets:
#### Schema
| Column Name | Data Type | Is Nullable |
-|-----------------|--------------|-------------|
+| --------------- | ------------ | ----------- |
| assignees | List(Utf8) | YES |
| author | Utf8 | YES |
| body | Utf8 | YES |
@@ -204,27 +220,27 @@ datasets:
#### Schema
-| Column Name | Data Type | Is Nullable |
-|-----------------|------------|-------------|
-| additions | Int64 | YES |
-| assignees | List(Utf8) | YES |
-| author | Utf8 | YES |
-| body | Utf8 | YES |
-| changed_files | Int64 | YES |
-| closed_at | Timestamp | YES |
-| comments_count | Int64 | YES |
-| commits_count | Int64 | YES |
-| created_at | Timestamp | YES |
-| deletions | Int64 | YES |
-| hashes | List(Utf8) | YES |
-| id | Utf8 | YES |
-| labels | List(Utf8) | YES |
-| merged_at | Timestamp | YES |
-| number | Int64 | YES |
-| reviews_count | Int64 | YES |
-| state | Utf8 | YES |
-| title | Utf8 | YES |
-| url | Utf8 | YES |
+| Column Name | Data Type | Is Nullable |
+| -------------- | ---------- | ----------- |
+| additions | Int64 | YES |
+| assignees | List(Utf8) | YES |
+| author | Utf8 | YES |
+| body | Utf8 | YES |
+| changed_files | Int64 | YES |
+| closed_at | Timestamp | YES |
+| comments_count | Int64 | YES |
+| commits_count | Int64 | YES |
+| created_at | Timestamp | YES |
+| deletions | Int64 | YES |
+| hashes | List(Utf8) | YES |
+| id | Utf8 | YES |
+| labels | List(Utf8) | YES |
+| merged_at | Timestamp | YES |
+| number | Int64 | YES |
+| reviews_count | Int64 | YES |
+| state | Utf8 | YES |
+| title | Utf8 | YES |
+| url | Utf8 | YES |
#### Example
@@ -253,17 +269,17 @@ Time: 0.034996667 seconds. 1 rows.
```yaml
datasets:
-- from: github:github.com/spiceai/spiceai/pulls
- name: spiceai.pulls
- params:
- github_token: ${secrets:GITHUB_TOKEN}
- github_query_mode: search
- time_column: created_at
- acceleration:
- enabled: true
- refresh_mode: append
- refresh_check_interval: 6h # check for new results every 6 hours
- refresh_data_window: 90d # at initial load, load the last 90 days of pulls
+ - from: github:github.com/spiceai/spiceai/pulls
+ name: spiceai.pulls
+ params:
+ github_token: ${secrets:GITHUB_TOKEN}
+ github_query_mode: search
+ time_column: created_at
+ acceleration:
+ enabled: true
+ refresh_mode: append
+ refresh_check_interval: 6h # check for new results every 6 hours
+ refresh_data_window: 90d # at initial load, load the last 90 days of pulls
```
### Querying GitHub Commits
@@ -286,7 +302,7 @@ datasets:
#### Schema
| Column Name | Data Type | Is Nullable |
-|-------------------|-----------|-------------|
+| ----------------- | --------- | ----------- |
| additions | Int64 | YES |
| author_email | Utf8 | YES |
| author_name | Utf8 | YES |
@@ -349,17 +365,17 @@ datasets:
#### Schema
-| Column Name | Data Type | Is Nullable |
-|-------------------|-----------|-------------|
-| starred_at | Timestamp | YES |
-| login | Utf8 | YES |
-| email | Utf8 | YES |
-| name | Utf8 | YES |
-| company | Utf8 | YES |
-| x_username | Utf8 | YES |
-| location | Utf8 | YES |
-| avatar_url | Utf8 | YES |
-| bio | Utf8 | YES |
+| Column Name | Data Type | Is Nullable |
+| ----------- | --------- | ----------- |
+| starred_at | Timestamp | YES |
+| login | Utf8 | YES |
+| email | Utf8 | YES |
+| name | Utf8 | YES |
+| company | Utf8 | YES |
+| x_username | Utf8 | YES |
+| location | Utf8 | YES |
+| avatar_url | Utf8 | YES |
+| bio | Utf8 | YES |
#### Example
@@ -392,3 +408,7 @@ sql> select starred_at, login from spiceai.stargazers order by starred_at DESC l
Time: 0.0088075 seconds. 10 rows.
```
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure GitHub as a data connector in Spice. [GitHub Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/github)
diff --git a/spiceaidocs/docs/components/data-connectors/graphql.md b/spiceaidocs/docs/components/data-connectors/graphql.md
index bf0a21b3..3d891f01 100644
--- a/spiceaidocs/docs/components/data-connectors/graphql.md
+++ b/spiceaidocs/docs/components/data-connectors/graphql.md
@@ -4,7 +4,6 @@ sidebar_label: 'GraphQL Data Connector'
description: 'GraphQL Data Connector Documentation'
---
-
The [GraphQL](https://graphql.org/) Data Connector enables federated SQL queries on any GraphQL endpoint by specifying `graphql` as the selector in the `from` value for the dataset.
```yaml
@@ -33,13 +32,28 @@ datasets:
## Configuration
+### `from`
+
+The `from` field takes the form of `graphql:your-graphql-endpoint`.
+
+### `name`
+
+The dataset name. This will be used as the table name within Spice.
+
+### `params`
+
The GraphQL data connector can be configured by providing the following `params`. Use the [secret replacement syntax](../secret-stores/index.md) to load the password from a secret store, e.g. `${secrets:my_graphql_auth_token}`.
-- `unnest_depth`: Depth level to automatically unnest objects to. By default, disabled if unspecified or `0`.
-- `graphql_auth_token`: The authentication token to use to connect to the GraphQL server. Uses bearer authentication.
-- `graphql_auth_user`: The username to use for basic auth. E.g. `graphql_auth_user: my_user`
-- `graphql_auth_pass`: The password to use for basic auth. E.g. `graphql_auth_pass: ${secrets:my_graphql_auth_pass}`
-- `graphql_query`: The GraphQL query to execute. E.g.
+| Parameter Name | Description |
+| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `unnest_depth` | Depth level to automatically unnest objects to. By default, disabled if unspecified or `0`. |
+| `graphql_auth_token` | The authentication token to use to connect to the GraphQL server. Uses bearer authentication. |
+| `graphql_auth_user` | The username to use for basic auth. E.g. `graphql_auth_user: my_user` |
+| `graphql_auth_pass` | The password to use for basic auth. E.g. `graphql_auth_pass: ${secrets:my_graphql_auth_pass}` |
+| `graphql_query`      | The GraphQL query to execute. See [examples](#examples) for a sample GraphQL query.                                                                                               |
+| `json_pointer` | The [JSON pointer](https://datatracker.ietf.org/doc/html/rfc6901) into the response body. When `graphql_query` is [paginated](#pagination), the `json_pointer` can be inferred. |
+
+#### GraphQL Query Example
```yaml
query: |
@@ -53,8 +67,6 @@ query: |
}
```
-- `json_pointer`: The [JSON pointer](https://datatracker.ietf.org/doc/html/rfc6901) into the response body. When `graphql_query` is [paginated](#pagination), the `json_pointer` can be inferred.
-
### Examples
Example using the GitHub GraphQL API and Bearer Auth. The following will use `json_pointer` to retrieve all of the nodes in starredRepositories:
@@ -83,7 +95,6 @@ params:
}
}
}
-
```
## Pagination
@@ -308,3 +319,7 @@ params:
}
}
```
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure GraphQL as a data connector in Spice. [GraphQL Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/graphql)
diff --git a/spiceaidocs/docs/components/data-connectors/https.md b/spiceaidocs/docs/components/data-connectors/https.md
index 58754be6..a92353e1 100644
--- a/spiceaidocs/docs/components/data-connectors/https.md
+++ b/spiceaidocs/docs/components/data-connectors/https.md
@@ -5,7 +5,7 @@ description: 'HTTP(s) Data Connector Documentation'
pagination_prev: null
---
-The HTTP(s) Data Connector enables federated/accelerated SQL query across [supported file formats](/components/data-connectors/index.md#object-store-file-formats) stored at an HTTP(s) endpoint.
+The HTTP(s) Data Connector enables federated SQL query across [supported file formats](/components/data-connectors/index.md#object-store-file-formats) stored at an HTTP(s) endpoint.
```yaml
datasets:
@@ -26,12 +26,12 @@ The `from` field must contain a valid URI to the location of a [supported file](
The dataset name. This will be used as the table name within Spice.
Example:
+
```yaml
datasets:
- from: http://static_username@localhost:3001/report.csv
name: cool_dataset
- params:
- ...
+ params: ...
```
```sql
@@ -60,6 +60,7 @@ The connector supports Basic HTTP authentication via `param` values.
## Examples
### Basic example
+
```yaml
datasets:
- from: https://github.com/LAION-AI/audio-dataset/raw/7fd6ae3cfd7cde619f6bed817da7aa2202a5bc28/metadata/freesound/parquet/freesound_parquet.parquet
@@ -67,6 +68,7 @@ datasets:
```
### Using Basic Authentication
+
```yaml
datasets:
- from: http://static_username@localhost:3001/report.csv
@@ -77,4 +79,4 @@ datasets:
## Secrets
-Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
\ No newline at end of file
+Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
diff --git a/spiceaidocs/docs/components/data-connectors/localpod.md b/spiceaidocs/docs/components/data-connectors/localpod.md
index 83aec516..e9da6d98 100644
--- a/spiceaidocs/docs/components/data-connectors/localpod.md
+++ b/spiceaidocs/docs/components/data-connectors/localpod.md
@@ -23,17 +23,20 @@ When synchronization is enabled, the following logs will be emitted:
```yaml
datasets:
-- from: postgres:cleaned_sales_data
- name: test
- params:
- ...
- acceleration:
- enabled: true # This dataset will be accelerated into a DuckDB file
- engine: duckdb
- mode: file
- refresh_check_interval: 10s
-- from: localpod:test
- name: test_local
- acceleration:
- enabled: true # This dataset accelerates the parent `test` dataset into in-memory Arrow records and is synchronized with the parent
+ - from: postgres:cleaned_sales_data
+ name: test
+ params: ...
+ acceleration:
+ enabled: true # This dataset will be accelerated into a DuckDB file
+ engine: duckdb
+ mode: file
+ refresh_check_interval: 10s
+ - from: localpod:test
+ name: test_local
+ acceleration:
+ enabled: true # This dataset accelerates the parent `test` dataset into in-memory Arrow records and is synchronized with the parent
```
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure Localpod as a data connector in Spice. [Localpod Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/localpod)
diff --git a/spiceaidocs/docs/components/data-connectors/memory.md b/spiceaidocs/docs/components/data-connectors/memory.md
index 59b53d05..12185ccf 100644
--- a/spiceaidocs/docs/components/data-connectors/memory.md
+++ b/spiceaidocs/docs/components/data-connectors/memory.md
@@ -6,21 +6,26 @@ pagination_prev: null
---
The Memory Data Connector enables configuring an in-memory dataset for tables used or produced by the Spice runtime. Only certain tables, with predefined schemas, can be defined by the connector. These are:
- - `store`: Defines a table that LLMs, with [memory tooling](/features/large-language-models/memory), can store data in. Requires `mode: read_write`.
+
+- `store`: Defines a table that LLMs, with [memory tooling](/features/large-language-models/memory), can store data in. Requires `mode: read_write`.
### Examples
```yaml
datasets:
-- from: memory:store
- name: llm_memory
- mode: read_write
- columns:
- - name: value
- embeddings: # Easily make your LLM learnings searchable.
- - from: all-MiniLM-L6-v2
+ - from: memory:store
+ name: llm_memory
+ mode: read_write
+ columns:
+ - name: value
+ embeddings: # Easily make your LLM learnings searchable.
+ - from: all-MiniLM-L6-v2
embeddings:
- name: all-MiniLM-L6-v2
from: huggingface:huggingface.co/sentence-transformers/all-MiniLM-L6-v2
```
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to provide persistent memory capabilities for language models in Spice. [LLM Memory Quickstarts](https://github.com/spiceai/quickstarts/tree/trunk/llm-memory)
diff --git a/spiceaidocs/docs/components/data-connectors/mssql.md b/spiceaidocs/docs/components/data-connectors/mssql.md
index c98188cd..7f0f2c6a 100644
--- a/spiceaidocs/docs/components/data-connectors/mssql.md
+++ b/spiceaidocs/docs/components/data-connectors/mssql.md
@@ -34,12 +34,12 @@ The `from` field takes the form `mssql:database.schema.table` where `database.sc
The dataset name. This will be used as the table name within Spice.
Example:
+
```yaml
datasets:
- from: mssql:path.to.my_dataset
name: cool_dataset
- params:
- ...
+ params: ...
```
```sql
@@ -63,7 +63,7 @@ The data connector supports the following `params`. Use the [secret replacement
| `mssql_connection_string` | The ADO connection string to use to connect to the server. This can be used instead of providing individual connection parameters. |
| `mssql_host` | The hostname or IP address of the Microsoft SQL Server instance. |
| `mssql_port` | (Optional) The port of the Microsoft SQL Server instance. Default value is 1433. |
-| `mssql_database` | (Optional) The name of the database to connect to. The default database (`master`) will be used if not specified. |
+| `mssql_database` | (Optional) The name of the database to connect to. The default database (`master`) will be used if not specified. |
| `mssql_username` | The username for the SQL Server authentication. |
| `mssql_password` | The password for the SQL Server authentication. |
| `mssql_encrypt`           | (Optional) Specifies whether encryption is required for the connection. <br/> `true`: (default) This mode requires an SSL connection. If a secure connection cannot be established, the server will not connect. <br/> `false`: This mode will not attempt to use an SSL connection, even if the server supports it. Only the login procedure is encrypted. |
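
A minimal configuration sketch combining these parameters might look like the following; the table path, host, and secret names are illustrative placeholders:

```yaml
datasets:
  - from: mssql:my_database.dbo.my_table # placeholder table path
    name: my_dataset
    params:
      mssql_host: localhost
      mssql_port: 1433
      mssql_database: my_database
      mssql_username: ${secrets:mssql_user}
      mssql_password: ${secrets:mssql_pass}
      mssql_encrypt: true
```
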
@@ -86,4 +86,8 @@ datasets:
## Secrets
-Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
\ No newline at end of file
+Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure Microsoft SQL Server as a data connector in Spice. [Microsoft SQL Server Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/mssql)
diff --git a/spiceaidocs/docs/components/data-connectors/mysql.md b/spiceaidocs/docs/components/data-connectors/mysql.md
index 67555300..58b2509c 100644
--- a/spiceaidocs/docs/components/data-connectors/mysql.md
+++ b/spiceaidocs/docs/components/data-connectors/mysql.md
@@ -185,3 +185,9 @@ datasets:
## Secrets
Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure MySQL as a data connector in Spice. [MySQL Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/mysql)
+- A quickstart tutorial to configure AWS RDS Aurora (MySQL Compatible) as a data connector in Spice. [AWS RDS Aurora (MySQL Compatible) quickstart](https://github.com/spiceai/quickstarts/tree/trunk/rds-aurora-mysql)
+- A quickstart tutorial to configure Planetscale as a data connector in Spice. [Planetscale quickstart](https://github.com/spiceai/quickstarts/tree/trunk/planetscale)
diff --git a/spiceaidocs/docs/components/data-connectors/odbc.md b/spiceaidocs/docs/components/data-connectors/odbc.md
index 50beb3cb..7f278c56 100644
--- a/spiceaidocs/docs/components/data-connectors/odbc.md
+++ b/spiceaidocs/docs/components/data-connectors/odbc.md
@@ -97,8 +97,7 @@ Example:
datasets:
- from: odbc:my.cool.table
name: cool_dataset
- params:
- ...
+ params: ...
```
```sql
@@ -115,14 +114,14 @@ SELECT COUNT(*) FROM cool_dataset;
### `params`
-| Parameter | Type | Description |
-| ----------------------------- | -------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `sql_dialect` | string | Override what SQL dialect is used for the ODBC connection. Supports `postgresql`, `mysql`, `sqlite`, `athena` or `databricks` values. Default is unset (auto-detected). |
-| `odbc_max_bytes_per_batch` | number (bytes) | Maximum number of bytes transferred in each query record batch. A lower value may improve performance on low-memory systems. Default is `512_000_000`. |
-| `odbc_max_num_rows_per_batch` | number (rows) | Maximum number of rows transferred in each query record batch. A higher value may speed up query results, but requires more memory in conjunction with `odbc_max_bytes_per_batch`. Default is `65536`. |
-| `odbc_max_text_size` | number (bytes) | A limit for the maximum size of text columns transmitted between the ODBC driver and the Runtime. Default is unset (allocates driver-reported max column size). |
-| `odbc_max_binary_size` | number (bytes) | A limit for the maximum size of binary columns transmitted between the ODBC driver and the Runtime. Default is unset (allocates driver-reported max column size). |
-| `odbc_connection_string` | string | Connection string to use to connect to the ODBC server |
+| Parameter | Type | Description |
+| ----------------------------- | -------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `sql_dialect` | string | Override what SQL dialect is used for the ODBC connection. Supports `postgresql`, `mysql`, `sqlite`, `athena` or `databricks` values. Default is unset (auto-detected). |
+| `odbc_max_bytes_per_batch` | number (bytes) | Maximum number of bytes transferred in each query record batch. A lower value may improve performance on low-memory systems. Default is `512_000_000`. |
+| `odbc_max_num_rows_per_batch` | number (rows) | Maximum number of rows transferred in each query record batch. A higher value may speed up query results, but requires more memory in conjunction with `odbc_max_bytes_per_batch`. Default is `65536`. |
+| `odbc_max_text_size` | number (bytes) | A limit for the maximum size of text columns transmitted between the ODBC driver and the Runtime. Default is unset (allocates driver-reported max column size). |
+| `odbc_max_binary_size` | number (bytes) | A limit for the maximum size of binary columns transmitted between the ODBC driver and the Runtime. Default is unset (allocates driver-reported max column size). |
+| `odbc_connection_string` | string | Connection string to use to connect to the ODBC server |
```yaml
datasets:
@@ -243,13 +242,13 @@ version: v1beta1
kind: Spicepod
name: sqlite
datasets:
-- from: odbc:spice_test
- name: spice_test
- mode: read
- acceleration:
- enabled: false
- params:
- odbc_connection_string: DRIVER={SQLite3};SERVER=localhost;DATABASE=test.db;Trusted_connection=yes
+ - from: odbc:spice_test
+ name: spice_test
+ mode: read
+ acceleration:
+ enabled: false
+ params:
+ odbc_connection_string: DRIVER={SQLite3};SERVER=localhost;DATABASE=test.db;Trusted_connection=yes
```
All together now:
@@ -290,13 +289,13 @@ version: v1beta1
kind: Spicepod
name: sqlite
datasets:
-- from: odbc:spice_test
- name: spice_test
- mode: read
- acceleration:
- enabled: false
- params:
- odbc_connection_string: DRIVER={SQLite3};SERVER=localhost;DATABASE=test.db;Trusted_connection=yes
+ - from: odbc:spice_test
+ name: spice_test
+ mode: read
+ acceleration:
+ enabled: false
+ params:
+ odbc_connection_string: DRIVER={SQLite3};SERVER=localhost;DATABASE=test.db;Trusted_connection=yes
```
### Connecting to Postgres
@@ -320,10 +319,10 @@ version: v1beta1
kind: Spicepod
name: odbc-demo
datasets:
-- from: odbc:taxi_trips
- name: taxi_trips
- params:
- odbc_connection_string: Driver={PostgreSQL Unicode};Server=localhost;Port=5432;Database=spice_demo;Uid=postgres
+ - from: odbc:taxi_trips
+ name: taxi_trips
+ params:
+ odbc_connection_string: Driver={PostgreSQL Unicode};Server=localhost;Port=5432;Database=spice_demo;Uid=postgres
```
See the [ODBC Quickstart](https://github.com/spiceai/quickstarts/blob/trunk/odbc/README.md) for more help on getting started with ODBC and Postgres.
@@ -331,3 +330,7 @@ See the [ODBC Quickstart](https://github.com/spiceai/quickstarts/blob/trunk/odbc
## Secrets
Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure ODBC as a data connector in Spice. [ODBC Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/odbc)
diff --git a/spiceaidocs/docs/components/data-connectors/postgres/index.md b/spiceaidocs/docs/components/data-connectors/postgres/index.md
index ee1483b6..c97b2fac 100644
--- a/spiceaidocs/docs/components/data-connectors/postgres/index.md
+++ b/spiceaidocs/docs/components/data-connectors/postgres/index.md
@@ -12,8 +12,7 @@ The PostgreSQL Server Data Connector enables federated/accelerated SQL queries o
datasets:
- from: postgres:my_table
name: my_dataset
- params:
- ...
+ params: ...
```
## Configuration
@@ -28,8 +27,7 @@ The fully-qualified table name (`database.schema.table`) can also be used in the
datasets:
- from: postgres:my_database.my_schema.my_table
name: my_dataset
- params:
- ...
+ params: ...
```
### `name`
@@ -37,12 +35,12 @@ datasets:
The dataset name. This will be used as the table name within Spice.
Example:
+
```yaml
datasets:
- from: postgres:my_database.my_schema.my_table
name: cool_dataset
- params:
- ...
+ params: ...
```
```sql
@@ -78,40 +76,40 @@ The connection to PostgreSQL can be configured by providing the following `param
The table below shows the PostgreSQL data types supported, along with the type mapping to Apache Arrow types in Spice.
-| PostgreSQL Type | Arrow Type |
-| ----------------- | ----------------------------------------------- |
-| `int2` | `Int16` |
-| `int4` | `Int32` |
-| `int8` | `Int64` |
-| `money` | `Int64` |
-| `float4` | `Float32` |
-| `float8` | `Float64` |
-| `numeric` | `Decimal128` |
-| `text` | `Utf8` |
-| `varchar` | `Utf8` |
-| `bpchar` | `Utf8` |
-| `uuid` | `Utf8` |
-| `bytea` | `Binary` |
-| `bool` | `Boolean` |
-| `json` | `LargeUtf8` |
-| `timestamp` | `Timestamp(Nanosecond, None)` |
-| `timestampz` | `Timestamp(Nanosecond, TimeZone` |
-| `date` | `Date32` |
-| `time` | `Time64(Nanosecond)` |
-| `interval` | `Interval(MonthDayNano)` |
-| `point` | `FixedSizeList(Float64[2])` |
-| `int2[]` | `List(Int16)` |
-| `int4[]` | `List(Int32)` |
-| `int8[]` | `List(Int64)` |
-| `float4[]` | `List(Float32)` |
-| `float8[]` | `List(Float64)` |
-| `text[]` | `List(Utf8)` |
-| `bool[]` | `List(Boolean)` |
-| `bytea[]` | `List(Binary)` |
-| `geometry` | `Binary` |
-| `geography` | `Binary` |
-| `enum` | `Dictionary(Int8, Utf8)` |
-| Composite Types | `Struct` |
+| PostgreSQL Type | Arrow Type |
+| --------------- | -------------------------------- |
+| `int2` | `Int16` |
+| `int4` | `Int32` |
+| `int8` | `Int64` |
+| `money` | `Int64` |
+| `float4` | `Float32` |
+| `float8` | `Float64` |
+| `numeric` | `Decimal128` |
+| `text` | `Utf8` |
+| `varchar` | `Utf8` |
+| `bpchar` | `Utf8` |
+| `uuid` | `Utf8` |
+| `bytea` | `Binary` |
+| `bool` | `Boolean` |
+| `json` | `LargeUtf8` |
+| `timestamp` | `Timestamp(Nanosecond, None)` |
+| `timestamptz`   | `Timestamp(Nanosecond, TimeZone)` |
+| `date` | `Date32` |
+| `time` | `Time64(Nanosecond)` |
+| `interval` | `Interval(MonthDayNano)` |
+| `point` | `FixedSizeList(Float64[2])` |
+| `int2[]` | `List(Int16)` |
+| `int4[]` | `List(Int32)` |
+| `int8[]` | `List(Int64)` |
+| `float4[]` | `List(Float32)` |
+| `float8[]` | `List(Float64)` |
+| `text[]` | `List(Utf8)` |
+| `bool[]` | `List(Boolean)` |
+| `bytea[]` | `List(Binary)` |
+| `geometry` | `Binary` |
+| `geography` | `Binary` |
+| `enum` | `Dictionary(Int8, Utf8)` |
+| Composite Types | `Struct` |
:::info
@@ -177,4 +175,10 @@ datasets:
## Secrets
-Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
\ No newline at end of file
+Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure PostgreSQL as a data connector in Spice. [PostgreSQL Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/postgres)
+- A quickstart tutorial to configure AWS RDS for PostgreSQL as a data connector in Spice. [AWS RDS for PostgreSQL Data Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/rds-postgresql)
+- A quickstart tutorial to configure Supabase as a data connector in Spice. [Supabase quickstart](https://github.com/spiceai/quickstarts/tree/trunk/supabase)
diff --git a/spiceaidocs/docs/components/data-connectors/s3.md b/spiceaidocs/docs/components/data-connectors/s3.md
index 1dd7d636..4c8de63d 100644
--- a/spiceaidocs/docs/components/data-connectors/s3.md
+++ b/spiceaidocs/docs/components/data-connectors/s3.md
@@ -54,17 +54,17 @@ SELECT COUNT(*) FROM cool_dataset;
### `params`
-| Parameter Name | Description |
-| --------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `file_format` | Specifies the data format. Required if it cannot be inferred from the object URI. Options: `parquet`, `csv`, `json`. |
-| `s3_endpoint` | S3 endpoint URL (e.g., for MinIO). Default is the region endpoint. E.g. `s3_endpoint: https://my.minio.server` |
-| `s3_region` | S3 bucket region. Default: `us-east-1`. |
-| `client_timeout` | Timeout for S3 operations. Default: `30s`. |
-| `hive_partitioning_enabled` | Enable partitioning using hive-style partitioning from the folder structure. Defaults to `false` |
-| `s3_auth` | Authentication type. Options: `public`, `key` and `iam_role`. Defaults to `public` if `s3_key` and `s3_secret` are not provided, otherwise defaults to `key`. |
-| `s3_key` | Access key (e.g. `AWS_ACCESS_KEY_ID` for AWS) |
-| `s3_secret` | Secret key (e.g. `AWS_SECRET_ACCESS_KEY` for AWS) |
-| `allow_http` | Allow insecure HTTP connections to `s3_endpoint`. Defaults to `false` |
+| Parameter Name | Description |
+| --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `file_format` | Specifies the data format. Required if it cannot be inferred from the object URI. Options: `parquet`, `csv`, `json`. Refer to [Object Store File Formats](/components/data-connectors/index.md#object-store-file-formats) for details. |
+| `s3_endpoint` | S3 endpoint URL (e.g., for MinIO). Default is the region endpoint. E.g. `s3_endpoint: https://my.minio.server` |
+| `s3_region` | S3 bucket region. Default: `us-east-1`. |
+| `client_timeout` | Timeout for S3 operations. Default: `30s`. |
+| `hive_partitioning_enabled` | Enable hive-style partitioning inferred from the folder structure. Defaults to `false`                                                                                                                                                  |
+| `s3_auth` | Authentication type. Options: `public`, `key` and `iam_role`. Defaults to `public` if `s3_key` and `s3_secret` are not provided, otherwise defaults to `key`. |
+| `s3_key` | Access key (e.g. `AWS_ACCESS_KEY_ID` for AWS) |
+| `s3_secret` | Secret key (e.g. `AWS_SECRET_ACCESS_KEY` for AWS) |
+| `allow_http` | Allow insecure HTTP connections to `s3_endpoint`. Defaults to `false` |
For additional CSV parameters, see [CSV Parameters](/reference/file_format.md#csv)
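
As a rough sketch of how these parameters fit together (the bucket, prefix, and secret names are placeholders, assuming the usual `s3://bucket/path` form for `from`):

```yaml
datasets:
  - from: s3://my-bucket/path/to/data/ # placeholder bucket and prefix
    name: my_s3_dataset
    params:
      file_format: parquet
      s3_region: us-west-2
      s3_auth: key
      s3_key: ${secrets:aws_access_key_id}
      s3_secret: ${secrets:aws_secret_access_key}
```
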
@@ -146,10 +146,8 @@ datasets:
## Secrets
-Spice supports three types of [secret stores](/components/secret-stores):
+Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
-- [Environment variables](/components/secret-stores/env)
-- [Kubernetes Secret Store](/components/secret-stores/kubernetes)
-- [Keyring Secret Store](/components/secret-stores/keyring)
+## Quickstarts and Samples
-Explore the different options to manage sensitive data securely.
+- A quickstart tutorial to configure S3 as a data connector in Spice. [S3 Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/s3)
diff --git a/spiceaidocs/docs/components/data-connectors/sharepoint.md b/spiceaidocs/docs/components/data-connectors/sharepoint.md
index 9a554b48..6c155d1e 100644
--- a/spiceaidocs/docs/components/data-connectors/sharepoint.md
+++ b/spiceaidocs/docs/components/data-connectors/sharepoint.md
@@ -140,3 +140,11 @@ And set the `SPICE_SHAREPOINT_BEARER_TOKEN` secret via:
```shell
spice login sharepoint --tenant-id $TENANT_ID --client-id f2b3116e-b4c4-464f-80ec-73cd9d9886b4
```
+
+## Secrets
+
+Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure SharePoint as a data connector in Spice. [SharePoint Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/sharepoint)
diff --git a/spiceaidocs/docs/components/data-connectors/snowflake.md b/spiceaidocs/docs/components/data-connectors/snowflake.md
index 48f4539a..a3e84fcf 100644
--- a/spiceaidocs/docs/components/data-connectors/snowflake.md
+++ b/spiceaidocs/docs/components/data-connectors/snowflake.md
@@ -23,13 +23,29 @@ datasets:
Unquoted table identifiers should be UPPERCASED in the `from` field. See [Identifier resolution](https://docs.snowflake.com/en/sql-reference/identifiers-syntax#label-identifier-casing).
:::
-### Parameters
+## Configuration
-- `from`: a Snowflake fully qualified table name (database.schema.table). For instance `snowflake:SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.LINEITEM` or `snowflake:TAXI_DATA."2024".TAXI_TRIPS`
-- `snowflake_warehouse`: optional, specifies the [Snowflake Warehouse](https://docs.snowflake.com/en/user-guide/warehouses-tasks) to use
-- `snowflake_role`: optional, specifies the role to use for accessing Snowflake data
+### `from`
-### Auth
+A Snowflake fully qualified table name (database.schema.table). For instance `snowflake:SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.LINEITEM` or `snowflake:TAXI_DATA."2024".TAXI_TRIPS`
+
+### `name`
+
+The dataset name. This will be used as the table name within Spice.
+
+### `params`
+
+| Parameter Name | Description |
+| ---------------------------------- | --------------------------------------------------------------------------------------------------------------- |
+| `snowflake_warehouse` | Optional, specifies the [Snowflake Warehouse](https://docs.snowflake.com/en/user-guide/warehouses-tasks) to use |
+| `snowflake_role` | Optional, specifies the role to use for accessing Snowflake data |
+| `snowflake_account`                | Required, specifies the Snowflake account identifier                                                              |
+| `snowflake_username` | Required, specifies the Snowflake username to use for accessing Snowflake data |
+| `snowflake_password` | Optional, specifies the Snowflake password to use for accessing Snowflake data |
+| `snowflake_private_key_path` | Optional, specifies the path to Snowflake private key |
+| `snowflake_private_key_passphrase` | Optional, specifies the Snowflake private key passphrase |
+
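+A minimal sketch combining these parameters; the warehouse and role values, and the secret names, are placeholders:
+
+```yaml
+datasets:
+  - from: snowflake:SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.LINEITEM
+    name: lineitem
+    params:
+      snowflake_account: ${secrets:snowflake_account}
+      snowflake_username: ${secrets:snowflake_username}
+      snowflake_password: ${secrets:snowflake_password}
+      snowflake_warehouse: COMPUTE_WH # placeholder warehouse
+      snowflake_role: PUBLIC # placeholder role
+```
+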
+## Auth
The connector supports password-based and [key-pair](https://docs.snowflake.com/en/user-guide/key-pair-auth) authentication that must be configured using `spice login snowflake` or using [Secrets Stores](/components/secret-stores). Login requires the account identifier ('orgname-accountname' format) - use [Finding the organization and account name for an account](https://docs.snowflake.com/en/user-guide/admin-account-identifier#finding-the-organization-and-account-name-for-an-account) instructions.
@@ -193,3 +209,11 @@ datasets:
1. The connector supports password-based and [key-pair](https://docs.snowflake.com/en/user-guide/key-pair-auth) authentication.
:::
+
+## Secrets
+
+Spice integrates with multiple secret stores to help manage sensitive data securely. For detailed information on supported secret stores, refer to the [secret stores documentation](/components/secret-stores). Additionally, learn how to use referenced secrets in component parameters by visiting the [using referenced secrets guide](/components/secret-stores#using-secrets).
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure Snowflake as a data connector in Spice. [Snowflake Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/snowflake)
diff --git a/spiceaidocs/docs/components/data-connectors/spark.md b/spiceaidocs/docs/components/data-connectors/spark.md
index 20356a6b..b8de7047 100644
--- a/spiceaidocs/docs/components/data-connectors/spark.md
+++ b/spiceaidocs/docs/components/data-connectors/spark.md
@@ -118,3 +118,7 @@ Check [Secrets Stores](/components/secret-stores) for more details.
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure Spark as a data connector in Spice. [Spark Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/spark)
diff --git a/spiceaidocs/docs/components/data-connectors/spiceai.md b/spiceaidocs/docs/components/data-connectors/spiceai.md
index a64b7481..98b6f525 100644
--- a/spiceaidocs/docs/components/data-connectors/spiceai.md
+++ b/spiceaidocs/docs/components/data-connectors/spiceai.md
@@ -5,7 +5,7 @@ description: 'Spice.ai Data Connector Documentation'
pagination_next: null
---
-The [Spice.ai](https://spice.ai/) Data Connector enables federated SQL query across datasets in the [Spice.ai Cloud Platform](https://docs.spice.ai/building-blocks/datasets). Access to these datasets requires a free [Spice.ai account](https://spice.ai/login).
+The [Spice.ai](https://spice.ai/) Data Connector enables federated SQL query across datasets in the [Spice.ai Cloud Platform](https://docs.spice.ai/building-blocks/datasets). Access to these datasets requires a free [Spice.ai account](https://spice.ai/login).
## Configuration
@@ -44,3 +44,7 @@ The Spice.ai Cloud Platform dataset URI. To query a dataset in a public Spice.ai
acceleration:
enabled: true
```
+
+## Quickstarts and Samples
+
+- A quickstart tutorial to configure Spice.ai Cloud Platform as a data connector in Spice. [Spice.ai Connector quickstart](https://github.com/spiceai/quickstarts/tree/trunk/spiceai)