kafka-delta-ingest is designed to work well with AWS and Azure, which can add some complexity to the development environment. This document outlines how to work with kafka-delta-ingest locally for new and existing Rust developers.
Make sure the docker-compose setup has been run, and execute `cargo test` to run unit and integration tests.
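A minimal sketch of that flow, assuming the compose file at the repository root defines the local Kafka broker and stub AWS services:

```bash
# Bring up the local development containers in the background
# (assumes a docker-compose.yml at the repository root).
docker-compose up -d

# Run unit and integration tests against those containers.
cargo test
```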
| Name | Default | Notes |
|---|---|---|
| | | A kafka broker string which can be used during integration testing |
| | | AWS endpoint URL for something that can provide stub S3 and DynamoDB operations (e.g. Localstack) |
| | | Bucket to use for test data at the given endpoint |
A tarball containing 100K line-delimited JSON messages is included in `tests/json/web_requests-100K.json.tar.gz`. Running `./bin/extract-example-json.sh` will unpack this to the expected location.
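To sanity-check the extracted data, you can count and peek at the messages. The output path below is an assumption (the archive unpacked next to the tarball), and `jq` is assumed to be installed:

```bash
# Count the extracted messages (path is an assumption; adjust to wherever
# the extract script actually writes the file).
wc -l tests/json/web_requests-100K.json

# Pretty-print the first message.
head -n 1 tests/json/web_requests-100K.json | jq .
```

Each message is a JSON document shaped like the example below.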
```json
{
  "meta": {
    "producer": {
      "timestamp": "2021-03-24T15:06:17.321710+00:00"
    }
  },
  "method": "DELETE",
  "session_id": "7c28bcf9-be26-4d0b-931a-3374ab4bb458",
  "status": 204,
  "url": "http://www.youku.com",
  "uuid": "831c6afa-375c-4988-b248-096f9ed101f8"
}
```
After extracting the example data, you’ll need to play this onto the `web_requests` topic of the local Kafka container.
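One way to do that is with `kcat` (kafkacat). This is a sketch: it assumes `kcat` is installed, the local broker is reachable at `localhost:9092`, and the extracted file lives at the path shown.

```bash
# Produce each line of the extracted file as a separate message
# onto the web_requests topic (broker address and file path are assumptions).
kcat -P -b localhost:9092 -t web_requests -l tests/json/web_requests-100K.json
```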
> ℹ️ URLs sampled for the test data are sourced from Wikipedia’s list of most popular websites - https://en.wikipedia.org/wiki/List_of_most_popular_websites
A few handy commands for inspecting the resulting data:

- List data files - `ls tests/data/web_requests/date=2021-03-24`
- List delta log files - `ls tests/data/web_requests/_delta_log`
- Show some parquet data (using parquet-tools) - `parquet-tools show tests/data/web_requests/date=2021-03-24/<some file written by your example>`
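If `parquet-tools` is not already installed, one option (an assumption, not a project requirement) is the Python distribution, which provides the `show` subcommand used above:

```bash
# Installs a parquet-tools CLI with a `show` subcommand (one of several
# tools by this name; use whichever distribution you prefer).
pip install parquet-tools
```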