boilingdata/data-taps-nycopendata-example


FluentBit | Web Analytics | PostgreSQL CDC | REST API | OpenSearch/ES | AWS Lambda Telemetry

NYC OpenData API --> Data Tap --> S3 Parquet

This example shows how a scheduled AWS Lambda function can fetch the records added since its previous run from an API (NYC Housing Maintenance Code Complaints and Problems) and feed them to a Data Tap for efficient S3 ingestion and analytics.

Data Tap

A Data Tap is a single AWS Lambda function with a Function URL and a customized C++ runtime that embeds DuckDB. It buffers newline-delimited JSON data received over HTTP POST and uses a streaming SQL clause to upload it to S3 as hive-partitioned, ZSTD-compressed Parquet. You can tune the SQL clause yourself for filtering, search, and aggregations, and you can set the thresholds that trigger the upload to S3. A Data Tap already runs very efficiently on the smallest arm64 AWS Lambda, making it the simplest, fastest, and most cost-efficient solution for streaming data onto S3 at scale. You can run it in your own AWS account or have it hosted by Boiling Cloud.
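As a rough illustration of the payload format described above, the following Python sketch serializes records as newline-delimited JSON and builds an HTTP POST request for a Data Tap URL without sending it. The function names, the `X-Tap-Token` header, and the content type are assumptions made for this sketch; they are not taken from the Data Tap documentation, so check the real authorization header before using anything like this.

```python
import json
import urllib.request


def to_ndjson(records):
    """Serialize an iterable of dicts as newline-delimited JSON (one object per line)."""
    lines = (json.dumps(r, separators=(",", ":")) + "\n" for r in records)
    return "".join(lines).encode("utf-8")


def build_tap_request(tap_url, token, records):
    """Build (but do not send) an HTTP POST that carries the NDJSON payload.

    The header names here are hypothetical placeholders, not the
    documented Data Tap API.
    """
    return urllib.request.Request(
        tap_url,
        data=to_ndjson(records),
        method="POST",
        headers={
            "Content-Type": "application/x-ndjson",  # assumed content type
            "X-Tap-Token": token,  # hypothetical header name
        },
    )
```

Passing the returned request to `urllib.request.urlopen` would send it. Batching many records into one POST keeps the number of Lambda invocations low.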

You need a BoilingData account to create a Data Tap. The account is used to fetch authorization tokens that allow you to send data to a Data Tap (security access control). You can also share write access with other BoilingData users (see the AUTHORIZED_USERS AWS Lambda environment variable), effectively creating Data Mesh architectures.

Run

The data is fetched ordered by received_date, and on every iteration the timestamp of the latest record plus 1 ms is stored in Parameter Store. The next fetch starts from that stored timestamp, which avoids duplicates.
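The watermark logic described above can be sketched in Python as follows. This is a minimal illustration, not the repository's implementation: the function names and the `limit` default are hypothetical, and reading or writing the value in Parameter Store is left out. It assumes `received_date` arrives as an ISO 8601 string with millisecond precision, and it uses the standard SODA query parameters `$where`, `$order`, and `$limit`.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode


def build_query(base_url, watermark, limit=1000):
    """Build a SODA query URL for records at or after the watermark, oldest first."""
    params = {
        "$where": f"received_date >= '{watermark}'",
        "$order": "received_date",
        "$limit": str(limit),
    }
    return f"{base_url}?{urlencode(params)}"


def next_watermark(records, current):
    """Return the latest received_date plus 1 ms, or the current value if nothing was fetched."""
    if not records:
        return current
    latest = max(datetime.fromisoformat(r["received_date"]) for r in records)
    return (latest + timedelta(milliseconds=1)).isoformat(timespec="milliseconds")
```

Because the stored value is already one millisecond past the newest record, the `>=` comparison in the next query excludes that record and so does not fetch it twice.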

Create a .env file containing your credentials, as in the example below.

export BD_TAPURL=deployedDataTapUrl
export BD_USERNAME=yourBoilingUsername
export BD_PASSWORD=yourBoilingPassword
export SODA_USERNAME=socrataOpenDataAPIKeyId
export SODA_PASSWORD=socrataOpenDataSecretKey
export SODA_APPTOKEN=socrataOpenDataAppToken
source .env # The envs will be given as parameters for the stack deployment
yarn test
yarn build
yarn deploy
# ...
yarn destroy # delete the deployment
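Since the deployment depends on all six variables being set, a small pre-flight check can catch a missing one before `yarn deploy` runs. The Python sketch below is a hypothetical helper, not part of the repository; only the variable names come from the example above.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = (
    "BD_TAPURL",
    "BD_USERNAME",
    "BD_PASSWORD",
    "SODA_USERNAME",
    "SODA_PASSWORD",
    "SODA_APPTOKEN",
)


def load_settings(env=None):
    """Return the required settings as a dict, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```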

About

API ingestion to Data Tap example
