chore: migrate demo projects here as integration tests #8502

Merged 13 commits on Mar 14, 2023.
107 changes: 107 additions & 0 deletions .github/workflows/intergration_tests.yml
@@ -0,0 +1,107 @@
name: Integration Tests CI

on:
  schedule:
    # Currently we build docker images at 12:00 (UTC), so run this at 13:00.
    - cron: '0 13 * * *'

jobs:
  golangci:
    name: lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/setup-go@v3
        with:
          go-version: 1.18
      - uses: actions/checkout@v3
      - name: golangci-lint
        uses: golangci/golangci-lint-action@v3
        with:
          working-directory: integration_tests/datagen
          args: --timeout=120s
      - name: Go build
        run: |
          go mod tidy
          git diff --exit-code go.mod go.sum
          go build .
        working-directory: integration_tests/datagen
  run-demos:
    strategy:
      matrix:
        testcase:
          - ad-click
          - ad-ctr
          - cdn-metrics
          - clickstream
          - livestream
          - twitter
          - prometheus
          - schema-registry
          - mysql-cdc
          - postgres-cdc
          #- mysql-sink
          - postgres-sink
          - iceberg-sink
        format: ["json", "protobuf"]
        exclude:
          - testcase: ad-click
            format: protobuf
          - testcase: ad-ctr
            format: protobuf
          - testcase: cdn-metrics
            format: protobuf
          - testcase: clickstream
            format: protobuf
          - testcase: prometheus
            format: protobuf
          # This demo showcases Avro + the schema registry, so there is no
          # file server for a Protobuf schema file.
          - testcase: schema-registry
            format: protobuf
          - testcase: mysql-cdc
            format: protobuf
          - testcase: postgres-cdc
            format: protobuf
          - testcase: mysql-sink
            format: protobuf
          - testcase: postgres-sink
            format: protobuf
          - testcase: iceberg-sink
            format: protobuf
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
        with:
          fetch-depth: 0

      # This action saves a list of existing images; the cache is created
      # without them in the post run. It also restores the cache if it exists.
      - uses: satackey/action-docker-layer-caching@v0.0.11
        # Ignore a failure of this step to avoid terminating the job.
        continue-on-error: true

      - name: Rewrite docker compose for protobuf
        working-directory: integration_tests/scripts
        if: ${{ matrix.format == 'protobuf' }}
        run: |
          python3 gen_pb_compose.py ${{ matrix.testcase }} ${{ matrix.format }}

      - name: Run Demos
        working-directory: integration_tests/scripts
        run: |
          python3 run_demos.py --case ${{ matrix.testcase }} --format ${{ matrix.format }}

      - name: Check if the ingestion is successful
        working-directory: integration_tests/scripts
        run: |
          python3 check_data.py ${{ matrix.testcase }}

      - name: Dump logs on failure
        if: ${{ failure() }}
        working-directory: integration_tests/${{ matrix.testcase }}
        run: |
          docker compose logs

      - uses: satackey/action-docker-layer-caching@v0.0.11
        continue-on-error: true
2 changes: 1 addition & 1 deletion .github/workflows/typo.yml
@@ -10,4 +10,4 @@ jobs:
         uses: actions/checkout@v3

       - name: Check spelling of the entire repository
-        uses: crate-ci/typos@v1.11.1
+        uses: crate-ci/typos@v1.13.20
The PR author commented:
Please ignore the typo checker error. Reported to crate-ci/typos#683

4 changes: 2 additions & 2 deletions docker/docker-compose.yml
@@ -327,7 +327,7 @@ services:
       - "9644:9644"
     depends_on: []
     volumes:
-      - "redpanda:/var/lib/redpanda/data"
+      - "message_queue:/var/lib/redpanda/data"
     environment: {}
     container_name: message_queue
     healthcheck:
@@ -348,6 +348,6 @@ volumes:
     external: false
   prometheus-0:
     external: false
-  redpanda:
+  message_queue:
     external: false
 name: risingwave-compose
46 changes: 46 additions & 0 deletions integration_tests/README.md
@@ -0,0 +1,46 @@
# RisingWave Demos

Here is a gallery of demos showing how to use RisingWave along with the ecosystem tools.

- `ad-click`: [Build and Maintain Real-time Applications Faster and Easier with Redpanda and RisingWave](https://singularity-data.com/blog/build-with-Redpanda-and-RisingWave)
- `ad-ctr`: [Perform real-time ad performance analysis](https://www.risingwave.dev/docs/latest/real-time-ad-performance-analysis/)
- `cdn-metrics`: [Server performance anomaly detection](https://www.risingwave.dev/docs/latest/server-performance-anomaly-detection/)
- `clickstream`: [Clickstream analysis](https://www.risingwave.dev/docs/latest/clickstream-analysis/)
- `twitter`: [Fast Twitter events processing](https://www.risingwave.dev/docs/latest/fast-twitter-events-processing/)
- `twitter-pulsar`: [Tutorial: Pulsar + RisingWave for Fast Twitter Event Processing](https://www.risingwave.com/blog/tutorial-pulsar-risingwave-for-fast-twitter-events-processing/)
- `live-stream`: [Live stream metrics analysis](https://www.risingwave.dev/docs/latest/live-stream-metrics-analysis/)

## Demo Runnability Testing

Each of the demos listed above runs through a series of tests whenever a PR is merged, including:

- Running the queries mentioned in the demos.
- Ingesting data in various formats, including Protobuf, Avro, and JSON; each format is tested individually.
- Checking, for each demo, that the sources and MVs have successfully ingested data, i.e., that each holds more than zero records (a manual version of this check is sketched below).
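
Concretely, the check amounts to a row count for every relation listed in the demo's `data_check` file. A minimal manual version, assuming the demo stack is up and using what we believe are the default RisingWave connection settings (host, port, user, and database below are assumptions), might look like:

```sh
# Hypothetical spot check: a source or MV should hold more than zero
# records once ingestion has started.
psql -h localhost -p 4566 -d dev -U root \
  -c "SELECT COUNT(*) FROM m_click_statistic;"
```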

## Workload Generator

The workloads presented in the demos are produced by a Go program in `datagen/`. You can get this tool in multiple ways:

- Download pre-built binaries from [Releases](https://github.com/risingwavelabs/risingwave-demo/releases)
- Pull the latest Docker image via `docker pull ghcr.io/risingwavelabs/demo-datagen:v1.0.9` (see the Docker sketch at the end of this section).
- Build the binary from source:

  ```sh
  cd datagen && go build
  ```

To use this tool, run a command like one of the following:

```sh
./datagen --mode clickstream --qps 10 kafka --brokers 127.0.0.1:57801
```

or

```sh
./datagen --mode ecommerce --qps 10000000 postgres --port 6875 --user materialize --db materialize
```

- `--mode clickstream` tells it to produce random clickstream data.
- `--qps 10` sets a QPS limit of 10.
- `kafka | postgres` chooses the destination. For Kafka, you need to specify the brokers.
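
If you pulled the Docker image instead, the same flags apply. A sketch, assuming the image invokes the `/datagen` binary as its command (the demo compose files call it this way) and that the Kafka brokers are reachable from the container:

```sh
docker run --rm --network host \
  ghcr.io/risingwavelabs/demo-datagen:v1.0.9 \
  /datagen --mode clickstream --qps 10 kafka --brokers 127.0.0.1:57801
```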
13 changes: 13 additions & 0 deletions integration_tests/ad-click/create_mv.sql
@@ -0,0 +1,13 @@
-- The number of clicks on the ad within one minute after the ad was shown.
create materialized view m_click_statistic as
select
    count(user_id) as clicks_count,
    ad_id
from
    ad_source
where
    click_timestamp is not null
    and impression_timestamp < click_timestamp
    and impression_timestamp + interval '1' minute >= click_timestamp
group by
    ad_id;
13 changes: 13 additions & 0 deletions integration_tests/ad-click/create_source.sql
@@ -0,0 +1,13 @@
-- impression_timestamp: The time when the ad was shown.
-- click_timestamp: The time when the ad was clicked.
create source ad_source (
    user_id bigint,
    ad_id bigint,
    click_timestamp timestamptz,
    impression_timestamp timestamptz
) with (
    connector = 'kafka',
    topic = 'ad_clicks',
    properties.bootstrap.server = 'message_queue:29092',
    scan.startup.mode = 'earliest'
) row format json;
1 change: 1 addition & 0 deletions integration_tests/ad-click/data_check
@@ -0,0 +1 @@
ad_source,m_click_statistic
62 changes: 62 additions & 0 deletions integration_tests/ad-click/docker-compose.yml
@@ -0,0 +1,62 @@
---
version: "3"
services:
  compactor-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: compactor-0
  compute-node-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: compute-node-0
  etcd-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: etcd-0
  frontend-node-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: frontend-node-0
  grafana-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: grafana-0
  meta-node-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: meta-node-0
  minio-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: minio-0
  prometheus-0:
    extends:
      file: ../../docker/docker-compose.yml
      service: prometheus-0
  message_queue:
    extends:
      file: ../../docker/docker-compose.yml
      service: message_queue
  datagen:
    build: ../datagen
    depends_on: [message_queue]
    command:
      - /bin/sh
      - -c
      - /datagen --mode ad-click --qps 2 kafka --brokers message_queue:29092
    restart: always
    container_name: datagen
volumes:
  compute-node-0:
    external: false
  etcd-0:
    external: false
  grafana-0:
    external: false
  minio-0:
    external: false
  prometheus-0:
    external: false
  message_queue:
    external: false
name: risingwave-compose
6 changes: 6 additions & 0 deletions integration_tests/ad-click/query.sql
@@ -0,0 +1,6 @@
select
    *
from
    m_click_statistic
limit
    10;
64 changes: 64 additions & 0 deletions integration_tests/ad-ctr/create_mv.sql
@@ -0,0 +1,64 @@
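-- The overall click-through rate (CTR) of each ad: total clicks divided by
-- total impressions, with clicks attributed to impressions via bid_id.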
CREATE MATERIALIZED VIEW ad_ctr AS
SELECT
    ad_clicks.ad_id AS ad_id,
    ad_clicks.clicks_count :: NUMERIC / ad_impressions.impressions_count AS ctr
FROM
    (
        SELECT
            ad_impression.ad_id AS ad_id,
            COUNT(*) AS impressions_count
        FROM
            ad_impression
        GROUP BY
            ad_id
    ) AS ad_impressions
    JOIN (
        SELECT
            ai.ad_id,
            COUNT(*) AS clicks_count
        FROM
            ad_click AS ac
            LEFT JOIN ad_impression AS ai ON ac.bid_id = ai.bid_id
        GROUP BY
            ai.ad_id
    ) AS ad_clicks ON ad_impressions.ad_id = ad_clicks.ad_id;

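-- CTR of each ad over 5-minute tumbling windows: impressions and clicks are
-- bucketed with TUMBLE(..., INTERVAL '5' MINUTE) and matched on both ad_id
-- and window_end.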
CREATE MATERIALIZED VIEW ad_ctr_5min AS
SELECT
    ac.ad_id AS ad_id,
    ac.clicks_count :: NUMERIC / ai.impressions_count AS ctr,
    ai.window_end AS window_end
FROM
    (
        SELECT
            ad_id,
            COUNT(*) AS impressions_count,
            window_end
        FROM
            TUMBLE(
                ad_impression,
                impression_timestamp,
                INTERVAL '5' MINUTE
            )
        GROUP BY
            ad_id,
            window_end
    ) AS ai
    JOIN (
        SELECT
            ai.ad_id,
            COUNT(*) AS clicks_count,
            ai.window_end AS window_end
        FROM
            TUMBLE(ad_click, click_timestamp, INTERVAL '5' MINUTE) AS ac
            INNER JOIN TUMBLE(
                ad_impression,
                impression_timestamp,
                INTERVAL '5' MINUTE
            ) AS ai ON ai.bid_id = ac.bid_id
            AND ai.window_end = ac.window_end
        GROUP BY
            ai.ad_id,
            ai.window_end
    ) AS ac ON ai.ad_id = ac.ad_id
    AND ai.window_end = ac.window_end;
20 changes: 20 additions & 0 deletions integration_tests/ad-ctr/create_source.sql
@@ -0,0 +1,20 @@
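-- bid_id ties each click back to the impression that produced it; the
-- materialized views in create_mv.sql join the two streams on it.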
CREATE SOURCE ad_impression (
    bid_id BIGINT,
    ad_id BIGINT,
    impression_timestamp TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'ad_impression',
    properties.bootstrap.server = 'message_queue:29092',
    scan.startup.mode = 'earliest'
) ROW FORMAT JSON;

CREATE SOURCE ad_click (
    bid_id BIGINT,
    click_timestamp TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'ad_click',
    properties.bootstrap.server = 'message_queue:29092',
    scan.startup.mode = 'earliest'
) ROW FORMAT JSON;
1 change: 1 addition & 0 deletions integration_tests/ad-ctr/data_check
@@ -0,0 +1 @@
ad_impression,ad_click,ad_ctr,ad_ctr_5min