chore: migrate demo projects here as intergration tests (#8502)
Co-authored-by: TennyZhuang <zty0826@gmail.com>
fuyufjh and TennyZhuang authored Mar 14, 2023
1 parent c11fb64 commit 4f34ade
Showing 150 changed files with 7,326 additions and 3 deletions.
107 changes: 107 additions & 0 deletions .github/workflows/intergration_tests.yml
@@ -0,0 +1,107 @@
name: Integration Tests CI

on:
  schedule:
    # Currently we build docker images at 12:00 (UTC), so run this at 13:00
    - cron: '0 13 * * *'

jobs:
  golangci:
    name: lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/setup-go@v3
        with:
          go-version: 1.18
      - uses: actions/checkout@v3
      - name: golangci-lint
        uses: golangci/golangci-lint-action@v3
        with:
          working-directory: integration_tests/datagen
          args: --timeout=120s
      - name: Go build
        run: |
          go mod tidy
          git diff --exit-code go.mod go.sum
          go build .
        working-directory: integration_tests/datagen
  run-demos:
    strategy:
      matrix:
        testcase:
          - ad-click
          - ad-ctr
          - cdn-metrics
          - clickstream
          - livestream
          - twitter
          - prometheus
          - schema-registry
          - mysql-cdc
          - postgres-cdc
          #- mysql-sink
          - postgres-sink
          - iceberg-sink
        format: ["json", "protobuf"]
        exclude:
          - testcase: ad-click
            format: protobuf
          - testcase: ad-ctr
            format: protobuf
          - testcase: cdn-metrics
            format: protobuf
          - testcase: clickstream
            format: protobuf
          - testcase: prometheus
            format: protobuf
          # This demo is showcasing avro + schema registry. So there's no file server for the schema file.
          - testcase: schema-registry
            format: protobuf
          - testcase: mysql-cdc
            format: protobuf
          - testcase: postgres-cdc
            format: protobuf
          - testcase: mysql-sink
            format: protobuf
          - testcase: postgres-sink
            format: protobuf
          - testcase: iceberg-sink
            format: protobuf
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
        with:
          fetch-depth: 0

      # In this step, this action saves a list of existing images,
      # the cache is created without them in the post run.
      # It also restores the cache if it exists.
      - uses: satackey/action-docker-layer-caching@v0.0.11
        # Ignore the failure of a step and avoid terminating the job.
        continue-on-error: true

      - name: Rewrite docker compose for protobuf
        working-directory: integration_tests/scripts
        if: ${{ matrix.format == 'protobuf' }}
        run: |
          python3 gen_pb_compose.py ${{ matrix.testcase }} ${{ matrix.format }}
      - name: Run Demos
        working-directory: integration_tests/scripts
        run: |
          python3 run_demos.py --case ${{ matrix.testcase }} --format ${{ matrix.format }}
      - name: Check if the ingestion is successful
        working-directory: integration_tests/scripts
        run: |
          python3 check_data.py ${{ matrix.testcase }}
      - name: Dump logs on failure
        if: ${{ failure() }}
        working-directory: integration_tests/${{ matrix.testcase }}
        run: |
          docker compose logs
      - uses: satackey/action-docker-layer-caching@v0.0.11
        continue-on-error: true
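To reproduce one matrix entry by hand, the same sequence the `run-demos` job executes can be run locally from the repository root. The sketch below simply mirrors the workflow steps above; `twitter` and `protobuf` are example values, not the only supported ones.

```sh
cd integration_tests/scripts
# Only needed when testing the protobuf format: rewrite the demo's compose file.
python3 gen_pb_compose.py twitter protobuf
# Bring up the demo with docker compose and run its queries.
python3 run_demos.py --case twitter --format protobuf
# Verify that the demo's sources and materialized views have ingested rows.
python3 check_data.py twitter
```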
2 changes: 1 addition & 1 deletion .github/workflows/typo.yml
@@ -10,4 +10,4 @@ jobs:
        uses: actions/checkout@v3

      - name: Check spelling of the entire repository
-        uses: crate-ci/typos@v1.11.1
+        uses: crate-ci/typos@v1.13.20
4 changes: 2 additions & 2 deletions docker/docker-compose.yml
@@ -327,7 +327,7 @@ services:
- "9644:9644"
depends_on: []
volumes:
- "redpanda:/var/lib/redpanda/data"
- "message_queue:/var/lib/redpanda/data"
environment: {}
container_name: message_queue
healthcheck:
@@ -348,6 +348,6 @@ volumes:
    external: false
  prometheus-0:
    external: false
-  redpanda:
+  message_queue:
    external: false
 name: risingwave-compose
46 changes: 46 additions & 0 deletions integration_tests/README.md
@@ -0,0 +1,46 @@
# RisingWave Demos

Here is a gallery of demos that show how to use RisingWave along with the ecosystem tools.

- `ad-click/`: [Build and Maintain Real-time Applications Faster and Easier with Redpanda and RisingWave](https://singularity-data.com/blog/build-with-Redpanda-and-RisingWave)
- `ad-ctr`: [Perform real-time ad performance analysis](https://www.risingwave.dev/docs/latest/real-time-ad-performance-analysis/)
- `cdn-metrics`: [Server performance anomaly detection](https://www.risingwave.dev/docs/latest/server-performance-anomaly-detection/)
- `clickstream`: [Clickstream analysis](https://www.risingwave.dev/docs/latest/clickstream-analysis/)
- `twitter`: [Fast Twitter events processing](https://www.risingwave.dev/docs/latest/fast-twitter-events-processing/)
- `twitter-pulsar`: [Tutorial: Pulsar + RisingWave for Fast Twitter Event Processing](https://www.risingwave.com/blog/tutorial-pulsar-risingwave-for-fast-twitter-events-processing/)
- `live-stream`: [Live stream metrics analysis](https://www.risingwave.dev/docs/latest/live-stream-metrics-analysis/)

## Demo Runnability Testing

All of the demos listed above run through a series of tests whenever a PR is merged, including:

- Running the queries mentioned in the demos.
- Ingesting the data in various formats, including Protobuf, Avro, and JSON; each format is tested individually.
- For each demo, checking that the sources and materialized views (MVs) have successfully ingested data, i.e., that they contain more than 0 records (see the sketch after this list).
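As a rough illustration, the record-count check boils down to something like the following, assuming RisingWave's default frontend connection parameters (`localhost:4566`, database `dev`, user `root`); in CI this is automated by `check_data.py`, presumably for every relation listed in a demo's `data_check` file.

```sh
# Hypothetical manual check: count the rows in one source and one MV.
psql -h localhost -p 4566 -d dev -U root \
  -c "select count(*) from ad_source;" \
  -c "select count(*) from m_click_statistic;"
# Ingestion is considered successful when both counts are greater than 0.
```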

## Workload Generator

The workloads presented in the demos are produced by a Go program in `/datagen`. You can get this tool in several ways:

- Download pre-built binaries from [Releases](https://github.com/risingwavelabs/risingwave-demo/releases).
- Pull the latest Docker image via `docker pull ghcr.io/risingwavelabs/demo-datagen:v1.0.9`.
- Build the binary from source:

  ```sh
  cd datagen && go build
  ```

To use this tool, run one of the following commands:

```sh
./datagen --mode clickstream --qps 10 kafka --brokers 127.0.0.1:57801
```

or

```sh
./datagen --mode ecommerce --qps 10000000 postgres --port 6875 --user materialize --db materialize
```

- `--mode clickstream` tells the generator to produce random clickstream data.
- `--qps 10` caps the generation rate at 10 queries per second.
- `kafka | postgres` chooses the destination. For Kafka, you need to specify the brokers (a containerized variant is sketched below).
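For reference, a minimal sketch of the same clickstream invocation through the published Docker image, assuming the image's entrypoint is the `datagen` binary and that the broker address is reachable from the container:

```sh
# Hypothetical containerized run; the flags mirror the binary invocation above.
docker run --rm --network host \
  ghcr.io/risingwavelabs/demo-datagen:v1.0.9 \
  --mode clickstream --qps 10 kafka --brokers 127.0.0.1:57801
```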
13 changes: 13 additions & 0 deletions integration_tests/ad-click/create_mv.sql
@@ -0,0 +1,13 @@
-- The number of clicks on the ad within one minute after the ad was shown.
create materialized view m_click_statistic as
select
    count(user_id) as clicks_count,
    ad_id
from
    ad_source
where
    click_timestamp is not null
    and impression_timestamp < click_timestamp
    and impression_timestamp + interval '1' minute >= click_timestamp
group by
    ad_id;
13 changes: 13 additions & 0 deletions integration_tests/ad-click/create_source.sql
@@ -0,0 +1,13 @@
-- impression_timestamp: The time when the ad was shown.
-- click_timestamp: The time when the ad was clicked.
create source ad_source (
    user_id bigint,
    ad_id bigint,
    click_timestamp timestamptz,
    impression_timestamp timestamptz
) with (
    connector = 'kafka',
    topic = 'ad_clicks',
    properties.bootstrap.server = 'message_queue:29092',
    scan.startup.mode = 'earliest'
) row format json;
1 change: 1 addition & 0 deletions integration_tests/ad-click/data_check
@@ -0,0 +1 @@
ad_source,m_click_statistic
62 changes: 62 additions & 0 deletions integration_tests/ad-click/docker-compose.yml
@@ -0,0 +1,62 @@
---
version: "3"
services:
compactor-0:
extends:
file: ../../docker/docker-compose.yml
service: compactor-0
compute-node-0:
extends:
file: ../../docker/docker-compose.yml
service: compute-node-0
etcd-0:
extends:
file: ../../docker/docker-compose.yml
service: etcd-0
frontend-node-0:
extends:
file: ../../docker/docker-compose.yml
service: frontend-node-0
grafana-0:
extends:
file: ../../docker/docker-compose.yml
service: grafana-0
meta-node-0:
extends:
file: ../../docker/docker-compose.yml
service: meta-node-0
minio-0:
extends:
file: ../../docker/docker-compose.yml
service: minio-0
prometheus-0:
extends:
file: ../../docker/docker-compose.yml
service: prometheus-0
message_queue:
extends:
file: ../../docker/docker-compose.yml
service: message_queue
datagen:
build: ../datagen
depends_on: [message_queue]
command:
- /bin/sh
- -c
- /datagen --mode ad-click --qps 2 kafka --brokers message_queue:29092
restart: always
container_name: datagen
volumes:
compute-node-0:
external: false
etcd-0:
external: false
grafana-0:
external: false
minio-0:
external: false
prometheus-0:
external: false
message_queue:
external: false
name: risingwave-compose
6 changes: 6 additions & 0 deletions integration_tests/ad-click/query.sql
@@ -0,0 +1,6 @@
select
    *
from
    m_click_statistic
limit
    10;
64 changes: 64 additions & 0 deletions integration_tests/ad-ctr/create_mv.sql
@@ -0,0 +1,64 @@
CREATE MATERIALIZED VIEW ad_ctr AS
SELECT
    ad_clicks.ad_id AS ad_id,
    ad_clicks.clicks_count :: NUMERIC / ad_impressions.impressions_count AS ctr
FROM
    (
        SELECT
            ad_impression.ad_id AS ad_id,
            COUNT(*) AS impressions_count
        FROM
            ad_impression
        GROUP BY
            ad_id
    ) AS ad_impressions
    JOIN (
        SELECT
            ai.ad_id,
            COUNT(*) AS clicks_count
        FROM
            ad_click AS ac
            LEFT JOIN ad_impression AS ai ON ac.bid_id = ai.bid_id
        GROUP BY
            ai.ad_id
    ) AS ad_clicks ON ad_impressions.ad_id = ad_clicks.ad_id;

CREATE MATERIALIZED VIEW ad_ctr_5min AS
SELECT
    ac.ad_id AS ad_id,
    ac.clicks_count :: NUMERIC / ai.impressions_count AS ctr,
    ai.window_end AS window_end
FROM
    (
        SELECT
            ad_id,
            COUNT(*) AS impressions_count,
            window_end
        FROM
            TUMBLE(
                ad_impression,
                impression_timestamp,
                INTERVAL '5' MINUTE
            )
        GROUP BY
            ad_id,
            window_end
    ) AS ai
    JOIN (
        SELECT
            ai.ad_id,
            COUNT(*) AS clicks_count,
            ai.window_end AS window_end
        FROM
            TUMBLE(ad_click, click_timestamp, INTERVAL '5' MINUTE) AS ac
            INNER JOIN TUMBLE(
                ad_impression,
                impression_timestamp,
                INTERVAL '5' MINUTE
            ) AS ai ON ai.bid_id = ac.bid_id
            AND ai.window_end = ac.window_end
        GROUP BY
            ai.ad_id,
            ai.window_end
    ) AS ac ON ai.ad_id = ac.ad_id
    AND ai.window_end = ac.window_end;
20 changes: 20 additions & 0 deletions integration_tests/ad-ctr/create_source.sql
@@ -0,0 +1,20 @@
CREATE SOURCE ad_impression (
    bid_id BIGINT,
    ad_id BIGINT,
    impression_timestamp TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'ad_impression',
    properties.bootstrap.server = 'message_queue:29092',
    scan.startup.mode = 'earliest'
) ROW FORMAT JSON;

CREATE SOURCE ad_click (
    bid_id BIGINT,
    click_timestamp TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'ad_click',
    properties.bootstrap.server = 'message_queue:29092',
    scan.startup.mode = 'earliest'
) ROW FORMAT JSON;
1 change: 1 addition & 0 deletions integration_tests/ad-ctr/data_check
@@ -0,0 +1 @@
ad_impression,ad_click,ad_ctr,ad_ctr_5min