
chore: migrate demo projects here as integration tests #8502

Merged: 13 commits, Mar 14, 2023
Changes from 1 commit
Migrate demo project here as integration tests
fuyufjh committed Mar 13, 2023
commit 40337c52fbf578c8a7f31d5efc2cd3c797b472b5
46 changes: 46 additions & 0 deletions integration_test/README.md
@@ -0,0 +1,46 @@
# RisingWave Demos

Here is a gallery of demos that show how to use RisingWave along with ecosystem tools.

- `ad-click`: [Build and Maintain Real-time Applications Faster and Easier with Redpanda and RisingWave](https://singularity-data.com/blog/build-with-Redpanda-and-RisingWave)
- `ad-ctr`: [Perform real-time ad performance analysis](https://www.risingwave.dev/docs/latest/real-time-ad-performance-analysis/)
- `cdn-metrics`: [Server performance anomaly detection](https://www.risingwave.dev/docs/latest/server-performance-anomaly-detection/)
- `clickstream`: [Clickstream analysis](https://www.risingwave.dev/docs/latest/clickstream-analysis/)
- `twitter`: [Fast Twitter events processing](https://www.risingwave.dev/docs/latest/fast-twitter-events-processing/)
- `twitter-pulsar`: [Tutorial: Pulsar + RisingWave for Fast Twitter Event Processing](https://www.risingwave.com/blog/tutorial-pulsar-risingwave-for-fast-twitter-events-processing/)
- `live-stream`: [Live stream metrics analysis](https://www.risingwave.dev/docs/latest/live-stream-metrics-analysis/)

## Demo Runnability Testing

Every demo listed above runs through a series of tests when a PR is merged, including:

- Run the queries mentioned in the demos.
- Ingest data in various formats, including Protobuf, Avro, and JSON. Each format is tested individually.
- For each demo, check that the sources and materialized views (MVs) have successfully ingested data, i.e., that each contains >0 records (see the sketch below).
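
As an illustration, such a check can be expressed as a plain `COUNT(*)` query. This is only a sketch: the relation names come from the `ad-click` demo's `data_check` file, and the exact query the test harness runs is not shown in this PR.

```sql
-- Hypothetical sketch of the ">0 records" check, using the ad-click
-- demo's relations; the real harness query is not part of this diff.
SELECT COUNT(*) AS cnt FROM ad_source;         -- expect cnt > 0
SELECT COUNT(*) AS cnt FROM m_click_statistic; -- expect cnt > 0
```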

## Workload Generator

The workloads presented in the demos are produced by a Go program in `/datagen`. You can get this tool in several ways:

- Download a pre-built binary from [Releases](https://github.com/risingwavelabs/risingwave-demo/releases).
- Pull the latest Docker image via `docker pull ghcr.io/risingwavelabs/demo-datagen:v1.0.9`.
- Build the binary from source:
```sh
cd datagen && go build
```

To use this tool, run one of the following commands:

```sh
./datagen --mode clickstream --qps 10 kafka --brokers 127.0.0.1:57801
```

or

```sh
./datagen --mode ecommerce --qps 10000000 postgres --port 6875 --user materialize --db materialize
```

- `--mode clickstream` produces random clickstream data.
- `--qps 10` sets the QPS limit to 10.
- `kafka | postgres` chooses the destination. For Kafka, you need to specify the brokers.
13 changes: 13 additions & 0 deletions integration_test/ad-click/create_mv.sql
@@ -0,0 +1,13 @@
-- The number of clicks on the ad within one minute after the ad was shown.
create materialized view m_click_statistic as
select
count(user_id) as clicks_count,
ad_id
from
ad_source
where
click_timestamp is not null
and impression_timestamp < click_timestamp
and impression_timestamp + interval '1' minute >= click_timestamp
group by
ad_id;
13 changes: 13 additions & 0 deletions integration_test/ad-click/create_source.sql
@@ -0,0 +1,13 @@
-- impression_timestamp: The time when the ad was shown.
-- click_timestamp: The time when the ad was clicked.
create source ad_source (
user_id bigint,
ad_id bigint,
click_timestamp timestamptz,
impression_timestamp timestamptz
) with (
connector = 'kafka',
topic = 'ad_clicks',
properties.bootstrap.server = 'message_queue:29092',
scan.startup.mode = 'earliest'
) row format json;
1 change: 1 addition & 0 deletions integration_test/ad-click/data_check
@@ -0,0 +1 @@
ad_source,m_click_statistic
60 changes: 60 additions & 0 deletions integration_test/ad-click/docker-compose.yml
@@ -0,0 +1,60 @@
---
version: "3"
services:
compactor-0:
extends:
file: ../../docker/docker-compose.yml
service: compactor-0
compute-node-0:
extends:
file: ../../docker/docker-compose.yml
service: compute-node-0
etcd-0:
extends:
file: ../../docker/docker-compose.yml
service: etcd-0
frontend-node-0:
extends:
file: ../../docker/docker-compose.yml
service: frontend-node-0
grafana-0:
extends:
file: ../../docker/docker-compose.yml
service: grafana-0
meta-node-0:
extends:
file: ../../docker/docker-compose.yml
service: meta-node-0
minio-0:
extends:
file: ../../docker/docker-compose.yml
service: minio-0
prometheus-0:
extends:
file: ../../docker/docker-compose.yml
service: prometheus-0
message_queue:
extends:
file: ../../docker/docker-compose.yml
service: message_queue
datagen:
image: ghcr.io/risingwavelabs/demo-datagen:v1.0.9
depends_on: [message_queue]
command:
- /bin/sh
- -c
- /datagen --mode ad-click --qps 2 kafka --brokers message_queue:29092
restart: always
container_name: datagen
volumes:
compute-node-0:
external: false
etcd-0:
external: false
grafana-0:
external: false
minio-0:
external: false
prometheus-0:
external: false
name: risingwave-compose
6 changes: 6 additions & 0 deletions integration_test/ad-click/query.sql
@@ -0,0 +1,6 @@
select
*
from
m_click_statistic
limit
10;
64 changes: 64 additions & 0 deletions integration_test/ad-ctr/create_mv.sql
@@ -0,0 +1,64 @@
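-- Overall click-through rate (CTR) per ad: clicks / impressions, joining
-- per-ad click counts with per-ad impression counts on ad_id.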
CREATE MATERIALIZED VIEW ad_ctr AS
SELECT
ad_clicks.ad_id AS ad_id,
ad_clicks.clicks_count :: NUMERIC / ad_impressions.impressions_count AS ctr
FROM
(
SELECT
ad_impression.ad_id AS ad_id,
COUNT(*) AS impressions_count
FROM
ad_impression
GROUP BY
ad_id
) AS ad_impressions
JOIN (
SELECT
ai.ad_id,
COUNT(*) AS clicks_count
FROM
ad_click AS ac
LEFT JOIN ad_impression AS ai ON ac.bid_id = ai.bid_id
GROUP BY
ai.ad_id
) AS ad_clicks ON ad_impressions.ad_id = ad_clicks.ad_id;

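-- CTR per ad over 5-minute tumbling windows: clicks and impressions are
-- matched on bid_id within the same window before being counted.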
CREATE MATERIALIZED VIEW ad_ctr_5min AS
SELECT
ac.ad_id AS ad_id,
ac.clicks_count :: NUMERIC / ai.impressions_count AS ctr,
ai.window_end AS window_end
FROM
(
SELECT
ad_id,
COUNT(*) AS impressions_count,
window_end
FROM
TUMBLE(
ad_impression,
impression_timestamp,
INTERVAL '5' MINUTE
)
GROUP BY
ad_id,
window_end
) AS ai
JOIN (
SELECT
ai.ad_id,
COUNT(*) AS clicks_count,
ai.window_end AS window_end
FROM
TUMBLE(ad_click, click_timestamp, INTERVAL '5' MINUTE) AS ac
INNER JOIN TUMBLE(
ad_impression,
impression_timestamp,
INTERVAL '5' MINUTE
) AS ai ON ai.bid_id = ac.bid_id
AND ai.window_end = ac.window_end
GROUP BY
ai.ad_id,
ai.window_end
) AS ac ON ai.ad_id = ac.ad_id
AND ai.window_end = ac.window_end;
20 changes: 20 additions & 0 deletions integration_test/ad-ctr/create_source.sql
@@ -0,0 +1,20 @@
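-- Ad impression and click events from Kafka; bid_id ties a click back to
-- the impression it belongs to (see the joins in create_mv.sql).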
CREATE SOURCE ad_impression (
bid_id BIGINT,
ad_id BIGINT,
impression_timestamp TIMESTAMPTZ
) WITH (
connector = 'kafka',
topic = 'ad_impression',
properties.bootstrap.server = 'message_queue:29092',
scan.startup.mode = 'earliest'
) ROW FORMAT JSON;

CREATE SOURCE ad_click (
bid_id BIGINT,
click_timestamp TIMESTAMPTZ
) WITH (
connector = 'kafka',
topic = 'ad_click',
properties.bootstrap.server = 'message_queue:29092',
scan.startup.mode = 'earliest'
) ROW FORMAT JSON;
1 change: 1 addition & 0 deletions integration_test/ad-ctr/data_check
@@ -0,0 +1 @@
ad_impression,ad_click,ad_ctr,ad_ctr_5min
60 changes: 60 additions & 0 deletions integration_test/ad-ctr/docker-compose.yml
@@ -0,0 +1,60 @@
---
version: "3"
services:
compactor-0:
extends:
file: ../../docker/docker-compose.yml
service: compactor-0
compute-node-0:
extends:
file: ../../docker/docker-compose.yml
service: compute-node-0
etcd-0:
extends:
file: ../../docker/docker-compose.yml
service: etcd-0
frontend-node-0:
extends:
file: ../../docker/docker-compose.yml
service: frontend-node-0
grafana-0:
extends:
file: ../../docker/docker-compose.yml
service: grafana-0
meta-node-0:
extends:
file: ../../docker/docker-compose.yml
service: meta-node-0
minio-0:
extends:
file: ../../docker/docker-compose.yml
service: minio-0
prometheus-0:
extends:
file: ../../docker/docker-compose.yml
service: prometheus-0
message_queue:
extends:
file: ../../docker/docker-compose.yml
service: message_queue
datagen:
image: ghcr.io/risingwavelabs/demo-datagen:v1.0.9
depends_on: [message_queue]
command:
- /bin/sh
- -c
- /datagen --mode ad-ctr --qps 2 kafka --brokers message_queue:29092
restart: always
container_name: datagen
volumes:
compute-node-0:
external: false
etcd-0:
external: false
grafana-0:
external: false
minio-0:
external: false
prometheus-0:
external: false
name: risingwave-compose
6 changes: 6 additions & 0 deletions integration_test/ad-ctr/query.sql
@@ -0,0 +1,6 @@
SELECT
*
FROM
ad_ctr_5min
LIMIT
10;
79 changes: 79 additions & 0 deletions integration_test/cdn-metrics/create_mv.sql
@@ -0,0 +1,79 @@
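-- Per-device TCP metrics averaged over 1-minute tumbling windows, kept
-- only when the device's NIC bandwidth utilization (tx_bytes relative to
-- bandwidth) averages at least 50% in the same window.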
CREATE MATERIALIZED VIEW high_util_tcp_metrics AS
SELECT
tcp.device_id AS device_id,
tcp.window_end AS window_end,
tcp.metric_name AS metric_name,
tcp.metric_value AS metric_value,
nic.avg_util AS tcp_avg_bandwidth_util
FROM
(
SELECT
device_id,
window_end,
metric_name,
AVG(metric_value) AS metric_value
FROM
TUMBLE(
tcp_metrics,
report_time,
INTERVAL '1' MINUTE
)
GROUP BY
device_id,
window_end,
metric_name
) AS tcp
JOIN (
SELECT
device_id,
window_end,
AVG((metric_value) / bandwidth) * 100 AS avg_util
FROM
TUMBLE(
nics_metrics,
report_time,
INTERVAL '1' MINUTE
)
WHERE
metric_name = 'tx_bytes'
AND aggregation = 'avg'
GROUP BY
device_id,
window_end
) AS nic ON tcp.device_id = nic.device_id
AND tcp.window_end = nic.window_end
WHERE
avg_util >= 50;

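-- Incident: TCP retransmission rate above 0.15 while bandwidth
-- utilization is high.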
CREATE MATERIALIZED VIEW retrans_incidents AS
SELECT
device_id,
window_end AS trigger_time,
metric_value AS trigger_value
FROM
high_util_tcp_metrics
WHERE
metric_name = 'retrans_rate'
AND metric_value > 0.15;

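-- Incident: smoothed round-trip time (srtt) above 500.0 while bandwidth
-- utilization is high.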
CREATE MATERIALIZED VIEW srtt_incidents AS
SELECT
device_id,
window_end AS trigger_time,
metric_value AS trigger_value
FROM
high_util_tcp_metrics
WHERE
metric_name = 'srtt'
AND metric_value > 500.0;

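-- Incident: download speed below 200.0 while bandwidth utilization is
-- high.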
CREATE MATERIALIZED VIEW download_incidents AS
SELECT
device_id,
window_end AS trigger_time,
metric_value AS trigger_value
FROM
high_util_tcp_metrics
WHERE
metric_name = 'download_speed'
AND metric_value < 200.0;