test: test S3-compatible storage RW claims to support #8266

Closed
lmatz opened this issue Mar 1, 2023 · 6 comments

@lmatz
Contributor

lmatz commented Mar 1, 2023

Should this run per PR, in main-cron, or in release testing?

I prefer main-cron but have no strong opinion; if that proves difficult, feel free to choose one of the others.

  • cos
  • oss
  • webhdfs
  • hdfs
  • gcs
  • lyvecloud
  • minio
  • ......

We may prioritize several important ones, and finish the rest whenever there is a need.

Test all three components that rely on an S3-compatible object store:

  1. S3 source
  2. Hummock
  3. Java connector node
  4. ......
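
The scope above amounts to a backend-by-component matrix. A minimal sketch of how it could be enumerated, with the named entries taken from the two lists (the "......" entries are left out). The `PRIORITY` set and the `build_matrix` helper are hypothetical and only illustrate "prioritize several important ones"; nothing in this thread decides which backends come first.

```python
from itertools import product

# Named entries from the two lists above; the "......" items are omitted.
BACKENDS = ["cos", "oss", "webhdfs", "hdfs", "gcs", "lyvecloud", "minio"]
COMPONENTS = ["S3 source", "Hummock", "Java connector node"]

# Hypothetical choice of backends to cover first. Not decided in this thread.
PRIORITY = {"minio", "gcs"}


def build_matrix(backends, components, priority):
    """Return every (backend, component) pair, prioritized backends first."""
    pairs = list(product(backends, components))
    # sorted() is stable, so the original order is kept within each group.
    return sorted(pairs, key=lambda pair: pair[0] not in priority)


if __name__ == "__main__":
    for backend, component in build_matrix(BACKENDS, COMPONENTS, PRIORITY):
        print(f"{backend:10} x {component}")
```
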
@github-actions github-actions bot added this to the release-0.1.18 milestone Mar 1, 2023
@lmatz lmatz added the component/test Test related issue. label Mar 1, 2023
@wcy-fdu
Contributor

wcy-fdu commented Mar 1, 2023

I prefer testing external object storage (supported via OpenDAL) in release testing.
Currently hdfs/webhdfs/gcs/oss are supported via OpenDAL, whose public APIs are stable, although its raw APIs may change between minor releases.

We basically use only the public APIs, so I suggest testing OpenDAL's in_memory_store per PR or in main-cron, since it runs in memory and costs little, and testing external object storage once per release to save cost.

We can also test cos/lyvecloud by reusing our current testing workload; I believe that is easy to implement.
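
The split suggested here can be written down as a small lookup table. This is only a sketch: the trigger keys, the `opendal-memory` label, and the `backends_for` helper are hypothetical names, and cos/lyvecloud are left out because the comment does not assign them a trigger.

```python
# Hypothetical trigger keys and backend labels; the grouping follows the
# comment above: in-memory runs are cheap, external storage runs per release.
SCHEDULE = {
    "per-pr": ["opendal-memory"],
    "main-cron": ["opendal-memory"],
    "release": ["hdfs", "webhdfs", "gcs", "oss"],
}


def backends_for(trigger):
    """Return the backends to exercise for a given CI trigger."""
    try:
        return SCHEDULE[trigger]
    except KeyError:
        raise ValueError(f"unknown trigger: {trigger!r}") from None


if __name__ == "__main__":
    for trigger in SCHEDULE:
        print(trigger, "->", ", ".join(backends_for(trigger)))
```
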

@wcy-fdu
Contributor

wcy-fdu commented Mar 1, 2023

Any suggestions? cc @hzxa21

@xxchan
Member

xxchan commented Mar 1, 2023

BTW, should we also test AWS S3 instead of only MinIO? (It seems we already have that.)

@hzxa21
Collaborator

hzxa21 commented Mar 1, 2023

> I prefer testing external object storage (supported via OpenDAL) in release testing. Currently hdfs/webhdfs/gcs/oss are supported via OpenDAL, whose public APIs are stable, although its raw APIs may change between minor releases.
>
> We basically use only the public APIs, so I suggest testing OpenDAL's in_memory_store per PR or in main-cron, since it runs in memory and costs little, and testing external object storage once per release to save cost.
>
> We can also test cos/lyvecloud by reusing our current testing workload; I believe that is easy to implement.

Generally +1. We currently have two object store implementations (one using the Rust AWS client, one using OpenDAL). I don't think we need to test the functionality of either frequently, only on:

  1. aws-sdk-s3 and opendal version bumps.
  2. longevity/perf/release tests.

Here are my thoughts on the tests we should do:

  • Manual test: test the correctness and rough performance characteristics of a remote storage when we first support it
  • Per PR: test the correctness of object store implementation on local storage with small data set
    • e2e test with small data set on MinIO and OpenDAL (with fs engine)
  • main-cron: test the correctness of object store implementation on local storage with large data set
    • e2e test with large data set on MinIO and OpenDAL (with fs engine)
  • Pipeline test: ideally we should have a testing pipeline per officially supported remote storage (AWS S3, GCS, Azure Storage, HDFS, ...). We currently have only the AWS S3 pipelines, and we can start adding more for the important ones, such as GCS and Azure Storage.
    • Longevity test: test stability of object store implementation on remote storage
    • Perf test: test performance of object store implementation on remote storage
    • Release test: test production readiness of object store implementation on remote storage

Note that the pipeline tests also apply to the whole kernel and cloud infra, not just storage, so I think we had better leverage the cloud tests to set up these pipelines.
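
The per-PR and main-cron tiers above share one idea: run the same correctness checks against local backends and vary only the data set size. A minimal sketch of that pattern follows. `MemoryStore`, `FsStore`, and `check_conformance` are hypothetical stand-ins written for illustration; they are not RisingWave's object store trait or OpenDAL's API, and the put/get/list/delete surface is an assumption.

```python
import os
import tempfile


class MemoryStore:
    """Hypothetical in-memory store, standing in for an in-memory backend."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]  # raises KeyError if the key is missing

    def delete(self, key):
        self._objects.pop(key, None)

    def list(self, prefix):
        return sorted(k for k in self._objects if k.startswith(prefix))


class FsStore:
    """Hypothetical local-filesystem store, standing in for an fs engine."""

    def __init__(self, root):
        self._root = root

    def _path(self, key):
        return os.path.join(self._root, *key.split("/"))

    def put(self, key, data):
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def get(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(key) from None

    def delete(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            pass

    def list(self, prefix):
        keys = []
        for dirpath, _dirs, files in os.walk(self._root):
            for name in files:
                rel = os.path.relpath(os.path.join(dirpath, name), self._root)
                key = rel.replace(os.sep, "/")
                if key.startswith(prefix):
                    keys.append(key)
        return sorted(keys)


def check_conformance(store, object_count=8, object_size=1024):
    """Run identical put/list/get/delete checks against any store."""
    expected = {
        f"data/{i}.sst": bytes([i % 256]) * object_size
        for i in range(object_count)
    }
    for key, data in expected.items():
        store.put(key, data)
    assert store.list("data/") == sorted(expected)
    for key, data in expected.items():
        assert store.get(key) == data
    for key in expected:
        store.delete(key)
    assert store.list("data/") == []
    return object_count


if __name__ == "__main__":
    # Small data set, as in the per-PR tier.
    check_conformance(MemoryStore())
    # Larger data set, as in the main-cron tier.
    with tempfile.TemporaryDirectory() as root:
        check_conformance(FsStore(root), object_count=64, object_size=4096)
```

Keeping the checks in one function means a new backend only needs a thin adapter before it can join the same suite; the remote-storage pipelines would still need their own longevity and performance runs.
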

@wcy-fdu
Contributor

wcy-fdu commented Mar 2, 2023

Rough roadmap:

  • add in-memory/small data set e2e tests via OpenDAL in CI and run them per PR
  • add e2e tests with a large data set on MinIO and OpenDAL in main-cron
  • add e2e tests for lyvecloud and cos in main-cron, similar to the current S3 test
  • test S3-compatible object storage (lyvecloud, cos) in the longevity test every time RisingWave releases a new version

Probably not very urgent:

  • build a pipeline test on Google Cloud, and test the stability of gcs
  • build a pipeline test on Alibaba Cloud, and test the stability of oss
  • build a pipeline test for hdfs

@wcy-fdu
Contributor

wcy-fdu commented May 12, 2023

Tracked here

4 participants