Report logging benchmarks for C++/Python/Rust to CI #4100
Very similar to:
### What

* Part of #4100

Implements the SDK-side logging benchmark for C++ & Rust. Kept as simple as possible, meant for whole-process profiling so we capture all side effects. This of course makes the data generation ('prepare') inside the benchmark apps quite tricky, as it has to be as fast as possible. Additionally, both the Rust & C++ apps expose a way to get more fine-grained profiling: C++ does this via a simple profiler scope, Rust via Puffin/re_tracing. Logging always happens to a memory recording. Data is currently never passed in in the Rerun format.

Contains the three initial benchmarks we wanted to have:

* points3d_large_batch
  * Single batch of 50 million points (color, position, radius, single label)
* points3d_many_individual
  * 1 million individual points with a different time stamp each (color, position, radius)
* image
  * Log 4 different 16k x 16k RGBA8 images (4GiB of memory!)

Running instructions in `main.rs` & `main.cpp`!

Timings on my M1 Max in seconds (tests are not perfectly comparable, they do not do exactly the same thing; prepare times also differ slightly and are most significant in the _large_batch test):

* points3d_large_batch
  * C++: 0.94s
  * Rust: 1.34s
* points3d_many_individual
  * C++: 16.86s (⚠️ there's almost certainly some involuntary allocation going on there)
  * Rust: 2.75s
* image
  * C++: 3.11s
  * Rust: 1.10s

Missing:

* Python version
* Utility script for building, running and publishing data

### Checklist

* [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [demo.rerun.io](https://demo.rerun.io/pr/4181) (if applicable)
* [x] The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG

- [PR Build Summary](https://build.rerun.io/pr/4181)
- [Docs preview](https://rerun.io/preview/73a3736ac3c0be33fa8d6e6b40a2af243c4aa2d9/docs)
- [Examples preview](https://rerun.io/preview/73a3736ac3c0be33fa8d6e6b40a2af243c4aa2d9/examples)
- [Recent benchmark results](https://ref.rerun.io/dev/bench/)
- [Wasm size tracking](https://ref.rerun.io/dev/sizes/)
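For a sense of what these benchmarks do on the SDK side, here is a minimal Python sketch of the points3d_large_batch case. The PR implements this in C++ and Rust; the app id, constant data, and exact shapes below are illustrative assumptions, not the PR's code:

```python
import numpy as np
import rerun as rr

NUM_POINTS = 50_000_000  # the PR logs a single batch of 50 million points

rr.init("benchmark_points3d_large_batch")  # hypothetical app id
rec = rr.memory_recording()  # log to an in-memory sink, never to disk or a live viewer

# The 'prepare' step: data generation has to be as fast as possible, since the
# whole process (including this part) is what gets timed.
positions = np.zeros((NUM_POINTS, 3), dtype=np.float32)
colors = np.zeros((NUM_POINTS, 4), dtype=np.uint8)
radii = np.ones(NUM_POINTS, dtype=np.float32)

# The 'log' step: a single large batch with color, position, radius and one label.
rr.log("points", rr.Points3D(positions, colors=colors, radii=radii, labels="a label"))
```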
### What

This PR adds the 3 basic benchmarks for Python. One notable difference w.r.t. the Rust version is that the benchmarks use a single recording, but create a fresh memory sink for each iteration (the Rust version creates a fresh recording for each iteration as well). This is due to #4410.

To run:

```
just py-bench
```

**IMPORTANT**: the Python version of `many_individual` runs 100k points instead of the 1M used in the other benchmarks!

* Part of #4100

On my machine:

<img width="1590" alt="image" src="https://github.com/rerun-io/rerun/assets/49431240/99a74354-aa09-4267-a0fa-6587ecd9f8e5">

### Checklist

* [x] I have read and agree to [Contributor Guide](https://github.com/rerun-io/rerun/blob/main/CONTRIBUTING.md) and the [Code of Conduct](https://github.com/rerun-io/rerun/blob/main/CODE_OF_CONDUCT.md)
* [x] I've included a screenshot or gif (if applicable)
* [x] I have tested [app.rerun.io](https://app.rerun.io/pr/4411) (if applicable)
* [x] The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG

- [PR Build Summary](https://build.rerun.io/pr/4411)
- [Docs preview](https://rerun.io/preview/0f8403061c76b2147bebef25cfebd1b0c5e47c73/docs)
- [Examples preview](https://rerun.io/preview/0f8403061c76b2147bebef25cfebd1b0c5e47c73/examples)
- [Recent benchmark results](https://build.rerun.io/graphs/crates.html)
- [Wasm size tracking](https://build.rerun.io/graphs/sizes.html)
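A minimal sketch of the per-iteration setup described above, assuming the stock `rr.init()`/`rr.memory_recording()` calls from the Python SDK; the helper name and the way the iteration body is passed in are made up for illustration:

```python
import rerun as rr

rr.init("python_log_benchmark")  # a single recording shared by all iterations


def run_iteration(benchmark_body) -> None:
    # Install a fresh memory sink on the existing recording for every iteration,
    # instead of creating a whole new recording each time (works around #4410).
    rr.memory_recording()
    benchmark_body()


# Example: each call logs into its own in-memory sink.
run_iteration(lambda: rr.log("points", rr.Points3D([[0.0, 0.0, 0.0]])))
```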
We now have benchmarks for Python/Rust/C++, but we still don't upload the results.
For reference, here are our performance targets:
We also want to explicitly benchmark logging scalars, including setting a timeline value for each logged scalar, i.e. something like:

```python
from math import sin

import rerun as rr

for frame_nr in range(0, 1_000_000):
    rr.set_time_sequence("frame", frame_nr)
    rr.log("scalar", rr.TimeSeriesScalar(sin(frame_nr / 1000.0)))
```
We also want to check the memory use in the viewer when we have logged 100M scalars or so, to measure the RAM overhead.
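One ad-hoc way to eyeball that number is sketched below, under the assumption that psutil is available and that the locally spawned viewer's process name contains "rerun"; this is not necessarily how the check will be wired up in CI:

```python
import math

import psutil
import rerun as rr

rr.init("scalar_ram_check", spawn=True)  # spawn a local viewer and stream to it

# Logging 100M scalars takes a while; lower the count for a quick look.
for frame_nr in range(100_000_000):
    rr.set_time_sequence("frame", frame_nr)
    rr.log("scalar", rr.TimeSeriesScalar(math.sin(frame_nr / 1000.0)))

# Rough RSS readout of the viewer process.
for proc in psutil.process_iter(["name", "memory_info"]):
    if proc.info["name"] and "rerun" in proc.info["name"].lower():
        print(proc.info["name"], f"{proc.info['memory_info'].rss / 2**30:.2f} GiB")
```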
This is closed when we have an easy way to run benchmarks for all languages, and those results are published (perhaps manually) somewhere publicly.
Keep things super simple and do end-to-end profiling: profile running a benchmark binary that internally has a bunch of test cases. This way we can integrate results from all benchmarks in the same way into our CI-generated benchmark stats.
We execute it with different parameters to check for different test cases.
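A sketch of what such a parameterized runner could look like; the binary name `./log_benchmark` and the `--benchmark` flag are hypothetical placeholders, not the actual CLI of the benchmark apps:

```python
import subprocess
import time

BENCHMARKS = ["points3d_large_batch", "points3d_many_individual", "image"]

for name in BENCHMARKS:
    start = time.perf_counter()
    # Whole-process timing: spawn the benchmark binary with the test case as a parameter.
    subprocess.run(["./log_benchmark", "--benchmark", name], check=True)
    print(f"{name}: {time.perf_counter() - start:.2f}s")
```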
Basic set of test cases we should start with:
In all cases (unless configured otherwise) log to a memory recording. (profiling other parts of the flow should be part of a different Rust benchmark)
Since we want to simply time the process from spawn to end, we must make sure that data generation is super fast. Maybe print out additional timings in each language where appropriate; this is harder to integrate into CI graphs, but nice for debugging.
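For the "additional timings" part, a tiny scoped timer is usually enough. The sketch below is a generic Python helper, not the project's tooling (per the PR above, the C++ app uses a simple profiler scope and the Rust app uses Puffin/re_tracing):

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_scope(name: str):
    # Prints a coarse per-scope timing; handy for local debugging even though
    # only the whole-process time feeds the CI graphs.
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{name}: {time.perf_counter() - start:.3f}s")


# Example: split a benchmark into 'prepare' and 'log' scopes.
with timed_scope("prepare"):
    data = [float(i) for i in range(1_000_000)]
with timed_scope("log"):
    pass  # the rr.log(...) calls would go here
```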
Ideally, the same data is used on all SDKs. There might be variations in the logging flow, though, that don't map to each of them.