
Optimize Plot #18

Open
wbrickner opened this issue Apr 12, 2022 · 18 comments

Comments

@wbrickner

Hi!

Cloning

I'm trying to use the Plot and Line API.
This requires that I consume my data by value; I'd much rather keep a cached copy and provide a reference.

My plot data is fairly large (1M+ elements, sometimes 50M).

If this is not possible, could an API be added that accepts anything that can be .as_ref() into a &[(f64, f64)]?

I suspect the by-value nature of the API has something to do with the immediate-mode design of egui... however, in cases like this it doesn't seem to actually make sense if it is driven only by design purity, ykwim? Your thoughts on this / advice would help me.

Background Work

I am running massive simulations (~hours completion time) & interactively showing the results over time is the goal. I'd like to be able to do this in the background as mentioned. It is mentioned a few times in the documentation that long-running tasks should be deferred to background tasks in an async manner. I agree. I'm lost on how to accomplish this. I can think of a few clumsy ways, but I want to know if there is a proper way to do this that makes things easy and performant in egui.

Thank you!

@wbrickner
Author

Update: confirmed. Displaying a small subset of my data (~2%, not more than a few hundred kB of floats) causes the frame rate to become unusable, around 1-5 Hz, even on a release build with fat LTO.

Value structs are created ahead of time and then must be cloned every frame. I don't know for sure that the cloning is the culprit, but I strongly suspect it.

After thinking about it a bit more, I believe it is actually exactly because of the immediate-mode architecture that there is no reason for this API to take ownership of the data, or even to require Value construction.

As for background work, all I've done for now is set an Arc<Mutex<XYZ>> field on the App struct that can be directly given to spawned threads etc. Not sure if this is a good idea!
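For what it's worth, the Arc<Mutex<...>> approach described above can be sketched like this (a minimal example with standard-library threads; SharedData, spawn_simulation, and the point counts are my own illustrative names, not an egui API):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Shared simulation results; the UI keeps one Arc, the worker another.
type SharedData = Arc<Mutex<Vec<(f64, f64)>>>;

fn spawn_simulation(data: SharedData) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for i in 0..1000 {
            let x = i as f64 * 0.01;
            let y = x.sin();
            // Hold the lock only long enough to push one point.
            data.lock().unwrap().push((x, y));
        }
    })
}

fn main() {
    let data: SharedData = Arc::new(Mutex::new(Vec::new()));
    let handle = spawn_simulation(Arc::clone(&data));
    // In an egui app this read would happen inside the update method
    // each frame; here we just wait for the worker and inspect the result.
    handle.join().unwrap();
    assert_eq!(data.lock().unwrap().len(), 1000);
}
```

In a real app you would also call a repaint-request function from the worker so the UI redraws when new data arrives, rather than joining the thread.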

@wbrickner
Author

Ok after reviewing the repo locally, the story is a little more complicated.

The original thinking was that because everything is immediate mode, there can never be any non-trivial borrowing patterns involved, so it would be safe to add lifetime bounds and references inside Line etc., and it wouldn't ever get complicated later.

I think this may actually be true, but I'm unsure. It depends on when, and for how long, the reference can be asked to live. Because PreparedPlot::ui is only called from Plot::show, and the PreparedPlot is dropped there, I think this only introduces the constraint that the referenced series data outlive the closure passed to Plot::show. That isn't very burdensome at all, and probably aligns better with basically all use cases.

There are additional inefficiencies present here which have become obvious because my data is so large. I was very surprised to come across so many (re)allocations in the hot paths of egui. Fixing this would allow more flexibility and enable a scale that is otherwise not possible.

Take for example the reconstruction of Mesh on each frame, which contains two (very big) vectors.

A solution I can think of which basically solves this problem across the board is to modify egui to use global object pools, split by type.

E.g. when a mesh is constructed, the most recently dropped Vec<u32> and Vec<Vertex> across the entire app are reused, which is good for cache locality and drives allocation costs to zero on average.
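A minimal sketch of such a per-type pool (VecPool and INDEX_POOL are hypothetical names; nothing like this exists in egui today):

```rust
use std::sync::Mutex;

// A minimal type-specific pool: returned buffers are parked here and
// handed back out (cleared, capacity intact) instead of reallocating.
struct VecPool<T> {
    free: Mutex<Vec<Vec<T>>>,
}

impl<T> VecPool<T> {
    const fn new() -> Self {
        VecPool { free: Mutex::new(Vec::new()) }
    }

    fn take(&self) -> Vec<T> {
        match self.free.lock().unwrap().pop() {
            Some(mut v) => {
                v.clear(); // length 0, but allocation is kept
                v
            }
            None => Vec::new(),
        }
    }

    fn give(&self, v: Vec<T>) {
        self.free.lock().unwrap().push(v);
    }
}

static INDEX_POOL: VecPool<u32> = VecPool::new();

fn main() {
    let mut indices = INDEX_POOL.take();
    indices.extend_from_slice(&[0, 1, 2]);
    let capacity_before = indices.capacity();
    INDEX_POOL.give(indices); // "drop" the mesh: buffer returns to the pool
    let reused = INDEX_POOL.take(); // next frame reuses the same allocation
    assert!(reused.is_empty());
    assert_eq!(reused.capacity(), capacity_before);
}
```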

Would you accept a PR of this type?

@emilk changed the title from "Enormous cloning, background work?" to "Optimize Plot" on Apr 13, 2022
@emilk
Owner

emilk commented Apr 13, 2022

Before making assumptions about which part of the code is slow, run a profiler and find out! You may be surprised.

Each point in a line that the Plot has to show runs through several steps: it is transformed to screen space, tessellated into a Mesh (where each point becomes 3-4 vertices, due to feathering), and then uploaded to the GPU. So each point goes through quite a bit of processing, and copying the data is likely not a huge part of it. Still, clones aren't cheap, and every bit of performance matters.

If you make an optimization PR, make sure you create an e2e benchmark first (in egui_demo_lib/benches/benchmark.rs) so we know how much difference it makes.


If you are showing 50M points in one plot at a time (all within the view bounds) you will never get great performance. You should consider down-sampling the data before handing it to Plot.
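One way to down-sample as suggested is min/max bucketing, which keeps peaks visible after the reduction (the function name and bucketing scheme here are mine, not an egui API):

```rust
/// Reduce `data` to at most 2 * `buckets` points by keeping the min and
/// max y of each bucket, so spikes survive the reduction.
fn min_max_downsample(data: &[(f64, f64)], buckets: usize) -> Vec<(f64, f64)> {
    if data.len() <= 2 * buckets {
        return data.to_vec();
    }
    let bucket_len = data.len() / buckets;
    let mut out = Vec::with_capacity(2 * buckets);
    for chunk in data.chunks(bucket_len) {
        let mut lo = chunk[0];
        let mut hi = chunk[0];
        for &p in chunk {
            if p.1 < lo.1 { lo = p; }
            if p.1 > hi.1 { hi = p; }
        }
        // Emit in x order so the line doesn't zig-zag backwards.
        if lo.0 <= hi.0 { out.push(lo); out.push(hi); }
        else { out.push(hi); out.push(lo); }
    }
    out
}

fn main() {
    let data: Vec<(f64, f64)> =
        (0..10_000).map(|i| (i as f64, (i as f64 * 0.1).sin())).collect();
    let out = min_max_downsample(&data, 100);
    assert_eq!(out.len(), 200); // 10,000 points reduced to 200
}
```

This runs once per bounds change rather than per frame, and the reduced vector is what you would hand to Plot.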

@emilk
Owner

emilk commented Apr 13, 2022

To get a very rough feel for where the milliseconds are being spent you can use puffin: emilk/egui#1483 - note that this is no replacement for a REAL profiler.

@InBetweenNames

At the very least, is it possible we could supply Plot with our own allocator? That would go some way towards this goal, but hopefully shouldn't be too intrusive on the codebase.

@emilk
Owner

emilk commented Apr 16, 2022

You can start by trying out mimalloc and see how that affects performance

@Titaniumtown
Contributor

You can start by trying out mimalloc and see how that affects performance

Wish that worked on wasm though :(

@EmbersArc
Contributor

I tried to address this in emilk/egui#1816 but couldn't find a way to use non-static references. Would be great if somebody could give it a try.

@chengts95

chengts95 commented Jul 26, 2022

We really need this, or maybe we can have some custom iterators to iterate through dense bulk arrays to avoid copying data.

@emilk
Owner

emilk commented Jul 29, 2022

We really need this

Are you sure? Do you have a profiler dump to share?

@chengts95

chengts95 commented Jul 30, 2022

We really need this

Are you sure? Do you have a profiler dump to share?

If I am using a plot, I am typically not dealing with small static data but with dynamic data containing millions of points.
I used to use implot, and implot doesn't copy; even so, a million points can be very slow. For scientific plots you have to deal with more than a million points (which is quite normal), and copying the whole array every frame is near impossible. Downsampling is a good option, but you still need to copy a lot of data each frame. It doesn't need a profiler because it is obviously a very slow operation. I just want to avoid generating temporary data, or to be told how to preserve the graph so the whole plot isn't redrawn.

@Titaniumtown
Contributor

@chengts95 your formatting on your comment isn't correct btw

@mkalte666

mkalte666 commented Nov 28, 2022

Sorry if this is slightly off-topic, but I ran into the "I need to plot a few million points with zoom and everything" issue as well, and I have a solution that works reasonably well for my case (in release builds, anyway).
I cannot share too much context (work project :/), however I think this gist should be fine :)

https://gist.github.com/mkalte666/a4944361a80c1014d098c0adb15ec9dd

The essential idea is that every time the plot bounds are moved around, I grab resolution samples from the big array, cache them, and then create PlotPoints from the cache on every redraw instead of looking at the whole data set.

My use case needs the copy, as the original data source might disappear and lifetimes are a mess otherwise anyway; however, you could probably realize this with a separately kept reference to the original data. Or even make the functions a trait and the cache something separate, so it works on more or less arbitrary arrays.

You would also want to replace the single downsampling function I use (which very explicitly grabs the absolute maximum of the sampled range) with something proper (which, however, might take too long?), or just throw away the other samples.

Anyway, this is what I use for myself for now, and it works. I hope it helps others with this issue, and maybe serves as inspiration.
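The cache-on-bounds-change idea might be sketched like this (a rough reconstruction under my own names, not the gist's actual code; it uses naive every-nth decimation where the gist takes the maximum of the sampled range):

```rust
/// Cache of downsampled points, refreshed only when the visible x-range
/// changes (pan/zoom). All names here are illustrative, not egui_plot API.
struct ResampleCache {
    last_bounds: Option<(f64, f64)>,
    cached: Vec<(f64, f64)>,
}

impl ResampleCache {
    fn new() -> Self {
        ResampleCache { last_bounds: None, cached: Vec::new() }
    }

    /// Return about `resolution` points covering `bounds`, resampling the
    /// big array only if the bounds actually moved since the last frame.
    fn points(
        &mut self,
        data: &[(f64, f64)],
        bounds: (f64, f64),
        resolution: usize,
    ) -> &[(f64, f64)] {
        if self.last_bounds != Some(bounds) {
            self.last_bounds = Some(bounds);
            self.cached.clear();
            let visible: Vec<&(f64, f64)> = data
                .iter()
                .filter(|p| p.0 >= bounds.0 && p.0 <= bounds.1)
                .collect();
            let step = (visible.len() / resolution).max(1);
            // Naive decimation: keep every `step`-th visible point.
            self.cached.extend(visible.iter().step_by(step).map(|p| **p));
        }
        &self.cached
    }
}

fn main() {
    let data: Vec<(f64, f64)> = (0..1000).map(|i| (i as f64, i as f64)).collect();
    let mut cache = ResampleCache::new();
    let pts = cache.points(&data, (0.0, 499.0), 100);
    assert_eq!(pts.len(), 100); // 500 visible points decimated to 100
}
```

On an unchanged frame the method returns the cached slice without touching the big array at all, which is the whole point of the approach.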

@YgorSouza
Contributor

For what it's worth, here is how I handled plotting millions of points in my app: https://gitlab.com/ygor.souza/poli/-/blob/583a3f0ba2d6b0b195db8adef9d7fb70f88eeee6/src/tool/dynamical_system_simulator.rs#L378-399

It basically samples the data vector at the desired resolution using PlotPoints::from_explicit_callback. The try_get_y closure does a binary search to find the first x value that is greater than the one PlotPoints sampled, and then does a linear interpolation with the previous point. (Note: I only made two closures, try_get_y and get_y, to be able to use the ? operator in the first one for easier error handling.)
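A sketch of that binary-search-plus-interpolation lookup (my reconstruction of the idea as a free function, not the linked code verbatim):

```rust
/// Interpolate y at `x` from a slice sorted by ascending x, using binary
/// search plus linear interpolation between the two surrounding points.
fn try_get_y(data: &[(f64, f64)], x: f64) -> Option<f64> {
    // Index of the first point with point.x strictly greater than x.
    let i = data.partition_point(|p| p.0 <= x);
    if i == 0 || i == data.len() {
        return None; // x lies outside the sampled range
    }
    let (x0, y0) = data[i - 1];
    let (x1, y1) = data[i];
    let t = (x - x0) / (x1 - x0);
    Some(y0 + t * (y1 - y0))
}

fn main() {
    let data = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)];
    assert_eq!(try_get_y(&data, 0.5), Some(1.0)); // halfway up the first segment
    assert_eq!(try_get_y(&data, -1.0), None);     // out of range
}
```

The closure handed to PlotPoints::from_explicit_callback would wrap a call like this, so the cost per frame scales with the on-screen resolution rather than the full data size.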

I limited the size to 1M points in my app because I'm also running the actual simulation in the UI thread for now, and I don't want it to lock the UI for more than a couple of seconds. But if I change it to 10M, I can still scroll smoothly through the results once the simulation finishes.

@mkalte666

Adding another comment here, as my method has evolved since then. I can't share the whole thing because I made it quite specific to the data I'm working with, but I can throw you my mipmap functions.

Essentially I am now taking the hit of using twice the storage, and I use a 1D mipmap to sample the data.
This is reasonably fast while allowing "proper" downsampling (using a single-pole IIR in my case). This is important, as plain averaging will crush your data down on the y-axis due to the much flatter filter curve of the box filter.

http://www.number-none.com/product/Mipmapping,%20Part%201/index.html

Mip-mapping for graphics sometimes just uses gamma correction to combat this, but this isn't image data normalized between 0 and 1, so, shrug.

https://gist.github.com/mkalte666/9797eb7e0eb31cdbe208fce994d3e45b

Things you might wanna do

  • Make it f64; I use single-precision data a lot because it's just faster for me (that has nothing to do with plotting; I'm doing $company_math on the stuff :D)
  • Make it correct; I don't care about data width, and this will discard samples. That is fine for my use case, but might not be for yours. For correct mip-mapping your data needs to be 2^n samples wide.
  • Make it generic? That solves issue one :D
  • Call it something other than mipmap. A mipmap is what you have in 2D, and I have no idea if that name applies here.
  • Make the lookup lerp between points, or blend with other mipmap levels. This is done for graphics; not sure if it applies here.

Hope this helps people, have a good day!
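For reference, a much-simplified version of the 1D pyramid structure described above (plain pairwise averaging instead of the single-pole IIR used in the gist, and all names are mine):

```rust
/// Build a 1D "mipmap" pyramid: level 0 is the raw data, and each further
/// level halves the sample count by averaging pairs. Total storage is about
/// twice the input, matching the "twice the storage" trade-off above. This
/// uses a plain box filter; the comment above argues an IIR preserves the
/// y-range better, so treat this as the structure of the idea, not the filter.
fn build_mip_levels(data: &[f32]) -> Vec<Vec<f32>> {
    let mut levels = vec![data.to_vec()];
    while levels.last().unwrap().len() > 1 {
        let prev = levels.last().unwrap();
        let next: Vec<f32> = prev
            .chunks(2)
            .map(|pair| pair.iter().sum::<f32>() / pair.len() as f32)
            .collect();
        levels.push(next);
    }
    levels
}

/// Pick the coarsest level that still has at least `wanted` samples.
fn pick_level<'a>(levels: &'a [Vec<f32>], wanted: usize) -> &'a [f32] {
    levels
        .iter()
        .rev() // coarsest first
        .find(|l| l.len() >= wanted)
        .unwrap_or(&levels[0])
}

fn main() {
    let levels = build_mip_levels(&[1.0_f32; 8]);
    assert_eq!(levels.len(), 4); // lengths 8, 4, 2, 1
    assert_eq!(pick_level(&levels, 3).len(), 4);
}
```

At draw time you pick the level closest to the on-screen resolution, so lookup cost is independent of the raw data size.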

@GunnarMorrigan

GunnarMorrigan commented Jan 15, 2024

Hi

I am currently looking at optimizing the plotting of many data points. I have done profiling.

There are a few problems, as I see it, with the current code that absolutely destroy performance on many data points:

  1. PlotUi takes ownership of the data to be plotted and shortly after drops the data as it is turned into a shape.
  2. Turning PlotItem into a shape in PlotUi does a lot of Vec::push without a reserve beforehand.
  3. Tessellation of many objects into a single mesh does a lot of Vec::push with small upfront reserves but not one large reserve.

I think these are low-hanging, easy-to-address problems, and I will open a PR for them in the near future.

  1. Plot data can be provided as owned data or as a mutable reference, as shown here:
    https://github.com/GunnarMorrigan/egui/blob/d721ef2d90a8295af24cd720ec5908707f4c1e2c/crates/egui_plot/src/lib.rs#L1399
  2. Precalculate the required size of the Shape vec to reserve at least the required capacity.
  3. Precalculate the required size of the mesh vecs to reserve at least the required capacity.
    This is a bit harder, as the exact number of indices and vertices depends on some runtime configuration. Maybe we can always take the upper bound, though that can increase memory usage if the estimate is too far above actual usage.
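The reserve-the-upper-bound idea from points 2 and 3 can be sketched like this (the tessellation itself is a stand-in, and the 4/6 per-point factors are illustrative estimates, not egui's actual tessellator math):

```rust
/// Sketch: estimate vertex/index counts before tessellating, so the big
/// vectors are allocated once up front instead of growing through many
/// small reallocations during repeated pushes.
fn tessellate_line(points: &[(f32, f32)]) -> (Vec<[f32; 2]>, Vec<u32>) {
    let n = points.len();
    // Generous upper bound per point; the real numbers depend on feathering
    // and other runtime configuration, so an upper bound is used.
    let mut vertices = Vec::with_capacity(4 * n);
    let mut indices = Vec::with_capacity(6 * n);
    for (i, &(x, y)) in points.iter().enumerate() {
        // Stand-in for real tessellation: two offset vertices per point.
        vertices.push([x, y - 0.5]);
        vertices.push([x, y + 0.5]);
        if i > 0 {
            let base = (2 * (i - 1)) as u32;
            // Two triangles forming the quad of this line segment.
            indices.extend_from_slice(&[base, base + 1, base + 2]);
            indices.extend_from_slice(&[base + 1, base + 3, base + 2]);
        }
    }
    (vertices, indices)
}

fn main() {
    let pts = vec![(0.0_f32, 0.0), (1.0, 0.0), (2.0, 0.0)];
    let (v, i) = tessellate_line(&pts);
    assert_eq!(v.len(), 6);  // 2 vertices per point
    assert_eq!(i.len(), 12); // 6 indices per segment, 2 segments
}
```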

Additionally, the computation takes place on a single core. Maybe it can be parallelized efficiently for native targets?

@GunnarMorrigan

GunnarMorrigan commented Jan 22, 2024

Hi all who are interested in this,

I have implemented plotting of borrowed data.
This basically allows you to pass iterators over &Coordinate. Using this, you can more easily implement any form of downsampling/filtering on stored data.

Update: my PR is closed and won't be merged. More info here: emilk/egui#3849

emilk referenced this issue in emilk/egui Feb 1, 2024
* Part of https://github.com/emilk/egui/issues/1485

This adds a `rayon` feature to `epaint` and `egui` to parallelize
tessellation of large shapes, such as high-resolution plot lines.

9 participants