Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smooth render loop #1870

Closed

Conversation

fredizzimo
Copy link
Member

@fredizzimo fredizzimo commented May 18, 2023

NOTE: This works only with Neovim nightly, or the release-0.9 branch (upcoming 0.9.2 release)

This completely rewrites the current render and event loop of Neovide to make it almost completely smooth. There are some minor frame drops now and then, caused by the GPU side of Skia suddenly taking a lot longer than usual, but especially with refresh rates over 100, that's almost not noticeable. Note Skia has been updated to the latest version in PR, so it's not some bug that has been fixed later.

I was forced to add a separate thread for rendering thread on Windows because the winit event loop is not suitable for real-time rendering, for that I have reported a bug here rust-windowing/winit#2782

On Windows, I had to work around some bugs with the Windowed mode OpenGL implementation by using a separate thread that calls DwmFlush to find a suitable vsync synchronisation points. There seems to be some race conditions in the SwapBuffer implementation. It's not perfect, but it's working pretty well.

On Wayland, frame_callbacks are used for VSync.

On other platforms we assume that swap_buffers will wait for VSync. But this might not be the case on macOS, and https://developer.apple.com/documentation/corevideo/cvdisplaylinkoutputcallback might need to be implemented.

If you specify --novsync, then a software timer is used on all platforms and g:neovide_refersh_rate is used to control the refresh rate.

The redraw scheduler has also been removed and the checks if needing to redraw are now done on the fly, which is possible since the rendering and animation has been split.

This also includes the following PR
#1790, which removes the scrolling snapshots and fixes the transparency.

There are two remaining problems, both on macOS, which I can't fix myself due to not owning a mac.

  1. The transparency issues that @AsPulse was working on. I think we can fix that later.
  2. The vsync option might not work and the maximum refresh rate is used. Someone need to implement https://developer.apple.com/documentation/corevideo/cvdisplaylinkoutputcallback. Alternatively we can force it off on macOS, and then a software timer with g:neovide_refresh_rate will be used.

NOTE: It's recommended to run this with --multigrid, otherwise you wont see many changes. Furthermore there are some glitches caused by Neovim, which are fixed in the nightly builds. So it's recommended to run that. And some more fixes will be included in neovim/neovim#24425, which is now merged into Neovim master as well.

NOTE2, that this also includes #1939, and merging that first will make reviewing this much easier.

NOTE3:
On Windows, make sure you have recent Nvidia drivers 526.47 or newer. It has the setting Vulkan/OpenGL present method/Prefer layered on DXGI Swapchain, I think auto sets that, so you don't necessarily need to change it.

This reduces the random jittering considerably.

TODO:
We need to implement https://developer.apple.com/documentation/corevideo/cvdisplaylinkoutputcallback on macOS otherwise there will be no throttling with the default --vsync. --novsync with a custom g:neovide_refresh_rate works as a workaround, but not ideal.

@wroyca
Copy link

wroyca commented May 18, 2023

Oh hey, that's quite neat!

I had some time to spare, so I went ahead and built your branch to try things out. The first thing I observed is that maximizing the window and engaging in rapid scrolling on the debug build leads to substantial stuttering. On the other hand, this problem does not exist with the Release build, suggesting the presence of either a highly resource-intensive debugging process or the potential omission of valuable optimization measures:

Screencast.from.2023-05-18.14-25-03.webm

The second thing I've noticed is multiple "wl_surface@29: error" occurrence exclusively within the debug build. Specifically, we receive the following error message:

wl_surface@29: error 2: Buffer size (1940x45) must be an integer multiple of the buffer_scale (2).
warning: queue 0x7ff9800032e0 destroyed while proxies still attached:
  wl_buffer@43 still attached
  wl_buffer@46 still attached

Overall, aside from the debug build, the Release build is flawless, and it is a huge improvement over the previous render loop. I hope this PR will reach completion and be merged at some point - Thank you for your hard work! 🚀

@fredizzimo
Copy link
Member Author

@wroyca I don't have either of those problems myself and I tested with Wayland KDE Plasma on Arch Linux.

With a full screen window at 3840x1600, I have no trouble keeping stable 250 FPS even in debug mode, except for a few dropped frames here and there. And of course regular 120 FPS is really smooth, I currently have a bug where I can't run with a vsync of bigger than 120 in Wayland, so I have to use 120 rather than 144.

I have a AMD Ryzen 5 5600X, and GeForce RTX 3060, so quite powerful, but still I think normal FPS around 60 - 150 should be achievable on almost every system.

In order to test 250 FPS, I started with neovide --multigrid --novsync --noidle , then set g:neovide_refresh_rate=250 and confirmed that it's smooth with g:neovide_profiler=1. You can see when it's not able to keep up, when it's going a lot up and down, and the maximum jumps to the double of what's expected.

In release mode the limit for me is around 700-800 FPS. But even 1000 is reasonably smooth in both debug and release.

For the second issue, that looks like you might have encountered a mutter bug, https://gitlab.gnome.org/GNOME/mutter/-/issues/2603, but I'm not 100% sure. Both problems might be related.

But we can debug the slowdown through the profiling I have added.

If you run cargo run --features profiling -- multigrid --noidle and connect the tracy profiler to it https://github.com/wolfpld/tracy, you should be able to see where the time is spent. Note that the profiler itself slows down things a tiny bit. You can also use --gpu_profiling for profiling the gpu side as well, but that slows down things a bit more, and the information is not that useful, since Skia is a big black box.

@wroyca
Copy link

wroyca commented May 18, 2023

EDIT: The scrolling problem seems to occur exclusively when relativenumber is enabled.

I have a setup that closely resembles yours, featuring a 4k display and an RTX 3070TI graphics card. I'm currently running Fedora 38 with Gnome 44 so everything's up-to-date, including mutter. Since the release build runs without any issues, I don't believe that the problem stems from an external source (excluding cargo dependencies). Perhaps a winit bug? It will have to be investigated at some point. Worth mentionning, I am not running into said issues with https://github.com/Yesterday17/neovide/tree/new-keyboard-v3-dep but I am with #1789 and this PR:

image

image

connect the tracy profiler to it https://github.com/wolfpld/tracy, you should be able to see where the time is spent.

I'm interested in assisting, but I must admit my lack of expertise in compiling, setting up, and utilizing the tool in question.I have made an attempt to build it, resulting in a static library. However, I'm uncertain about the correct process to link it with Neovide? Unfortunately, the only available resource is a 93-page manual, which, given my current time constraints, I'm unable to delve into and decipher right now - I suppose it's best to create an issue once the PR is ready, so that someone familiar with Tracy can investigate the bug if they encounter it

@A-Lamia
Copy link

A-Lamia commented May 19, 2023

Holy ****, This is actually amazing dude you have worked magic on this i actually scrolled down and it was so smooth like butter.

@fredizzimo
Copy link
Member Author

@wroyca.

I found and fixed the issue with slow rendering of relative line numbers. Calling font metrics was relatively slow. It only took around 15us for each call, but it's called a lot of times. Adding some caching took care of that.

It makes sense that it happens with relative line numbers, since that forces the whole screen to be redraw each time a line is scrolled.

It's not perfect, there are still other things that are slow redrawing the text. The actual shaping is one of them, but the architecture that #1790 introduced makes it possible to decouple it completely from the actual rendering and do it in a separate thread, so we still could render smoothly while the text is being rasterized. The text would just appear on the screen with a slight delay without stutter, so that's a potential future optimisation. But there's probably other microptimisations that could be done before that too.

Anyway it seems to be able to keep 144 FPS in debug mode for me now, unless I change the font to be quite small, but even that is not too bad. In release mode it's able to keep 144 FPS with a font size of 3, but with 2 it starts to drop frames quite badly, it still looks smooth though.

@fredizzimo fredizzimo force-pushed the fsundvik/improve-render-loop branch from f412941 to 2e8375d Compare May 19, 2023 06:30
@MultisampledNight
Copy link
Contributor

@fredizzimo Do you have any local changes right now? Would you mind if I rebase this ontop of #1789?

@MultisampledNight
Copy link
Contributor

I have it locally, hit me up if you want me to push it (although it's like 5 minutes of work anyway).

@fredizzimo fredizzimo force-pushed the fsundvik/improve-render-loop branch from 2e8375d to d262cce Compare May 19, 2023 08:14
@fredizzimo
Copy link
Member Author

@MultisampledNight I just did that. Not actually rebasing on top of it, but fixing up the merge commit to merge the latest.

@fredizzimo
Copy link
Member Author

Sorry, no, I will have to do it again, I just fetched my fork, and not this repository by mistake.

@fredizzimo fredizzimo force-pushed the fsundvik/improve-render-loop branch from d262cce to 3ee862c Compare May 19, 2023 08:32
@fredizzimo
Copy link
Member Author

Ok, now it should be correct.

Copy link
Contributor

@MultisampledNight MultisampledNight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nits

@@ -99,23 +105,26 @@ impl Window {
grid_size: (u64, u64),
grid_position: (f64, f64),
) {
self.grid.resize(grid_size);
// This could perhaps be optimized, setting the position does not necessarily need
// to rezize, and reset everything
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now

@@ -95,38 +91,48 @@ impl GridRenderer {
self.default_style.colors.background.unwrap().to_color()
}

// Returns a boolean with
// The cell uses the default background color
// The cell has transparency
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to consider using a doc-comment if you already describe the return type, by using three slashes instead of two.

(sidenote: this is a two-element tuple of booleans)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now

let vsync_count2 = vsync_count.clone();

// When using opengl on Windows, in windowed mode, swap_buffers does not seem to be
// syncrhonized with the Desktop Window Manager. So work around that by waiting until the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now

@fredizzimo
Copy link
Member Author

I found a deadlock in Winit on Windows, so don't use this branch for anything important yet. For more information see #1789 (comment)

@wroyca
Copy link

wroyca commented May 19, 2023

@wroyca.

I found and fixed the issue with slow rendering of relative line numbers. Calling font metrics was relatively slow. It only took around 15us for each call, but it's called a lot of times. Adding some caching took care of that.

It makes sense that it happens with relative line numbers, since that forces the whole screen to be redraw each time a line is scrolled.

It's not perfect, there are still other things that are slow redrawing the text. The actual shaping is one of them, but the architecture that #1790 introduced makes it possible to decouple it completely from the actual rendering and do it in a separate thread, so we still could render smoothly while the text is being rasterized. The text would just appear on the screen with a slight delay without stutter, so that's a potential future optimisation. But there's probably other microptimisations that could be done before that too.

Anyway it seems to be able to keep 144 FPS in debug mode for me now, unless I change the font to be quite small, but even that is not too bad. In release mode it's able to keep 144 FPS with a font size of 3, but with 2 it starts to drop frames quite badly, it still looks smooth though.

Work great! I'm interested to know why this issue was explicitly prevalent in the debug version rather than the release version. What kind of compiler optimization was implemented that prevented such a significant impact ~

Regardless, I have been using the release version for several hours and now the debug version without experiencing any major crashes. However, I did observe that the multigrid feature occasionally renders the bottom split s incorrectly for a brief moment:

Screencast.from.2023-05-19.06-48-17.webm

All in all, that branch work quite nicely on Wayland! 🚀

@fredizzimo
Copy link
Member Author

@wroyca

Work great! I'm interested to know why this issue was explicitly prevalent in the debug version rather than the release version. What kind of compiler optimization was implemented that prevented such a significant impact ~

I did not check it that deeply, but my educated guess is function inlining. Each function call has a tiny overhead, but since the processing of the text lines calls so many functions, they add up to something big.

Ther's also another effect at play, once the line processing is slower than the keyboard repeat rate, things start to queue up, and makes things really bad. This can be fixed if we start using a background thread for it, since then we can cancel stuff that is already obsolete.

, I did observe that the multigrid feature occasionally renders the bottom split s incorrectly for a brief moment:

I have not seen that one, and at the moment I have no idea what's causing it, but I will try to repeat it at some point.

@fredizzimo
Copy link
Member Author

fredizzimo commented May 19, 2023

I'm writing this mostly to remind myself what to do when I return to this, but also for transparency.

I have now quickly looked at the wrongly rendered splits. There are two issues:

  1. When you split a window that is scrolled, the split will immediately have a positive scroll delta. So, we need to initialize the window with that position, rather than trying to scroll there.
  2. For some reason it takes a long time until the scrolling starts, and during that time the Window stays corrupted. It might be that only one window scrolls at a time, and therefore there's a delay until the scrolling starts.

@wroyca, I fixed the second issue, since that was easy.

@wroyca
Copy link

wroyca commented May 19, 2023

I'm writing this mostly to remind myself what to do when I return to this, but also for transparency.

I have now quickly looked at the wrongly rendered splits. There are two issues:

  1. When you split a window that is scrolled, the split will immediately have a positive scroll delta. So, we need to initialize the window with that position, rather than trying to scroll there.
  2. For some reason it takes a long time until the scrolling starts, and during that time the Window stays corrupted. It might be that only one window scrolls at a time, and therefore there's a delay until the scrolling starts.

@wroyca, I fixed the second issue, since that was easy.

Awesome, thanks! I'll let you know if I notice anything else

@Elabajaba
Copy link

Edit: On Wayland, I have two problems, the screen is way too dark, and vsync does not work. It is smooth if you set neovide_refresh_rate correctly though. --nosrgb fixes the colors, but I have no idea why, since almost everything should use srgb these days.

Nvidia+wayland+winit doesn't seem to support sRGB surfaces. Not sure how to fix it here, but with wgpu you have to use an srgb texture view to get the proper colours. (example)

@A-Lamia
Copy link

A-Lamia commented May 25, 2023

Seems there might be an issue with popup UI elements and scrolling where the borders are not static and are scrolling.

neovide_BLgS8ybr0.mp4

@fredizzimo fredizzimo force-pushed the fsundvik/improve-render-loop branch from 13b2dbf to ee28466 Compare May 25, 2023 09:09
@fredizzimo
Copy link
Member Author

fredizzimo commented May 25, 2023

I have updated the branch to use a winit version that fixes the Windows deadlocks. It should also fix the Wayland keyboard repeat in a different way than we did previously, and it's supposed to work, but I haven't tested it myself. This now points directly to the branch where the official winit development happens.

@A-Lamia. I'm not sure what to do about that. Any kind of borders, similar to the winbar issue are fundamentally incompatible with the way we are scrolling. The issue technically exists with the old snapshot-based solution as well, but it's less glitchy due to the fact that a snapshot stores the whole buffer and we always use the newest one, so it kind of fixes itself.

Telescope does not seem to suffer from this as it seems to use multiple floating windows, or something, and therefore never scrolls an area with a border.

I will bring this up with the Neovim team once they respond to my issues, and we start to discuss possible fixes and a better implementation on the Neovim side.

Edit: My old grid_scroll based solution did not have the problem either since it would have disabled smooth scrolling when there are borders. I’m personally think that solution handles more cases correctly than the current win_viewport based one. I think I will add it back with an option.

The animate function returs a boolean that tells if the animation should
continue or not.
All events are now processed before allowing rendering.
The various update functions, which are cheap to run, are always run and
return true when rendering is needed.
This properly submits everything to the GPU
The default event loop on winit, at least on Windows is too slow. See
rust-windowing/winit#2782
Windows using DwmFlush, Wayland using frame callbacks, a timer based
vsync when --novsync is given, and standard swap_frame otherwise.

Note on Windows, the there's some kind of race condition in the
opengl implementation and waiting for DwmFlush before swapping fixes that.
This is done by caching the info, instead of re-calculating each time
…imation_far_lines

Also fix the documentation
@fredizzimo
Copy link
Member Author

I rebased to the 0.11.0 release, there should be no other changes.

@fredizzimo
Copy link
Member Author

I'm closing this in favour of #1977. Please continue the discussion there.

Further fixes, and the review feedback will also be addressed there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

skia-bindings version fails to build on Asahi Linux (SIGBUS in build script)