Allow rendering on multiple outputs with <output> = * #26

Merged
GhostNaN merged 5 commits into GhostNaN:master on Sep 16, 2022


mstoeckl
Contributor

@mstoeckl mstoeckl commented Sep 10, 2022

Example use: run the following on either a single or multi-monitor setup; the video should show on every display:

mpvpaper '*' /path/to/video

Notes:

  • No extra changes should be needed for the --auto-stop feature: auto_stop appears to trigger only if no output has displayed a frame (that is, set halt_info.frame_ready = 1) in the last two seconds. As long as one monitor is visible, mpvpaper keeps running.
  • I measured the efficiency improvement with intel_gpu_top and a video drawn simultaneously on three small virtual outputs. Running mpvpaper '*' video.mkv used only 20% of the hardware video decoding capacity, with a total GPU power draw of 3 W; showing the same video with three independent copies of mpvpaper used 40% of the hardware video decoding capacity and increased the power draw to 4.5 W total.
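The auto-stop reasoning in the first note can be sketched as a small predicate. This is only an illustration under assumptions: `output_state`, `last_frame_ms`, and `should_auto_stop` are hypothetical names rather than mpvpaper's actual code (which is described above as using a `frame_ready` flag), and the two-second window is taken from the note.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-output record: when this output last displayed a frame. */
struct output_state {
    uint64_t last_frame_ms;
};

/* Window taken from the note above: two seconds without any frame. */
#define AUTO_STOP_WINDOW_MS 2000

/* Returns true only if NO output displayed a frame within the window, so a
 * single visible monitor is enough to keep playback running.
 * Assumes now_ms is never earlier than any recorded timestamp. */
bool should_auto_stop(const struct output_state *outputs, size_t count,
                      uint64_t now_ms) {
    for (size_t i = 0; i < count; i++) {
        if (now_ms - outputs[i].last_frame_ms < AUTO_STOP_WINDOW_MS) {
            return false; /* at least one output is still being shown */
        }
    }
    return true;
}
```

With three outputs of which only one received a frame recently, the predicate stays false, which matches the behavior described for '*'.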

As part of this process, call eglMakeCurrent without a surface
for initial EGL and mpv setup; and when performing EGL and OpenGL
operations that are directly related to an EGLSurface, call
eglMakeCurrent on that surface.
This commit also fixes an issue where a second xdg_output::done would
mistakenly destroy outputs.
@GhostNaN
Owner

Holy crap!!!
I can't believe it, shows you how much I know about OpenGL (clueless).

You're going to have to give me a bit to sort through what you did.
But what I can say right now, it does work.

@GhostNaN
Owner

Alright, I've looked it over...

It doesn't affect current operation for a single display.
I didn't see any real difference in RAM usage and only a little in CPU.
Also no increase in GPU usage? (More on that later)
And the code looks reasonable.
But....

> video drawn simultaneously

was not the case for my system.

Yes, it showed the video on all 3 monitors, but ran terribly.
I looked into it and found that the frame rate per output was roughly VIDEO_FPS / MONITOR_COUNT.
So if the video was running at 60 FPS (~16.6 ms per frame), each monitor was displaying at more like 20 FPS (~49.8 ms per frame).

It seemed like the outputs were taking turns rendering then displaying the video.
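The turn-taking effect can be put into numbers with a short sketch. It assumes that each render call blocks for one video frame period and that the outputs are served one after another on a single thread; the function names are hypothetical.

```c
/* Time between frames on one output when every render call blocks for one
 * video frame period and the outputs take turns on a single thread. */
double per_output_interval_ms(double video_fps, int output_count) {
    double frame_period_ms = 1000.0 / video_fps;
    return frame_period_ms * output_count;
}

/* Effective frame rate seen by each output under the same assumption. */
double per_output_fps(double video_fps, int output_count) {
    return video_fps / output_count;
}
```

For a 60 FPS video on 3 outputs this gives 20 FPS and about 50 ms per output, in line with the figures quoted above.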
I verified this effect by measuring the frame callback time and render time with this in frame_handle_done():

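#include <stdio.h>     // printf
#include <string.h>    // strcmp
#include <sys/time.h>  // gettimeofday

// Timestamp (in microseconds) of the previous frame callback on the watched output.
static double old_tv;

static void frame_handle_done(void *data, struct wl_callback *callback, uint32_t frame_time) {
    wl_callback_destroy(callback);

    struct display_output *output = data;
    // Only log one output (here "DP-2") so the intervals are per-output.
    if (strcmp(output->name, "DP-2") == 0) {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        // Include the seconds; tv_usec alone wraps around every second.
        double curr_tv = tv.tv_sec * 1e6 + tv.tv_usec;
        printf("Frame Time: %fms  Output: %s\n", (curr_tv - old_tv) / 1000, output->name);
        old_tv = curr_tv;
    }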
...
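When measuring intervals like this, the difference has to account for both the seconds and the sub-second field of the timestamp. A minimal sketch, assuming the timestamps are `struct timespec` values (for example obtained from `clock_gettime(CLOCK_MONOTONIC, ...)`); the helper name `elapsed_ms` is hypothetical:

```c
#include <time.h>

/* Milliseconds elapsed from `start` to `end`. Both the seconds and the
 * nanoseconds fields are used, so the result stays correct when the
 * sub-second part wraps around at a second boundary. */
double elapsed_ms(struct timespec start, struct timespec end) {
    double sec = (double)(end.tv_sec - start.tv_sec);
    double nsec = (double)(end.tv_nsec - start.tv_nsec);
    return sec * 1000.0 + nsec / 1e6;
}
```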

One curiosity was that I didn't see any rise in hardware video decoding or overall usage
on my RX 6700XT like you saw with yours.
Nor did I see an increase in power usage from my GPU.

This was the issue I was trying to convey with monitors of varying resolutions.
As the video is rendered with these params:

mpv_render_param render_params[] = {
        {MPV_RENDER_PARAM_OPENGL_FBO, &(mpv_opengl_fbo){
            .fbo = 0,
            .w = output->width * output->scale,
            .h = output->height  * output->scale,
        }},
        // Flip rendering (needed due to flipped GL coordinate system).
        {MPV_RENDER_PARAM_FLIP_Y, &(int){1}},
    };

and can't be shared easily due to differences in width and height when rendered.

Perhaps there is a way around this issue as well.
I'll try some workarounds myself and see if I find anything.
But I applaud you for getting this far.
It is as if we are 90% of the way there, just falling short.

Again, excellent work otherwise.

@mstoeckl
Contributor Author

mstoeckl commented Sep 11, 2022

> > video drawn simultaneously
>
> was not the case for my system.
>
> Yes, it showed the video on all 3 monitors, but ran terribly.
> I looked into it and found that the frame rate per output was roughly VIDEO_FPS / MONITOR_COUNT.
> So if the video was running at 60 FPS (~16.6 ms per frame), each monitor was displaying at more like 20 FPS (~49.8 ms per frame).

I can reproduce this; the problem seems to be that mpv_render_context_render always blocks until a new frame is available (which generally takes 1/VIDEO_FPS seconds). There appears to be an option (MPV_RENDER_PARAM_BLOCK_FOR_TARGET_TIME) to disable that delay (in exchange for worse synchronization between audio and video), but when I do that I get an as-yet unexplained crash in mpv.

Update: the following patch on top of this PR seems to work:

diff --git a/src/main.c b/src/main.c
index 05c61c0..11648ab 100644
--- a/src/main.c
+++ b/src/main.c
@@ -128,6 +128,8 @@ static void render(struct display_output *output) {
         }},
         // Flip rendering (needed due to flipped GL coordinate system).
         {MPV_RENDER_PARAM_FLIP_Y, &(int){1}},
+        {MPV_RENDER_PARAM_BLOCK_FOR_TARGET_TIME, &(int){0}},
+        {MPV_RENDER_PARAM_INVALID, NULL},
     };
 
     if (!eglMakeCurrent(egl_display, output->egl_surface, output->egl_surface, egl_context)) {
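
Besides the new option, the patch also appends an MPV_RENDER_PARAM_INVALID entry, which serves as the end marker of the parameter array. The sketch below illustrates why such a sentinel matters. It uses stand-in types (`demo_param`, `demo_param_type`) invented for this example, not the libmpv definitions, and it only assumes that the consumer walks the array until it sees the sentinel.

```c
#include <stddef.h>

/* Stand-in types for illustration only; these are NOT the libmpv definitions. */
enum demo_param_type {
    DEMO_PARAM_INVALID = 0, /* sentinel: marks the end of the array */
    DEMO_PARAM_FBO,
    DEMO_PARAM_FLIP_Y,
    DEMO_PARAM_BLOCK_FOR_TARGET_TIME,
};

struct demo_param {
    enum demo_param_type type;
    void *data;
};

/* Walks the array until the sentinel and returns the number of real entries.
 * Without the sentinel, this loop would read past the end of the array. */
size_t count_params(const struct demo_param *params) {
    size_t n = 0;
    while (params[n].type != DEMO_PARAM_INVALID) {
        n++;
    }
    return n;
}

/* Looks up an int-valued parameter, falling back to `def` when absent. */
int get_int_param(const struct demo_param *params, enum demo_param_type type,
                  int def) {
    for (size_t i = 0; params[i].type != DEMO_PARAM_INVALID; i++) {
        if (params[i].type == type) {
            return *(int *)params[i].data;
        }
    }
    return def;
}
```

If the array ended without the sentinel, a consumer written like this would keep reading whatever follows it on the stack, which is one plausible explanation for the crash mentioned above.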

@GhostNaN
Owner

You amaze me again!
I thought I knew the problem. I was wrong.

I was about to go on a full-on rant about...
If the mpv context is shared, how could they all play the same frame?
And how can we just share a context between all the outputs?
But none of that seemed necessary!

CPU usage was a bit more brutal and unusual:
CPU usage scaled worse with VAAPI compared to 3 processes of mpvpaper,
but scaled better with software decode compared to 3 processes of mpvpaper.

At least RAM usage is A LOT better with just the "*" option,
always consuming only about 1 mpvpaper process worth of RAM.

GPU usage and power were just more brutal for the most part.
Overall GPU usage seemed worse, but it is harder to nail down here, so I'll consider it a wash.
GPU power scaling was unfortunately considerably worse compared to 3 processes of mpvpaper.
One good thing, though, was that there was no change in HW decode usage.

Overall, a mixed bag as far as resource usage savings go.
As it turns out, having NO block is also not great,
because the frame callback will then fire every time the monitor refreshes.
So on a 120 Hz panel, regardless of whether the video is 30 FPS, 60 FPS or whatever,
the output will always be redrawn at effectively 120 FPS (if it can render fast enough),
wasting resources re-rendering the same frame multiple times.
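
To make the cost concrete, here is a small sketch of the arithmetic. It assumes one render per display refresh and a constant video frame rate; the function name is hypothetical.

```c
/* Renders per second that redraw an already-shown frame when every display
 * refresh triggers a render. Returns 0 if the video is at least as fast as
 * the display. */
int redundant_renders_per_second(int refresh_hz, int video_fps) {
    if (video_fps >= refresh_hz) {
        return 0;
    }
    return refresh_hz - video_fps;
}
```

On a 120 Hz panel, a 30 FPS video would mean 90 of the 120 renders each second repeat a frame that is already on screen.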

I have good news though: I believe this can also be fixed.
I'm probably going to be humbled again, but I'll share my thoughts and ideas.
With the surface frame callback, the outputs share the SAME thread.
So if 1 output blocks, all other outputs will never get the chance to run their callbacks and render.
This was the issue with mpv_render_context_render().

The simplest solution for this is to somehow limit/delay the output surface frame callback to the VIDEO_FPS.
If that's not possible, then it would have to be delayed by smartly using some form of usleep().
The last option is to somehow leverage mpv_render_context_set_update_callback()
or something similar to notify when to render the next frame for all outputs.

Sorry for the essay of information, I didn't want to leave anything out.
Lost about half my day to this already, so I'll get back to this later.

@mstoeckl
Contributor Author

> The simplest solution for this is to somehow limit/delay the output surface frame callback to the VIDEO_FPS.
> If that's not possible, then it would have to be delayed by smartly using some form of usleep().
> The last option is to somehow leverage mpv_render_context_set_update_callback()
> or something similar to notify when to render the next frame for all outputs.

The last option, mpv_render_context_set_update_callback, seems to be the recommended way to do it -- some of the mpv examples use it, see e.g. SDL demo. Using the callback with Wayland is definitely doable, although it will require maybe 50 extra lines of standard boilerplate code to get a main loop that can wait for events from both Wayland and mpv. I'll try to implement this when I next have time, possibly next weekend.

Before this commit, every time redraw() was called, it would block
and wait for a new frame. This works when only one output is being
drawn; but when multiple outputs are drawn, this means only one
output at a time is given the latest frame, instead of all of them.

This change makes redraw() non-blocking,† and uses mpv_render_context's
set_update_callback method to request redraws for all outputs
when a new frame is available. This requires a) tracking the lifetime
of all frame callbacks, to ensure frames are not rendered too quickly
(before the frame callback has returned) and b) extending the main
loop so that it can be woken up via self-pipe trick after the mpv
update callback is invoked.

† eglSwapBuffers calls wl_display_roundtrip(), so technically redraw()
  ends up waiting for the compositor to respond. This can introduce
  some extra lag when drawing onto many outputs at once. It's possible
  to avoid this problem by managing the buffers onto which OpenGL
  draws ourselves and committing the output surfaces as a batch.
  However, doing so would be quite complicated.
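
The self-pipe trick mentioned in the commit message can be sketched roughly as follows. This is a generic POSIX illustration, not the code from this PR: `wakeup_pipe`, `wakeup_init`, `wakeup_notify`, `wakeup_drain`, and `wait_for_events` are hypothetical names, and `display_fd` stands in for the Wayland connection's file descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* wakeup_pipe[0] is polled by the main loop; wakeup_pipe[1] is written by
 * the callback thread. Both names are hypothetical. */
static int wakeup_pipe[2] = {-1, -1};

/* Creates the pipe and makes both ends non-blocking. Returns 0 on success. */
int wakeup_init(void) {
    if (pipe(wakeup_pipe) != 0) {
        return -1;
    }
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(wakeup_pipe[i], F_GETFL);
        if (flags == -1 ||
            fcntl(wakeup_pipe[i], F_SETFL, flags | O_NONBLOCK) == -1) {
            return -1;
        }
    }
    return 0;
}

/* Can be called from another thread (for example from an update callback):
 * it only writes one byte. If the pipe is full, a wakeup is already pending,
 * so a failed write is ignored. */
void wakeup_notify(void) {
    char byte = 0;
    ssize_t ret = write(wakeup_pipe[1], &byte, 1);
    (void)ret;
}

/* Empties the pipe so that the next poll() blocks again. */
void wakeup_drain(void) {
    char buf[64];
    while (read(wakeup_pipe[0], buf, sizeof buf) > 0) {
    }
}

/* Waits for either the display connection fd or the wakeup pipe.
 * Returns true if a wakeup was pending (and drains it). A negative
 * display_fd is ignored by poll(). */
bool wait_for_events(int display_fd, int timeout_ms) {
    struct pollfd fds[2] = {
        {.fd = display_fd, .events = POLLIN},
        {.fd = wakeup_pipe[0], .events = POLLIN},
    };
    if (poll(fds, 2, timeout_ms) <= 0) {
        return false;
    }
    if (fds[1].revents & POLLIN) {
        wakeup_drain();
        return true;
    }
    return false;
}
```

Several notifications that arrive before the loop wakes up collapse into a single wakeup, which fits a "redraw all outputs with the latest frame" model.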
@mstoeckl
Contributor Author

I've updated the PR so that frame drawing is rate limited both by the wl_surface::frame callbacks and by frame update callbacks from mpv. Let me know if you find any other problems.
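
The double rate limit can be pictured as a small per-output gate: an output is drawn only when the player has announced a new frame and no compositor frame callback is still outstanding. The sketch below is an assumption-laden illustration; `output_gate` and the `gate_*` functions are hypothetical names, not the ones used in the PR.

```c
#include <stdbool.h>

/* Hypothetical per-output bookkeeping; the field names are illustrative. */
struct output_gate {
    bool frame_callback_pending; /* a wl_surface frame callback is in flight */
    bool redraw_needed;          /* the player announced a new frame */
};

/* Called when the player signals that a new frame is available. */
void gate_on_new_frame(struct output_gate *g) {
    g->redraw_needed = true;
}

/* Called when the compositor's frame callback fires for this output. */
void gate_on_frame_done(struct output_gate *g) {
    g->frame_callback_pending = false;
}

/* Returns true if the output should be drawn now, and if so updates the
 * state as though a frame was rendered and a new frame callback requested. */
bool gate_try_render(struct output_gate *g) {
    if (!g->redraw_needed || g->frame_callback_pending) {
        return false;
    }
    g->redraw_needed = false;
    g->frame_callback_pending = true;
    return true;
}
```

In this model the main loop would call `gate_try_render` for every output after each wakeup, so a slow output delays only itself.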

@GhostNaN
Owner

After that last commit, I believe we are ready to rock.
I wouldn't have even thought about using poll() and pipe(), good thinking.

CPU and GPU usage/power is now as good as running multiple instances of mpvpaper (mpvpaper³).
Although CPU usage with software decode is still better compared to mpvpaper³.

GPU HW decode, VRAM, and RAM usage are without a doubt still better than mpvpaper³.

Just letting you know I plan on doing a bit of code cleanup and TLC after this pull.
But nothing functionally you added will change.
I'm just not going to bog you down any further with nitpicks.

Thank you for such an awesome contribution!
mpvpaper 1.3 is looking to be another great release!

@GhostNaN GhostNaN merged commit 11d42ca into GhostNaN:master Sep 16, 2022
@mstoeckl mstoeckl deleted the multi-output branch September 16, 2022 11:59