Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

thread out yield demo #1613

Closed
dankamongmen opened this issue May 1, 2021 · 9 comments · Fixed by #2294
Closed

thread out yield demo #1613

dankamongmen opened this issue May 1, 2021 · 9 comments · Fixed by #2294
Assignees
Labels
demo relevant to notcurses-demo enhancement New feature or request perf sweet sweet perf
Milestone

Comments

@dankamongmen
Copy link
Owner

following recent speedups in xray by multithreading (one renders while the other writes), i'm thinking the same tactic ought be applied to yield, which now takes longer than any other demo in at least foot (and is over the 20s mark in a full screen on most emulators). it ought be easy enough using the exact same synchronization structure.

i hate adding something to the 2.3.0 list on the day it's supposed to be released, but fuck it.

@dankamongmen dankamongmen added enhancement New feature or request perf sweet sweet perf demo relevant to notcurses-demo labels May 1, 2021
@dankamongmen dankamongmen added this to the 2.3.0 milestone May 1, 2021
@dankamongmen dankamongmen self-assigned this May 1, 2021
@dankamongmen dankamongmen modified the milestones: 2.3.0, 3.0.0 May 2, 2021
@dankamongmen
Copy link
Owner Author

hrmmm, so xray is indeed threaded currently...though we used to spend something like ~98% of our time writing, and now it's 3%. and removing one of the threads only cuts our FPS from 20 to 17. what happened? is it that compression now takes a majority of our total time? if so, maybe we need to change our locking up? i think it covers the entirety of sprixel creation now; we could restrict it to decoding...

@dankamongmen
Copy link
Owner Author

so simply moving the ncvisual_blit() out of the lock breaks, since both threads are operating on the same ncvisual. we'd either need move locking all the way down into the stack, or maybe double up on the ncvisual?

@dankamongmen
Copy link
Owner Author

yeah, with two threads i pumped 20FPS up to 28 (40% increase) at the cost of chewing two cores.

@dankamongmen
Copy link
Owner Author

so the easy way to do this just doesn't work; one thread's blitting the ncvisual while the other's rendering, which involves of course reading said ncvisual. you can't shift it the other way, either; you then have one editing the ncvisual while the other's blitting it. we'd have to keep some common RGB area, and rebuild the ncvisual each time. it might not be worth it. let's see.

@dankamongmen
Copy link
Owner Author

all the data copying involved ended up slowing things down slightly, less than 10% but noticeable. let me think about it for a minute and make sure there's no way we can't do this with just an ncvisual and no external copy.

ok, after removing debugging cruft, it's very slightly faster. i will keep looking. i doubt i'll merge this as it stands.

@dankamongmen
Copy link
Owner Author

also, while this works in kitty, it appears to crash in sixel. i think we're done here.

@dankamongmen
Copy link
Owner Author

so yeah, after sleeping on it, it comes down to:

  • we have to do three things: update the ncvisual, blit the ncvisual, and render/raster
  • you can't update the ncvisual while blitting the ncvisual
  • you can't blit the ncvisual while rendering/rasterizing
  • a dependency flows from one to the next through the ncvisual

so a twofer running the same code won't work. what about a feeder/eater? nah. the minimal data we could exchange would be the points at which we do a polyfill, so just two words per iteration. so let's say:

  • each has their own copy of the ncvisual
  • A finds and shares its polyfill origin
  • A begins blitting and rendering/rasterizing <-> B fills and finds+shares
  • B begins blitting and rendering/rasterizing <-> A fills and finds+shares

so this would eliminate the entire pixelback copy, on both sides. i bet this could work.

@dankamongmen
Copy link
Owner Author

yeah, that works, and it looks like we get a solid speedup. let me finish this up.

@dankamongmen
Copy link
Owner Author

yep, definite improvement. and we no longer need ncvisual_data(), which we can remove.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
demo relevant to notcurses-demo enhancement New feature or request perf sweet sweet perf
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant