Skip to content
This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

Async glReadPixels() in HeadlessView? #477

Closed
mikemorris opened this issue Oct 3, 2014 · 10 comments
Closed

Async glReadPixels() in HeadlessView? #477

mikemorris opened this issue Oct 3, 2014 · 10 comments
Labels
archived Archived because of inactivity Node.js node-mapbox-gl-native

Comments

@mikemorris
Copy link
Contributor

Does this make sense or seem worthwhile @kkaefer?

https://www.opengl.org/discussion_boards/showthread.php/144238-asynchronous-readback-that-call-glReadPixels%28%29-with-PBO?highlight=multithread%2A

@kkaefer
Copy link
Contributor

kkaefer commented Oct 8, 2014

Yes, we should eventually switch to that since it's likely a lot faster. I've skipped implementing that since it was not immediately needed for tests back then.

@adam-mapbox
Copy link
Contributor

@adam-mapbox
Copy link
Contributor

After talking with @mikemorris , it's not clear that doing readStillImage asynchronously is a win for us. The current context where this is used is already "asynchronous" in a sense; we don't have useful work to do while we wait for the image that we're getting on that thread.

@tmpsantos
Copy link
Contributor

/sub

@mikemorris
Copy link
Contributor Author

mikemorris commented Aug 26, 2016

Tried reimplementing this over in https://github.com/mapbox/mapbox-gl-native/compare/pbo-read-pixels in hopes of alleviating the 1000ms+ GPU hang we've been seeing in glReadPixels calls in the Node.js bindings that permanently stalls all renders currently in progress, even on other processes.

This does successfully change glReadPixels to happen asynchronously, but the hang is still present - deferred to glClientWaitSync if a fence is inserted before glReadPixels, or at some other point inside HeadlessView::readStillImage if not (didn't have proper logging in place to detect precisely when).

Not sure if this is a hardware or GPU driver issue or if we're just stalling the pipeline somehow. We've been consistently reproducing this on a GRID K520 GPU running version 367.27 of the NVIDIA Linux drivers.

@tiagovignatti
Copy link
Contributor

tiagovignatti commented Aug 29, 2016

chiming in... are you also seeing the stalls on Intel hardware, @mikemorris? Do you have an easy way to trigger these stalls, so I can take a look myself?

@tiagovignatti
Copy link
Contributor

I tried to track down what's going on but it will require a bit more work. If I switch on debug in Intel Mesa driver it captures the pipeline stall with the following output:
Flushing before mapping a referenced bo.
CPU mapping a busy miptree BO stalled and took 0.126 ms.

That's definitely a problem cause if we have to flush the batchbuffer early that has implications on performance. I'm not seeing other programs dumping anything, so I have to imply that it's a result of a bad stream that mbgl is sending to the hardware.

That said I don't think that changing to use PBO will help anything in this case. Somehow the GPU is stalled on a busy buffer object and another technique of downloading pixels won't help out.

@mikemorris
Copy link
Contributor Author

@tiagovignatti Are you seeing the same ~1 second hang I'm describing above, or is this a possible cause that manifests differently on Intel hardware?

I'm actually not too familiar with the GPU pipeline - is a stall something that could break unrelated in-progress renders or drop previously queued GL calls?

The idea with switch to the async glReadPixels was to eliminate the implicit flush - it sounds like there's still a flush happening though, which I think could be from glXMakeContextCurrent at

void HeadlessView::activateContext() {
if (!glXMakeContextCurrent(xDisplay, glxPbuffer, glxPbuffer, glContext)) {
throw std::runtime_error("Switching OpenGL context failed.\n");
}
}
void HeadlessView::deactivateContext() {
if (!glXMakeContextCurrent(xDisplay, 0, 0, nullptr)) {
throw std::runtime_error("Removing OpenGL context failed.\n");
}
}

@tiagovignatti
Copy link
Contributor

I don't think this would drop/break previously GL calls. What I am seeing on Intel is subtle stalls in many subtests of mbgl-test, but none of them takes more than 0.3 ms. Still, It might be that we're stepping in the same problem, cause I'd guess that when using discrete GPUs (your case, right?) it takes longer to synchronize the states.

This bug called my attention cause I am seeing really huge stalls (of several seconds!) when running mbgl-test with software rasterizer, but that seems orthogonal to all of this.

@stale
Copy link

stale bot commented Nov 20, 2018

This issue has been automatically detected as stale because it has not had recent activity and will be archived. Thank you for your contributions.

@stale stale bot closed this as completed Nov 20, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
archived Archived because of inactivity Node.js node-mapbox-gl-native
Projects
None yet
Development

No branches or pull requests

5 participants