Does not work on Raspberry Pi 4 #14
Comments
You didn't actually remove the check from |
Manually removing the check still doesn't work. Now it crashes with an assertion failure inside the driver: ../src/broadcom/vulkan/v3dvx_pipeline.c:86: pack_blend: Assertion `pipeline->subpass->color_count == cb_info->attachmentCount' failed. However, I can't tell whether this is a vkoverhead problem or a driver problem, because the OS comes with an old version of the validation layer that doesn't understand several extensions, and compiling it myself is a massive PITA. |
and remove the abort() this time ref #14
Should be fixed now. |
Well, it starts running now but eventually crashes in test 6 with "Failed to allocate device memory for BO". Running test 6 alone passes. Again, I'm not sure whether this is a vkoverhead bug or a driver bug, and at this point I'm not going to debug it any further. |
I am looking into this from the perspective of v3dv. The issue is not specific to any test in particular, but to accumulated memory usage over time. As far as I can see, there is an ever increasing number of BOs being allocated. At the point it fails to allocate, I am seeing almost 32K BOs allocated that take ~200MB of memory. 200MB is not too much, but I think 32K BOs might be hitting some limits for the number of BO handles we can allocate in the kernel. I'd have to confirm this. With that said, I wonder if this ever increasing BO allocation number is expected or may point to vkoverhead leaking GPU resources from the tests. FWIW, if I run vkoverhead on my Intel laptop it progresses much further, but ends up crashing too (not sure if for the same reason though). |
It'd be interesting to know what's creating so many BOs, whether it's just command stream recording or something else. I don't think vkoverhead itself creates anywhere near that many? |
We're still looking into it but vkoverhead does create some pretty large command buffers and doesn't seem to immediately release these resources after each test, so I think this is in line with the growing number of BOs we see. With that said, I think there might be some issue within the kernel side that is causing us to fail BO allocation without an obvious reason that we are trying to track down. |
I have some more info to share: First, memory requirements from vkoverhead can be quite high with some tests. In particular, the render pass tests hit a worst-case scenario for us, since they create a render pass for each draw call and record many thousands of these commands into each command buffer. For us, each render pass requires allocating some BOs, so the BO count and memory usage blow up. At the point of failure I have seen it reach between 20K and 30K BOs and up to 1.9GB of memory just for BOs. This is made worse by the fact that we also have a BO cache in the user-space driver to help with performance. The BO cache is freed if the kernel fails to allocate a BO, so in theory it should not be adding to the problem. In practice, however, when we run with the cache enabled we end up failing to allocate even after freeing the BO cache completely. This is surprising because when we disable the BO cache from the start we don't run into this problem. I was, in fact, able to complete execution of vkoverhead by disabling this cache with the following environment variable:
For what it's worth, we have not been able to reproduce the problem if we use an upstream kernel, so the issue may be related to Raspberry Pi's downstream kernel changes (we are investigating this). With all that said, I do have a few suggestions for vkoverhead:
|
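As an illustration of the cache behaviour described in the comment above (free the user-space BO cache when the kernel refuses an allocation, then retry), here is a minimal Python model. It is not the v3dv code and not one of the suggestions for vkoverhead: `FakeKernel`, `BoCache`, and the handle budget are hypothetical names and numbers chosen only for the sketch. Note that in this idealised model the retry always succeeds once the cache is flushed, which is exactly what was *not* observed on the downstream kernel.

```python
class OutOfDeviceMemory(Exception):
    """Raised when the simulated kernel cannot provide another BO."""


class FakeKernel:
    """Hypothetical stand-in for the kernel: it only enforces a BO count budget."""

    def __init__(self, max_bos):
        self.max_bos = max_bos
        self.live = 0

    def alloc(self, size):
        if self.live >= self.max_bos:
            raise OutOfDeviceMemory("Failed to allocate device memory for BO")
        self.live += 1
        return {"size": size}

    def free(self, bo):
        self.live -= 1


class BoCache:
    """Hypothetical size-bucketed cache of idle BOs kept for reuse."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.idle = {}  # size -> list of idle BOs

    def release(self, size, bo):
        # Keep the BO around instead of returning it to the kernel.
        self.idle.setdefault(size, []).append(bo)

    def flush(self):
        # Return every cached BO to the kernel.
        for bucket in self.idle.values():
            for bo in bucket:
                self.kernel.free(bo)
        self.idle.clear()

    def alloc(self, size):
        bucket = self.idle.get(size)
        if bucket:
            return bucket.pop()  # reuse a cached BO of the same size
        try:
            return self.kernel.alloc(size)
        except OutOfDeviceMemory:
            # The kernel refused: drop the whole cache and try once more.
            self.flush()
            return self.kernel.alloc(size)
```

With a budget of two BOs, caching two 4096-byte BOs and then requesting an 8192-byte one forces the flush-and-retry path, after which only the new BO is live.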
BTW, I also observe some weird behavior with vkoverhead: when running without parameters, I see that the draw_vertex test shows about 50% of the draw calls of the base draw test, which doesn't make sense since, from the point of view of the driver, there is no significant difference between the two. However, if I run the tests separately (using the -test parameter), both tests score similarly, which would be the expected result. This is with the CPU governor set to performance to avoid CPU throttling. The same occurs for other tests: they all score better in number of draw calls when they are executed standalone with the -test parameter. UPDATE: this behavior seems specific to Raspberry Pi though; on my Intel laptop, scores with -test are about the same as when running the whole suite. |
Regarding memory requirements, I'm wondering if we might want to just use smaller iteration numbers on your builds? It could be a build-time configuration thing so that different embedded devices could tune the loops a bit--maybe only iterating 500 or 1000 times instead of 10-20x those numbers. Freeing everything on exit is a bit cumbersome with the way the code is structured. Historically I've found vkoverhead leaks by checking for memory ballooning; with how fast the loops iterate, they show up pretty fast. It's intentional that vkEndCommandBuffer is always called at the end of a command buffer. If there are cases where that's not happening then I need to fix them.
I haven't seen this behavior on any other driver. |
Currently, we are using an alignment of 128 kB to insert a node, which ends up wasting memory as we perform plenty of small BO allocations (<= 4 kB). We require that allocations are aligned to 128 kB, so for any allocation smaller than that, we are wasting the difference. This implies that we cannot effectively use the whole 4 GB address space available for the GPU in the RPi 4. Currently, we can allocate up to 32000 BOs of 4 kB (~140 MB) and 3000 BOs of 400 kB (~1.3 GB). This can be quite limiting for applications that have a high memory requirement, such as vkoverhead [1]. By reducing the page alignment to 4 kB, we can allocate up to 1000000 BOs of 4 kB (~4 GB) and 10000 BOs of 400 kB (~4 GB). Moreover, by performing benchmarks, we were able to attest that reducing the page alignment to 4 kB can provide a general performance improvement in OpenGL applications (e.g. glmark2). Therefore, this patch reduces the alignment of the node allocation to 4 kB, which will allow RPi users to explore the whole 4 GB virtual address space provided by the hardware. Also, this patch allows users to fully run vkoverhead in the RPi 4/5, solving the issue reported in [1]. [1] zmike/vkoverhead#14 Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
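The BO counts quoted in the commit message can be roughly cross-checked with simple arithmetic. The sketch below is a naive upper bound, assuming each BO occupies its size rounded up to the alignment and that nothing else consumes the 4 GB address space; the helper names (`align_up`, `max_bo_count`) are made up for this illustration. Under those assumptions the model lands close to the quoted 32000, 1000000 and 10000 figures, but it gives 8192 for 400 kB BOs at 128 kB alignment, well above the 3000 reported, so other factors that this model ignores are evidently involved in that case.

```python
KIB = 1024
GIB = 1024 ** 3


def align_up(size, alignment):
    """Round size up to the next multiple of alignment."""
    return -(-size // alignment) * alignment


def max_bo_count(address_space, bo_size, alignment):
    """Naive bound: how many aligned BOs fit in the address space."""
    return address_space // align_up(bo_size, alignment)


if __name__ == "__main__":
    for alignment in (128 * KIB, 4 * KIB):
        for bo_size in (4 * KIB, 400 * KIB):
            count = max_bo_count(4 * GIB, bo_size, alignment)
            wasted = align_up(bo_size, alignment) - bo_size
            print(
                f"alignment {alignment // KIB:>3} KiB, "
                f"BO {bo_size // KIB:>3} KiB: "
                f"at most {count} BOs, {wasted // KIB} KiB of address space wasted per BO"
            )
```

For a 4 kB BO, a 128 kB alignment leaves 124 kB of address space unused per allocation, which is why the small-BO count improves by a factor of 32 when the alignment drops to 4 kB.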
Does not run on Raspberry Pi 4. Fails with maxColorAttachments is only 4. It should only refuse to run the tests which actually require that many attachments.
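The behaviour requested in the report (skip only the tests that need more color attachments than the device offers, instead of refusing to start) could look roughly like the following. This is a Python model of the selection logic only, not vkoverhead's actual C code: `BenchCase`, `split_by_limit`, the test names, and the attachment counts are all hypothetical; only the `maxColorAttachments` limit of 4 comes from the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchCase:
    """Hypothetical description of one benchmark test."""
    name: str
    color_attachments: int  # attachments the test needs (made-up values below)


def split_by_limit(cases, max_color_attachments):
    """Partition cases into those the device can run and those to skip."""
    runnable = [c for c in cases if c.color_attachments <= max_color_attachments]
    skipped = [c for c in cases if c.color_attachments > max_color_attachments]
    return runnable, skipped


if __name__ == "__main__":
    cases = [
        BenchCase("single_target", 1),   # hypothetical test
        BenchCase("four_targets", 4),    # hypothetical test
        BenchCase("many_targets", 8),    # hypothetical test
    ]
    # 4 is the maxColorAttachments value reported for the Raspberry Pi 4.
    runnable, skipped = split_by_limit(cases, max_color_attachments=4)
    for case in skipped:
        print(f"skipping {case.name}: needs {case.color_attachments} color attachments")
    for case in runnable:
        print(f"running {case.name}")
```

Keeping the check per test rather than global means a device with a low limit still produces results for every test it can actually execute.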