I'm looking for any documentation, articles, guides, or posts about:
- Performance characteristics of VideoCore VI / v3d. Memory bandwidth, FLOPS, etc.
The only numbers I could find are: measured CPU memory bandwidth (~4.6 GiB/s), max theoretical pixel fill rate (2.4 GPixels/s), and a computed 32 GFLOPS (see viewtopic.php?t=244519).
- GPU performance guides. Dos and don'ts.
There are a few articles about other tiled GPUs, but it would still be nice to have something explicitly about VideoCore.
- GPU profiling guides. What to measure, how to measure, when to measure. How to analyze and interpret results.
There are almost a hundred GPU counters exposed by v3d in GALLIUM_HUD (supposedly also available via GL extensions). But what do they all mean?
Some of them are probably the same as in the official VC4 reference guide https://docs.broadcom.com/doc/12358545 but others seem to be completely new.
- Any other VC6-relevant tricks, techniques, etc. Maybe related to Tile-Based Deferred Rendering (TBDR).
The context:
I'm contemplating making an application that is fairly intensive in both GPU rendering and memory bandwidth. That is, intensive relative to a low-power SBC: I don't expect the load to come anywhere near what even an integrated GPU can handle, and it is mostly memory-limited rather than compute/rendering-limited.
The app consists of rendering a trivial scene multiple times into a large framebuffer, and then postprocessing that buffer in a nonlocal fashion.
Preliminary tests show that the rendering part is mostly fine -- it's not 60 fps, but good enough for my purposes.
But the framebuffer postprocessing part is a 5fps disaster. Even just reading an empty (glClear, no rendering at all) framebuffer is already too slow.
Note that I'm already doing the exclusive libdrm+KMS thing, i.e. there's none of the usual X11/Wayland compositing tax.
I wonder if there are known ways to make progress from here. E.g. any specific GL extensions, tile-local processing, or maybe just switching to Vulkan to have more control over passes/subpasses, image layout transitions, etc.
Statistics: Posted by provod — Wed Sep 18, 2024 6:30 pm — Replies 2 — Views 58