Channel: Raspberry Pi Forums

Graphics programming • Raspberry Pi 4 GPU performance and profiling

I'm looking for any documentation, articles, guides, or posts about:
  • Performance characteristics of VideoCore VI / v3d: memory bandwidth, FLOPS, etc.
    The only numbers I could find are: measured CPU memory bandwidth (~4.6 GiB/s), maximum theoretical pixel fillrate (2.4 GPixels/s), and a computed 32 GFLOPS (see viewtopic.php?t=244519).
  • GPU performance guides. Do's and don'ts.
    There are a few articles about other tiled GPUs, but it would still be nice to have something specifically about VideoCore.
  • GPU profiling guides. What to measure, how to measure, when to measure. How to analyze and interpret results.
    There are almost a hundred GPU counters exposed by v3d in GALLIUM_HUD (supposedly also available via GL extensions). But what do they all mean?
    Some of them are probably the same as in the official VC4 reference guide https://docs.broadcom.com/doc/12358545, but others seem to be completely new.
  • Any other VC6-relevant tricks, techniques, etc. Maybe related to Tile-Based Deferred Rendering (TBDR).
All of this seems possible to figure out on my own, but I wonder whether anyone has already done the work and documented any of it.

The context:
I'm contemplating writing an application that is fairly intensive in GPU rendering and memory bandwidth. Relatively intensive for a low-power SBC, that is: I don't expect it to come anywhere near even an integrated GPU, and I expect it to be limited mostly by memory rather than by compute or rendering.
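To make "mostly memory limited" concrete, a roofline-style estimate compares a pass's arithmetic intensity (FLOPs per byte moved) with the machine balance implied by the numbers quoted above. This is only a rough sketch: it assumes the ~4.6 GiB/s CPU-measured bandwidth also applies to the GPU, which has not been established, and the per-pixel FLOP and byte counts in the example are hypothetical.

```python
# Figures quoted in the post. The bandwidth was measured from the CPU side,
# so using it for the GPU is an assumption.
MEM_BW_BYTES_PER_S = 4.6 * 2**30   # ~4.6 GiB/s
PEAK_FLOPS = 32e9                  # computed 32 GFLOPS


def machine_balance(peak_flops: float, bw_bytes_per_s: float) -> float:
    """FLOPs the GPU could execute per byte of memory traffic at peak."""
    return peak_flops / bw_bytes_per_s


def is_bandwidth_bound(flops_per_pixel: float, bytes_per_pixel: float) -> bool:
    """Roofline test: a pass whose arithmetic intensity is below the machine
    balance cannot reach peak FLOPS, because memory traffic limits it first."""
    intensity = flops_per_pixel / bytes_per_pixel
    return intensity < machine_balance(PEAK_FLOPS, MEM_BW_BYTES_PER_S)


balance = machine_balance(PEAK_FLOPS, MEM_BW_BYTES_PER_S)
print(f"machine balance: {balance:.1f} FLOP/byte")  # 6.5
# Hypothetical pass: 20 FLOPs per RGBA8 pixel, reading 4 B and writing 4 B.
print(is_bandwidth_bound(flops_per_pixel=20, bytes_per_pixel=8))  # True
```

With these figures the balance is roughly 6.5 FLOP/byte, so a postprocess that performs fewer operations than that per byte it moves would be expected to hit the bandwidth ceiling before the compute ceiling.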

The app consists of rendering a trivial scene multiple times into a large framebuffer, and then postprocessing that buffer in a nonlocal fashion.

Preliminary tests show that the rendering part is mostly fine: it's not 60 fps, but it's enough for my purposes.
But the framebuffer postprocessing part is a 5 fps disaster. Even just reading an empty framebuffer (glClear, no rendering at all) is already too slow.
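As a sanity check on the 5 fps figure, the quoted numbers give an upper bound on how often a full framebuffer could be streamed through memory. This sketch assumes a hypothetical 3840x2160 RGBA8 framebuffer (the actual size isn't stated) and treats the CPU-measured bandwidth as if it applied to the GPU; both are assumptions, and real throughput will be lower than these bounds.

```python
MEM_BW_BYTES_PER_S = 4.6 * 2**30   # measured CPU memory bandwidth quoted in the post
FILLRATE_PIX_PER_S = 2.4e9         # max theoretical pixel fillrate quoted in the post


def full_buffer_passes_per_second(width, height, bytes_per_pixel,
                                  reads=1, writes=0):
    """Upper bound on how many times per second a width x height buffer can be
    streamed through memory, counting each full read or write as one transfer."""
    bytes_per_pass = width * height * bytes_per_pixel * (reads + writes)
    return MEM_BW_BYTES_PER_S / bytes_per_pass


def fill_limited_passes_per_second(width, height):
    """Upper bound from pixel fillrate alone (one written pixel per pixel)."""
    return FILLRATE_PIX_PER_S / (width * height)


# Hypothetical 4K RGBA8 framebuffer; not the real size used in the app.
w, h, bpp = 3840, 2160, 4
print(round(full_buffer_passes_per_second(w, h, bpp)))        # read only: 149
print(round(full_buffer_passes_per_second(w, h, bpp, 1, 1)))  # read + write: 74
print(round(fill_limited_passes_per_second(w, h)))            # fillrate: 289
```

For that hypothetical buffer, a measured rate of a few passes per second would be far below the bandwidth bound, which would hint that something other than raw DRAM bandwidth is the limiting factor.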

Note that I'm already driving the display exclusively via libdrm+KMS, i.e. there's no usual X11/Wayland compositing tax.

I wonder whether there are known ways to make progress from here, e.g. specific GL extensions, tile-local processing, or simply switching to Vulkan to get more control over passes/subpasses, image layout transitions, etc.
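Tile-local processing only pays off for a nonlocal filter if each tile's required neighborhood stays small. A back-of-envelope sketch, assuming a square tile of T x T pixels and a filter that reads R pixels beyond each tile edge (both hypothetical parameters, not documented V3D values): every tile must read (T + 2R)^2 pixels to write T^2, so the read amplification is (T + 2R)^2 / T^2. If the postprocess's footprint spans the whole framebuffer, this approach does not apply at all.

```python
def halo_read_amplification(tile: int, radius: int) -> float:
    """Pixels read per pixel written when a tile x tile block is processed on
    its own and the filter needs a halo of `radius` pixels on each side."""
    if tile <= 0 or radius < 0:
        raise ValueError("tile must be positive and radius non-negative")
    return (tile + 2 * radius) ** 2 / tile ** 2


# Hypothetical 64x64 tile and a few filter radii, for illustration only.
for r in (1, 4, 16):
    print(r, round(halo_read_amplification(64, r), 2))  # 1.06, 1.27, 2.25
```

Small radii cost little extra traffic, while a radius that is a sizable fraction of the tile quickly erases the benefit of keeping work on-chip.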

Statistics: Posted by provod — Wed Sep 18, 2024 6:30 pm — Replies 2 — Views 58


