Skip to content

Avoid unnecessary version_get_uniform() calls#105864

Merged
Repiteo merged 1 commit intogodotengine:masterfrom
marcosc90:perf-version_get_uniform
Apr 28, 2025
Merged

Avoid unnecessary version_get_uniform() calls#105864
Repiteo merged 1 commit intogodotengine:masterfrom
marcosc90:perf-version_get_uniform

Conversation

@marcosc90
Copy link
Copy Markdown
Contributor

@marcosc90 marcosc90 commented Apr 28, 2025

This PR avoids redundant version_get_uniform() calls in GLES3 uniform updates.

Profiled using hotspot with

production=yes debug_symbols=yes

Before
image

After
Screenshot from 2025-04-28 14-58-46

CPU core/cycles inclusive (2.24% -> 1.37%)

Note: the tests/*.out changes were done by commit hook:

validate-builders........................................................Failed
- hook id: validate-builders
- exit code: 3
- files were modified by this hook

@marcosc90 marcosc90 requested review from a team as code owners April 28, 2025 13:24
@AThousandShips AThousandShips added this to the 4.x milestone Apr 28, 2025
@AThousandShips AThousandShips requested a review from a team April 28, 2025 13:26
@clayjohn
Copy link
Copy Markdown
Member

The before and after images are the same image. Do you mind uploading the after image again?

@marcosc90
Copy link
Copy Markdown
Contributor Author

marcosc90 commented Apr 28, 2025

The before and after images are the same image. Do you mind uploading the after image again?

My bad, edited with correct screenshot

Copy link
Copy Markdown
Member

@clayjohn clayjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice change. I'm happy to see someone profiling our OpenGL API usage, I'm sure there is a lot of low hanging fruit (especially for stuff like this which is rarely touched).

Copy link
Copy Markdown
Contributor

@Repiteo Repiteo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oop, looks like the detection hooks for the builders need to be expanded to the test files themselves; will fix that later today. That's not at all a blocker for this PR though, it's good to go!

@marcosc90
Copy link
Copy Markdown
Contributor Author

Very nice change. I'm happy to see someone profiling our OpenGL API usage, I'm sure there is a lot of low hanging fruit (especially for stuff like this which is rarely touched).

Thanks! Yes, I’m focusing on WebGL optimizations since it’s the main bottleneck on web. Anything that’s not too specific to my game I'll send a PR.

@clayjohn
Copy link
Copy Markdown
Member

Very nice change. I'm happy to see someone profiling our OpenGL API usage, I'm sure there is a lot of low hanging fruit (especially for stuff like this which is rarely touched).

Thanks! Yes, I’m focusing on WebGL optimizations since it’s the main bottleneck on web. Anything that’s not too specific to my game I'll send a PR.

One thing we have discussed in the past but haven't tackled specifically is the cost of compiling internal shaders. @jasonwinterpixel noticed a huge amount of wasted time initializing and compiling shaders that his game never used. For example, we initialize shaders related to the 3D engine even in 2D games. Have you noticed shader compilation being a bottleneck for load times in your project?

@marcosc90
Copy link
Copy Markdown
Contributor Author

Yes, I’ve heavily optimized shader compilation in my project, though the code still needs cleanup and isn’t engine-ready, since it targets only the latest Chrome versions without fallbacks.

@jordo
Copy link
Copy Markdown
Contributor

jordo commented Apr 28, 2025

@clayjohn I believe that was myself when I was looking at startup time on the web... Most of the init time was due to compiling internal shaders and related variants. And yes, almost all of them were unsed by our (2D) game.

@jordo
Copy link
Copy Markdown
Contributor

jordo commented Apr 28, 2025

Another big potential improvement (at least for the web / GL backend) is to check for async/parallel compile support, and use KHR_parallel_shader_compile if available.

It was implemented in three.js here:

mrdoob/three.js#19752

I did spend some time looking into this, at least attempting to defer the check compile/link status in the initial group of shaders until the next frame, but didn't get to a point where it worked... But this also seemed like a really good candidate to look into wrt engine startup duration on web.

@clayjohn
Copy link
Copy Markdown
Member

@clayjohn I believe that was myself when I was looking at startup time on the web... Most of the init time was due to compiling internal shaders and related variants. And yes, almost all of them were unsed by our (2D) game.

Oops! Sorry for the mixup

@jordo
Copy link
Copy Markdown
Contributor

jordo commented Apr 28, 2025

sorry, correct link: mrdoob/three.js#19752

@marcosc90
Copy link
Copy Markdown
Contributor Author

Another big potential improvement (at least for the web / GL backend) is to check for async/parallel compile support, and use KHR_parallel_shader_compile if available.

It was implemented in three.js here:

mrdoob/three.js#19752

I did spend some time looking into this, at least attempting to defer the check compile/link status in the initial group of shaders until the next frame, but didn't get to a point where it worked... But this also seemed like a really good candidate to look into wrt engine startup duration on web.

I already have KHR_parallel_shader_compile implemented, and it works great for preloading. I was able to get rid of [Violation] 'requestAnimationFrame' handler took XXms bringing it down from ~700ms to zero. But for proper async compilation, I'd need a bigger refactor. Right now, on-demand compilation through

_FORCE_INLINE_ bool _version_bind_shader(RID p_version, int p_variant, uint64_t p_specialization) {
doesn't work correctly, since callers of _version_bind_shader either crash or don't work correctly when it returns false, So, changes would likely be needed in some render methods, but I’m still figuring out exactly what.

For example, in this part:

if (prev_shader != shader || prev_variant != instance_variant || spec_constants != prev_spec_constants) {
bool success = material_storage->shaders.scene_shader.version_bind_shader(shader->version, instance_variant, spec_constants);
if (!success) {
break;
}

What would happen if we just skip rendering that shader for the current frame (since it’ll probably be ready by the next one), but continue with the rest of the render? (Still need to understand more of the codebase before I can make that change, only been working with Godot for about a month)

@jordo
Copy link
Copy Markdown
Contributor

jordo commented Apr 28, 2025

doesn't work correctly, since callers of _version_bind_shader either crash or don't work correctly when it returns false, So, changes would likely be needed in some render methods

Ya, exactly where I kinda got to. Larger refactor seemed necessary.

@Repiteo Repiteo merged commit d04e811 into godotengine:master Apr 28, 2025
20 checks passed
@Repiteo
Copy link
Copy Markdown
Contributor

Repiteo commented Apr 28, 2025

Thanks! Congratulations on your first merged contribution! 🎉

@marcosc90 marcosc90 deleted the perf-version_get_uniform branch April 28, 2025 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants