
WebGPU Support#713

Draft
gkjohnson wants to merge 302 commits into main from webgpu-pathtracer

Conversation

@gkjohnson
Owner

@gkjohnson gkjohnson commented Feb 4, 2026

cc @TheBlek

I've branched from #705 and changed things around quite a bit to address the storage buffer limitations by using storage textures, and organized kernels into dedicated classes. So canvas resize etc. all works. I have also separated the "MegaKernel" from the "PathTracerCore" so it will be easier to follow the differences and dependencies between the implementations.

Next I'm going to look into some of the ideas around a ray queue we'd discussed previously. Then we can try some timing to see how things pan out.


Relatedly, this write-up will be interesting for a wavefront path tracer:

https://developer.blender.org/docs/features/cycles/kernel_scheduling/

Plans
  • Add async generation
  • Adjust queue sizes based on needs, allow more control over the exact number of rays handled per frame
  • Smarter background, env caching
  • Adjust wavefront tile-based ray accumulation - just track a head pointer to a pixel and march forward?
  • Test across browsers, mobile devices
    • Use custom float texture interpolation
  • Optimize render to screen material
  • Issue announcing deprecation and future removal of WebGL version, close WebGL-related issues
  • WebGLPathTracer parity
    • Add PBR materials
    • Next event estimation
    • Fog (subsurface scattering?)
    • Perform opacity testing DURING BVH traversal
  • Cleanup
  • Move compute data class to three-mesh-bvh
  • Use global variable declarations for wgslTagFn variables (use .global member)
  • Consider "getShapecastFn" API - allow for more flexibility in function APIs? Remove requirement for return struct?
    • make function signatures consistent (pass a pointer to "transformStruct")
  • Test and support negative scaling especially for face culling
Future
  • Inspect size limits for geometry (non indexed crab)
    • Convert geometry definitions to textures
    • Allow for expanding storage sizes (needs to be specified at construction time)
  • Add "debug" views (sample count, completion visualization, etc)
  • Reuse uniforms, data across backend swaps
  • Add support for partially-updating / fast-updating bvh data
    • just transforms
    • just visibility
    • rewrite / refit bvhs
    • just materials
    • per-mesh, material, etc?
    • adjust ObjectBVH to sort objects based on uuid before construction for reliable ordering
  • wgslTagFn, wgslFn, etc for general usage: add support for an on-the-fly transpiled code node so nodes can be used in both backends.
  • consider adding "normalMatrix" to transform struct to avoid recalculation
  • provide "minimal" structs to reduce fetching overhead from storage buffers (eg a struct with only position for bvh traversal). The same buffer can be bound with different struct layouts.
  • concurrent handling of rays for the same pixel
  • Add variance detection
  • Add "completion" detection
  • Bidirectional path tracing
  • Custom material BSDF, definitions
  • WebGPU Denoiser
    • Render normal, albedo to separate array texture layers
  • Improved texture usage

@gkjohnson gkjohnson marked this pull request as draft February 4, 2026 13:13
@TheBlek TheBlek mentioned this pull request Feb 5, 2026
@gkjohnson
Owner Author

@TheBlek - I'm going to call this "done" as a first pass, for now. There are some workarounds for three.js issues which are marked in TODOs, but it's working fairly well. One of the features I'm liking the most is how scalable it is - we can reduce the number of rays processed per frame based on framerate, and the page can remain responsive since the whole 7+ bounce path doesn't need to finish in a single pass. Curious to hear your thoughts.

The overall approach works like so:

  1. Iterate over all pixels in a tiled format and push rays to trace onto a ring buffer work queue. We only iterate over a tile if there's enough space in the queue to add rays for all pixels in the tile (even though in practice we may be skipping some). Rays that have been added to the queue have their pixels marked as "active" to avoid adding multiple rays for the same pixel. We also issue a compute call for every tile but use indirect dispatch buffers to "cancel" unneeded generation when the queue has become full.

  2. Trace rays in the work queue against the BVH. If there is no hit then accumulate the color in the final target buffer, increment the sample count, and mark the pixel as "inactive". If it does hit then add it to the "hitQueue". Then increment the ray queue ring buffer head pointer forward.

  3. Process the hits. If we have reached the maximum bounce count then terminate the ray, mark the pixel as inactive, and increment the sample count. Otherwise add a scatter ray back to the ray queue. Then go back to step 1 to "top up" the queue with any inactive pixels and start again.
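The three steps above can be sketched on the CPU like so. This is a minimal JavaScript sketch with hypothetical names (`RingQueue`, `renderPass`, etc.) - in the actual implementation these stages run as separate WGSL compute kernels with the queues living in storage buffers, and step 1 generates rays tile-by-tile with indirect dispatch rather than per pixel:

```javascript
// Ring buffer work queue; head/tail are monotonically increasing and
// wrapped with modulo on access, mirroring the GPU-side head pointer.
class RingQueue {
  constructor(capacity) {
    this.items = new Array(capacity);
    this.capacity = capacity;
    this.head = 0; // next item to consume
    this.tail = 0; // next free slot
  }
  get size() { return this.tail - this.head; }
  get space() { return this.capacity - this.size; }
  push(item) { this.items[this.tail++ % this.capacity] = item; }
  pop() { return this.items[this.head++ % this.capacity]; }
}

function finishSample(pixel) {
  // Accumulation into the target buffer would happen here on the GPU.
  pixel.samples += 1;
  pixel.active = false;
}

// One frame of the wavefront loop. traceFn(ray) returns true on a BVH hit.
function renderPass(pixels, rayQueue, hitQueue, maxBounces, traceFn) {
  // Step 1: "top up" the queue with camera rays for inactive pixels.
  for (const p of pixels) {
    if (!p.active && rayQueue.space > 0) {
      p.active = true;
      rayQueue.push({ pixel: p, depth: 0 });
    }
  }
  // Step 2: trace queued rays; a miss accumulates and finishes the sample.
  while (rayQueue.size > 0) {
    const ray = rayQueue.pop();
    if (traceFn(ray)) hitQueue.push(ray);
    else finishSample(ray.pixel);
  }
  // Step 3: process hits; terminate at the bounce limit, else scatter a
  // new ray back onto the ray queue for the next pass.
  while (hitQueue.size > 0) {
    const ray = hitQueue.pop();
    if (ray.depth + 1 >= maxBounces) finishSample(ray.pixel);
    else rayQueue.push({ pixel: ray.pixel, depth: ray.depth + 1 });
  }
}
```

Because scatter rays are re-queued rather than traced recursively, a bounded amount of work per `renderPass` call keeps the page responsive even for long bounce chains.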

--

A few things that need to be considered or added to aid with performance at some point:

  • Add support for a maximum sample count to stop adding and working on rays for pixels that have already "finished".

  • We'll want some method for detecting that a minimum of X samples have finished across the image so that we can determine when it's ready to show and avoid displaying a partially-finished render. Probably a simple compute pass that checks all pixels and writes to a storage buffer we can read back whether any pixel has not passed the threshold.

  • Adding some kind of "convergence detection" using a minimum sample count and the tracked variance of the samples. This will let a pixel be marked as "completed" if it converges early (diffuse surfaces, unlit surfaces, background, etc.) so we can skip rays for these cases and focus on the pixels that need more samples to converge.

  • Related to the above point: we'll eventually reach a point where only a few hundred pixels or fewer are left to process, at which point it would be best to dispatch multiple rays per pixel, and we'll need to handle the race condition of rays writing to the same pixel. This will probably involve adding a special kernel that resolves multiple rays writing to the same pixel.
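The convergence detection idea could be sketched with an online mean/variance estimate (Welford's algorithm) kept per pixel. This is just a sketch under assumed names and thresholds (`MIN_SAMPLES`, `REL_VARIANCE_THRESHOLD` are illustrative, not values from this PR); on the GPU these accumulators would live in a storage texture and the check would run in a small compute kernel:

```javascript
// Hypothetical per-pixel convergence check using Welford's online
// mean/variance update over sample luminance.
const MIN_SAMPLES = 16;             // illustrative minimum sample count
const REL_VARIANCE_THRESHOLD = 1e-3; // illustrative relative threshold

function createAccumulator() {
  return { count: 0, mean: 0, m2: 0, converged: false };
}

function addSample(acc, luminance) {
  // Welford update: numerically stable running mean and sum of squared
  // deviations (m2), so variance = m2 / (count - 1).
  acc.count += 1;
  const delta = luminance - acc.mean;
  acc.mean += delta / acc.count;
  acc.m2 += delta * (luminance - acc.mean);

  if (acc.count >= MIN_SAMPLES) {
    const variance = acc.m2 / (acc.count - 1);
    // Scale the threshold by mean^2 (plus epsilon) so bright pixels
    // aren't held to a stricter absolute bound than dark ones.
    acc.converged = variance <= REL_VARIANCE_THRESHOLD * (acc.mean * acc.mean + 1e-6);
  }
  return acc.converged;
}
```

A converged pixel would then be skipped during ray generation, leaving queue capacity for the noisy pixels that still need samples.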

--

I'll wait to see where you're going before putting too much more work into this path tracing logic specifically. I may look at some of the other points I mentioned in #705 (comment) when I have time.

@gkjohnson
Owner Author

Here's another article from AMD on GPU-based path tracing that might have some good insight, as well:

https://gpuopen.com/download/2025_RT_TechReport.pdf

@gkjohnson
Owner Author

An open BSDF implementation from Adobe - maybe good for reference: https://github.com/adobe/openpbr-bsdf

It only implements unidirectional path tracing, though - not bidirectional. I'm not sure how complicated it would be to derive the sibling PDF needed for bidirectional.

TheBlek and others added 5 commits March 10, 2026 12:32
WebGPUPathTracer: Calculate detailed sample counts
WebGPUPathTracer: Improve build performance, fix failures when rendering an empty scene
WebGPUPathTracer: Get megakernel working in Firefox, Safari