Skip to content

Add WriteOnly type to replace giving &mut [u8] access to mapped buffers.#9042

Merged
cwfitzgerald merged 2 commits intogfx-rs:trunkfrom
kpreid:writeonly
Mar 11, 2026
Merged

Add WriteOnly type to replace giving &mut [u8] access to mapped buffers.#9042
cwfitzgerald merged 2 commits intogfx-rs:trunkfrom
kpreid:writeonly

Conversation

@kpreid
Copy link
Collaborator

@kpreid kpreid commented Feb 12, 2026

Connections
Partially addresses #8897.

Description
When accessing a buffer for writing, do not expose (or internally create) any &'a mut [u8] references to that memory, so as to avoid exposing write-combining memory that might visibly misbehave in atomic operations. Instead, expose a new pointer type WriteOnly<'a, [u8]> which offers only write operations. This can be sliced and reborrowed, allowing it to be used in most ways which &mut [u8] would be, hopefully with near-zero overhead.

This pointer type might also allow users to opt in to uninitialized memory for performance, while confining the consequences to GPU UB rather than Rust UB. However, that is a future possibility and is not implemented in this PR other than as a consideration in defining the API of the WriteOnly type.

There are already published libraries that implement a write-only pointer type, such as wref, outref; however, they are lacking in operations like chunking and IntoIterator which I believe are necessary to provide reasonable ergonomics for our users’ use cases. An alternative to this PR’s unsafe code would be to contribute changes to one of those libraries to meet our needs.

This PR does not completely fix #8897 because we still need to handle the case of calling get_mapped_range_mut() and writing, followed by calling get_mapped_range() and reading. Also, we should consider renaming *_mut() methods to *_write() or something — but that is polish not necessary in this PR.

Testing
This PR has near-complete test coverage for the WriteOnly type through a mix of doctests and #[test]s. These tests are run under Miri in a new CI job.

Squash or Rebase?
Rebase

Checklist

  • Run cargo fmt.
  • Run taplo format.
  • Run cargo clippy --tests. If applicable, add:
    • --target wasm32-unknown-unknown
  • Run cargo xtask test to run tests.
  • If this contains user-facing changes, add a CHANGELOG.md entry.

@kpreid kpreid marked this pull request as draft February 12, 2026 00:32
@inner-daemons
Copy link
Collaborator

Ok I'm going off the generated docs so these won't be comments on the code but I have some questions. Overall trust your opinion more than mine.

  • Would this be worth its own crate or something?
  • WriteOnly::write consumes self? There doesn't seem to be a good enough reason for this. Easy to imagine a situation where someone wants to write the entire thing and then overwrite parts of it
  • Missing clone_from_slice? - not relevant for [u8]
  • Looks like it is missing the Index<Range<..>> stuff for more ergonomic slicing

@kpreid
Copy link
Collaborator Author

kpreid commented Feb 12, 2026

Would this be worth its own crate or something?

As I mentioned in the PR description, there are a couple already. I think that if we put effort into separate versioning it would make sense to see if we can contribute to one of those efforts instead of publishing yet another variation — and if not, we get to tweak it to be maximally ergonomic for writing into wgpu byte buffers without worrying about long-term API stability.

WriteOnly::write consumes self? There doesn't seem to be a good enough reason for this. Easy to imagine a situation where someone wants to write the entire thing and then overwrite parts of it

I’ve never seen such a situation, and if there was one, I’d expect it to be acting on a slice rather than a Sized value — write() only applies to Sized values. If this does come up, you just write .slice(..) to reborrow.

Missing clone_from_slice? - not relevant for [u8]

Not just [u8] — currently there are no methods for writing anything that is not Copy, so that it’s impossible to accidentally use the API to forget a value by overwriting it.

Looks like it is missing the Index<Range<..>> stuff for more ergonomic slicing

Index and IndexMut cannot be implemented because they return & and &mut respectively, so they imply the permission to read.

Copy link
Contributor

@madsmtm madsmtm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re in wgpu vs. outref/wref, an option in between could be to make a new crate (possibly in a separate gfx-rs repo?) - that'd make it possible to tune the API exactly for wgpu's needs, while also making it easier to test with Miri.

Copy link
Member

@cwfitzgerald cwfitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is insanely high quality stuff. I definitely think this is the right approach, have one nit. For miri, we could also enable just the tests from naga/wgpu/wgpu-core/wgpu-hal initially, as those don't require the GPU at all and don't take much time to run.

@inner-daemons inner-daemons self-requested a review February 18, 2026 18:12
@kpreid kpreid force-pushed the writeonly branch 2 times, most recently from 2f196ef to 448f31e Compare February 21, 2026 07:19
@kpreid kpreid marked this pull request as ready for review February 21, 2026 07:21
@kpreid
Copy link
Collaborator Author

kpreid commented Feb 21, 2026

I stayed up entirely too late and I may regret this, but there are tests now, and I think this is basically ready for review, though definitely not ready for merging.

@kpreid kpreid force-pushed the writeonly branch 3 times, most recently from 8334940 to 0c6e5dc Compare February 21, 2026 07:36
///
/// [mapped]: Buffer#mapping-buffers
#[track_caller]
pub fn get_mapped_range(&self) -> BufferView {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add (if one doesn’t already exist) a test checking that one cannot get a read-access BufferView to a write mapping. Or if that is required to be possible by the spec, then things still might be fine, but some of the documentation I added needs to be changed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per exploratory testing, and discussion in maintainers meeting:

  • Currently, you can get_mapped_range() (read access) on a MapMode::Write
  • I am told that writes to write-combining memory can effectively "show up randomly later" to reads
  • Therefore, it is unsound to expose &[u8] pointing to write-combining memory
    • (Unless there is some kind of fence / cache flush that could be used? But we don't need that anyway.)
  • Therefore, we must prohibit get_mapped_range() working on that memory.
  • The simplest API surface we can provide is to say that get_mapped_range() never works on a MapMode::Write mapping under any conditions.
    • This will also simplify the user-facing model because reading a write mapping never was guaranteed to give you meaningful data.

I will update this PR to include a commit that makes this change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be fine if the wgpu API is more restrictive in this case. get_mapped_range_mut doesn't even exist in the spec. This validation would have to only happen in wgpu though not in wgpu-core.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolution from today’s maintainers meeting: This needs looking at, but can be done separately from this PR (which means this PR doesn’t fully fix #8897) so that we can get this API change moving.

It might be that we don’t need stricter validation and can just add a barrier, thus allowing reads as long as the caller transitions explicitly by a separate get_mapped_range(). (But does anyone actually need that?)

@cwfitzgerald cwfitzgerald self-assigned this Feb 23, 2026
@inner-daemons inner-daemons removed their request for review February 26, 2026 21:57
@human-0
Copy link

human-0 commented Feb 27, 2026

I don't think this completely fixes the issue. While the CPU can no longer read, writes can still reach the GPU out of order, so write 1 -> send to other thread -> write 2 can result in the second write being lost.

@kpreid
Copy link
Collaborator Author

kpreid commented Feb 27, 2026

While the CPU can no longer read, writes can still reach the GPU out of order,

Would it be sufficient to ensure that, for each byte of the mapped memory, at most one thread writes to that memory?

If so, then I think that can be addressed (unfortunately, further complicating the API) by having separate Send and !Send flavors, such that

  • the Send flavor only permits writing at most once or being split (consuming self and having no reborrowing), or being downgraded to !Send
  • the !Send flavor allows reborrowing and multiple writes (&mut self methods)
  • get_mapped_range_mut() can only be called once per non-overlapping range (i.e. dropping a BufferViewMut doesn't release the range to be written again)

(On the other hand, such a write mismatch is only a correctness issue and not a Rust UB issue, since the consequences do not reveal themselves to any Rust references, so we could tell the user “if you write the same memory more than once, do it from one thread” and leave it at that. But that’s pretty subtle to comprehend.)

@cwfitzgerald
Copy link
Member

I'm not worried about that case - submit will already validate that the buffer is unmapped (and all mapped ranges are dropped) and creates a happens before relationship between operations on the cpu -> reads on the gpu.

Copy link
Member

@cwfitzgerald cwfitzgerald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small nit and some merge conflicts, but then this is g2g, really amazing work here!

Comment on lines +1 to +2
// FIXME: Now that MAP_WRITE mappings are write-only,
// the “mut” and “immutable” terminology is incorrect.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to tackle this here? Otherwise should be an issue.

@kpreid
Copy link
Collaborator Author

kpreid commented Mar 11, 2026

Updated and rebased. Should be ready to merge. Changes:

Issues to file for followup work:

  • We still need to handle the case of calling get_mapped_range_mut() and writing, followed by calling get_mapped_range() and reading.
  • We should consider renaming *_mut() methods to *_write(), but that’s more of a semantic nitpick than something critical, and I think it’s unclear whether doing it at the same time is even helpful for migration.

kpreid added 2 commits March 11, 2026 11:27
`WriteOnly<'a, T>` is a custom smart pointer which is stricter than
`&'a mut [T]` in that it only allows writing data and not reading it.
This will be part of the fix for <gfx-rs#8897>.

This commit only introduces the public type and does not use it.
When accessing a buffer for writing, do not expose (or internally
create) any `&'a mut [u8]` references to that memory, so as to avoid
exposing write-combining memory that might visibly misbehave in atomic
operations. Instead, expose the new pointer type `WriteOnly<'a, [u8]>`
which offers only write operations. This can be sliced and reborrowed,
allowing it to be used in most ways which `&mut [u8]` would be,
hopefully with near-zero overhead.

This pointer type might also allow users to opt in to uninitialized
memory for performance, while confining the consequences to GPU UB
rather than Rust UB. However, that is a future possibility and is not
implemented in this commit other than as a consideration in defining the
API of the `WriteOnly` type.

This change is not a complete solution to the problem of write-combining
memory because it is still possible to read after writing by calling
`get_mapped_range()` after `get_mapped_range_mut()`. However, that is
solvable separately, perhaps with a barrier or a validation change,
and is not particularly coupled with this part of the solution.
@cwfitzgerald cwfitzgerald merged commit a97a49d into gfx-rs:trunk Mar 11, 2026
59 checks passed
@kpreid kpreid deleted the writeonly branch March 11, 2026 22:02
@kpreid
Copy link
Collaborator Author

kpreid commented Mar 11, 2026

Issues to file for followup work:

  • We still need to handle the case of calling get_mapped_range_mut() and writing, followed by calling get_mapped_range() and reading.

Reopened #8897 and commented for this.

  • We should consider renaming *_mut() methods to *_write(), but that’s more of a semantic nitpick than something critical, and I think it’s unclear whether doing it at the same time is even helpful for migration.

Filed #9202 for this.

github-merge-queue bot pushed a commit to bevyengine/bevy that referenced this pull request Mar 24, 2026
wgpu update for v29.

I have tested on macos m1, m5, and windows. Linux testing is
appreciated.

- [x] before merge, naga_oil and dlss_wgpu need to be published, and the
patches referencing their respective PRs removed from the workspace
Cargo.toml

##### other PRs

- naga_oil: bevyengine/naga_oil#134
- dlss_wgpu: bevyengine/dlss_wgpu#27

##### Source of relevant changes

- `Dx12Compiler::DynamicDxc` no longer has `max_shader_model`
    - gfx-rs/wgpu#8607
- `Dx12BackendOptions::force_shader_model` comes from:
    - gfx-rs/wgpu#8984
- Allow optional `RawDisplayHandle` in `InstanceDescriptor`
    - gfx-rs/wgpu#8012
- Add `GlDebugFns` option to disable OpenGL debug functions
    - gfx-rs/wgpu#8931
- Add a DX12 backend option to force a certain shader model
    - gfx-rs/wgpu#8984
- Migrate validation from maxInterStageShaderComponents to
maxInterStageShaderVariables
    - gfx-rs/wgpu#8652
- gaps are now supported in bind group layouts
    - gfx-rs/wgpu#9034
- depth validation changed to option to match spec
    - gfx-rs/wgpu#8840
- SHADER_PRIMITIVE_INDEX is now PRIMITIVE_INDEX
  - gfx-rs/wgpu#9101
- Support for binding arrays of RT acceleration structures
  - gfx-rs/wgpu#8923
- Make HasDisplayHandle optional in WindowHandle
  - gfx-rs/wgpu#8782
- `QueueWriteBufferView` can no longer be dereferenced to `&mut [u8]`,
so use `WriteOnly`.
  - gfx-rs/wgpu#9042
- ~bevy_mesh currently has an added dependency on `wgpu`, can we move
`WriteOnly` to wgpu-types?~ (it is in wgpu-types now)
- Change max_*_buffer_binding_size type to match WebGPU spec (u32 ->
u64)
  - gfx-rs/wgpu#9146
- raw vulkan init `open_with_callback` takes Limits as argument now
  - gfx-rs/wgpu#8756

## Known Issues

There is currently one known issue with occlusion culling on macos,
which we've decided to disable on macos by checking the limits we
actually require. This makes it so that if wgpu releases a patch fix,
bevy 0.19 users will benefit from occlusion culling re-enabling for
them.

<details><summary>More details</summary>

On macos, the wpgu limits were changed to align with the spec and now
put the early and late GPU occlusion culling `StorageBuffer` limit at 8,
but we currently use 9. [Filed in wgpu
repo](gfx-rs/wgpu#9287)

```
2026-03-19T01:37:10.771117Z ERROR bevy_render::error_handler: Caught rendering error: Validation Error

Caused by:
  In Device::create_bind_group_layout, label = 'build mesh uniforms GPU late occlusion culling bind group layout'
    Too many bindings of type StorageBuffers in Stage ShaderStages(COMPUTE), limit is 8, count was 9. Check the limit `max_storage_buffers_per_shader_stage` passed to `Adapter::request_device`
```

</details>

solari working on wgpu 29:

<img width="1282" height="752" alt="image"
src="https://github.com/user-attachments/assets/4744faec-65c0-4a72-93e1-34a721fc26d8"
/>

---------

Co-authored-by: atlv <email@atlasdostal.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BufferViewMut is unsound due to WC memory

6 participants