fix(vfs/sys_write): 在进入写路径前先将用户缓冲区拷贝到内核 by xboHodx · Pull Request #1821 · DragonOS-Community/DragonOS

xboHodx · 2026-03-19T13:03:08Z

这个 PR 修复了 DragonOS 运行 kata-agent 时触发的一处内核 panic。

问题分析

sys_write(2) 之前会把“直接借用的用户态缓冲区切片”传入 VFS 写路径。对于 pipe fd，pipe::write_at()
会先获取自旋锁，再去读取这段用户缓冲区。如果此时源用户页还没有 fault in，内核就会在持有 pipe 自旋锁的情况下
进入 filemap_fault，最终带着 preempt_count=1 进入阻塞等待路径，并在 schedule() 处触发断言 panic。

这个 PR 将 sys_write 的用户态写入路径改为：先通过受保护的用户缓冲区接口把数据拷贝到内核 Vec，然后再调用
do_write()。这样 VFS/pipe 等后续路径只会访问内核内存，不会再在持锁状态下触碰用户页。

修复前的路径大致是：

sys_write 创建 UserBufferReader
调用 read_from_user() 得到一个基于用户地址的借用切片
将该切片直接传给 do_write()
pipe::write_at() 持有 self.inner 自旋锁后，通过 copy_from_slice() 消费该切片

这意味着：如果用户缓冲区对应页面还未建立映射或需要缺页处理，就可能在持有 spinlock 的情况下触发页错误，这是
不允许的。

修复方式

在 kernel/src/filesystem/vfs/syscall/sys_write.rs 中：

保留原有的 user_accessible_len() 检查，维持当前“部分可访问时按短写处理”的语义
对于来自用户态的写请求，改为使用 buffer_protected(0)?
通过 read_all()? 先把可访问数据安全拷贝到内核 Vec
再将内核缓冲区传给 do_write()

对于内核态发起的写请求，仍然保留原有直接读取路径。

修复效果

执行 /bin/kata-agent 不再触发之前的 schedule() panic
原先由 preempt_count=1 引发的内核崩溃已经消失
kata-agent 已经能够正常进入用户态执行，并继续运行到后续兼容性问题
当前新的报错是 /proc/self/mountstats 缺失，这与本次 panic 无关

这说明本 PR 已经从根本上修复了这次内核 bug。
：不再 panic，kata-agent 能输出自身日志并继续执行到新的用户态错误

更加合适的处理

现在的处理会多一次拷贝，影响性能。
参考linux中，将用户态引用包装成一个带类型信息的 iov_iter，明确告诉后续路径：

这段数据是 user-backed
不能按普通内核缓冲区随便解引用
必须通过 copy_from_iter()、copy_page_from_iter()、iov_iter_get_pages*() 这类受控接口消费

若是要这样处理的话得重构 VFS 写接口，让写路径能区分 user buffer 和 kernel buffer，并且用类似 iov_iter / UserBuffer 的抽象把这个语义一路保留下去

…ite syscall Signed-off-by: xboHodx <xbohodx2024@gmail.com>

fslongjin · 2026-03-19T14:07:01Z

@claude review

dragonos-community-ai-pr-reviewer · 2026-03-19T14:07:20Z

Claude finished @fslongjin's task in 5m 31s —— View job

Code Review for PR #1821: fix(vfs/sys_write): 在进入写路径前先将用户缓冲区拷贝到内核

I'll review the changes in this PR that fixes a kernel panic when running kata-agent. Let me analyze the code changes and provide feedback.

Todo list:

Read the changed file to understand the implementation
Analyze the fix for correctness and potential issues
Check for edge cases and performance considerations
Review against the PR description's analysis
Provide comprehensive review feedback

Summary

The PR successfully fixes a kernel panic that occurred when sys_write to a pipe file descriptor triggered a page fault while holding a spinlock. The fix ensures user buffers are copied to kernel memory before entering write paths that may hold locks.

Code Review

Changes Overview

The modifications in kernel/src/filesystem/vfs/syscall/sys_write.rs replace the previous pattern of passing a direct reference to user memory with a safe copy-to-kernel approach:

Before: UserBufferReader::read_from_user(0)? returned a slice referencing user memory, which was passed to do_write(). Downstream code (e.g., pipe::write_at) could page-fault while holding locks.

After: For user-space calls, buffer_protected(0)?.read_all() copies accessible user data into a kernel Vec<u8> before calling do_write(). Kernel-space calls continue using read_from_user(0)?.to_vec().

Correctness Assessment

Safety: The fix correctly prevents page faults while holding locks by ensuring all subsequent write paths only access kernel memory.
Error Handling: Maintains existing user_accessible_len checks for partial accessibility (short write semantics). The copy_from_user_protected mechanism safely handles page faults via exception tables.
Edge Cases:
- Zero-length writes (len == 0) handled correctly
- Inaccessible buffers (accessible == 0) return EFAULT
- Partial accessibility results in short writes (consistent with Linux behavior)
- Kernel-space writes continue to work (with minor extra copying)
Race Conditions: The user_accessible_len + copy_from_user_protected combination properly handles concurrent memory mapping changes (faults return EFAULT).

Potential Issues

Memory Allocation: read_all() allocates a Vec<u8> of size accessible (up to the requested write length). This could cause kernel OOM for large writes that previously only referenced user memory. Consider:
- Are there existing limits on write sizes? (pipe buffers, socket buffers, etc.)
- Should we implement chunked copying or limit the maximum allocation size?
Performance Impact: As noted in the PR description, this adds an extra copy. While acceptable for fixing the panic, a longer-term solution (类似 Linux 的 iov_iter 抽象) would be beneficial.
Consistency: Other syscalls (sys_read, sys_writev, pwrite, etc.) may have similar patterns and could benefit from the same fix. Consider auditing related code.

Recommendations

Merge this fix as it resolves a critical kernel panic.
Add a comment explaining why the copy is necessary (to avoid page faults while holding locks) and referencing the kata-agent panic scenario.
Consider future refactoring toward a UserBuffer/iov_iter-like abstraction that preserves the user-backed semantics through the write path without unnecessary copies.
Test Coverage: Ensure existing tests pass and consider adding a test for the pipe+page-fault scenario if not already covered.

Verification

The fix aligns with the PR's analysis and should eliminate the schedule() panic observed with kata-agent. The additional copy is a reasonable trade-off for correctness.

Approval: ✅ The PR can be merged as-is. Future improvements (performance, abstraction) can be addressed separately.

--- |

fix(sys_write): improve user buffer handling and error checking in wr…

bfe78b0

…ite syscall Signed-off-by: xboHodx <xbohodx2024@gmail.com>

github-actions bot added the Bug fix A bug is fixed in this pull request label Mar 19, 2026

fslongjin merged commit 7376eae into DragonOS-Community:master Mar 19, 2026
15 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(vfs/sys_write): 在进入写路径前先将用户缓冲区拷贝到内核#1821

fix(vfs/sys_write): 在进入写路径前先将用户缓冲区拷贝到内核#1821
fslongjin merged 1 commit intoDragonOS-Community:masterfrom
xboHodx:fix/sys_write

xboHodx commented Mar 19, 2026

Uh oh!

fslongjin commented Mar 19, 2026

Uh oh!

dragonos-community-ai-pr-reviewer bot commented Mar 19, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

xboHodx commented Mar 19, 2026

问题分析

修复前的路径大致是：

修复方式

修复效果

更加合适的处理

Uh oh!

fslongjin commented Mar 19, 2026

Uh oh!

dragonos-community-ai-pr-reviewer bot commented Mar 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review for PR #1821: fix(vfs/sys_write): 在进入写路径前先将用户缓冲区拷贝到内核

Summary

Code Review

Changes Overview

Correctness Assessment

Potential Issues

Recommendations

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dragonos-community-ai-pr-reviewer bot commented Mar 19, 2026 •

edited

Loading