fix(futex): 修复robust futex实现中的竞态条件和逻辑错误#1372
fix(futex): 修复robust futex实现中的竞态条件和逻辑错误#1372fslongjin merged 1 commit intoDragonOS-Community:masterfrom
Conversation
fslongjin
commented
Nov 13, 2025
- 修复在get_robust_list中正确处理未设置robust list的情况
- 在exit_robust_list中重新从用户空间读取最新robust list内容
- 使用原子操作替换原有的非原子读写操作
- 修复FutexIterator中的链表遍历逻辑
There was a problem hiding this comment.
Pull Request Overview
This PR fixes critical race conditions and logic errors in the robust futex implementation to improve thread safety and correctness.
- Replaces non-atomic read-write operations with true atomic CAS operations in
handle_futex_death - Fixes
get_robust_listto return success with NULL head when no robust list is set, aligning with Linux behavior - Re-reads robust list head from user space in
exit_robust_listto capture any user-space modifications - Corrects FutexIterator traversal logic to properly detect empty lists and sentinel nodes
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| user/apps/tests/syscall/gvisor/blocklists/futex_test | Removes blocklists for RobustFutexTest and PrivateFutexTest tests, indicating these tests should now pass with the fixes |
| kernel/src/libs/futex/syscall/sys_robust_futex.rs | Refactors return statement with intermediate variable (no functional change) |
| kernel/src/libs/futex/futex.rs | Core changes: adds atomic operations support, fixes get_robust_list to handle unset list case, re-reads robust list head in exit_robust_list, replaces manual read-CAS-write with atomic compare_exchange, and improves FutexIterator logic |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| } | ||
|
|
||
| while !self.is_sentinel() { | ||
| if self.count >= ROBUST_LIST_LIMIT { |
There was a problem hiding this comment.
The check self.count == ROBUST_LIST_LIMIT has been changed to self.count >= ROBUST_LIST_LIMIT. This means that when count reaches exactly ROBUST_LIST_LIMIT, the loop will break, but count will still be ROBUST_LIST_LIMIT (not -1).
On the next call to next(), the check if self.is_end() (which checks self.count < 0) will fail, and the code will continue processing even though the limit was reached. This could lead to processing entries beyond the limit.
Consider setting self.count = -1 before breaking when the limit is reached.
| if self.count >= ROBUST_LIST_LIMIT { | |
| if self.count >= ROBUST_LIST_LIMIT { | |
| self.count = -1; |
| let result = crate::libs::futex::futex::RobustListHead::set_robust_list(head, len); | ||
|
|
||
| result |
There was a problem hiding this comment.
[nitpick] This refactoring is unnecessary and reduces code clarity. The original return statement was more concise and idiomatic in Rust. The intermediate result variable adds no value since it's immediately returned without any additional processing or logging.
| let result = crate::libs::futex::futex::RobustListHead::set_robust_list(head, len); | |
| result | |
| crate::libs::futex::futex::RobustListHead::set_robust_list(head, len) |
| if mval & FUTEX_WAITERS != 0 { | ||
| let mut flags = FutexFlag::FLAGS_MATCH_NONE; | ||
| flags.insert(FutexFlag::FLAGS_SHARED); | ||
| // 唤醒操作可能会失败,但不影响流程 | ||
| let _ = Futex::futex_wake(futex_uaddr, flags, 1, FUTEX_BITSET_MATCH_ANY); | ||
| } |
There was a problem hiding this comment.
After a successful CAS operation, the code checks if mval & FUTEX_WAITERS != 0 to determine whether to wake waiters. However, if the original uval did not have the FUTEX_WAITERS bit set, then mval won't have it either (since mval = (uval & FUTEX_WAITERS) | FUTEX_OWNER_DIED).
This means the wake operation will only happen if the dying owner's futex had waiters marked at the time of death. While this may be correct behavior, consider if this handles the case where waiters are added between reading uval and the CAS succeeding.
In Linux's implementation, after setting FUTEX_OWNER_DIED, it performs a wake if there might be waiters to handle this race. Consider checking the updated value or always attempting a wake (which is cheap if no waiters exist).
| if mval & FUTEX_WAITERS != 0 { | |
| let mut flags = FutexFlag::FLAGS_MATCH_NONE; | |
| flags.insert(FutexFlag::FLAGS_SHARED); | |
| // 唤醒操作可能会失败,但不影响流程 | |
| let _ = Futex::futex_wake(futex_uaddr, flags, 1, FUTEX_BITSET_MATCH_ANY); | |
| } | |
| let mut flags = FutexFlag::FLAGS_MATCH_NONE; | |
| flags.insert(FutexFlag::FLAGS_SHARED); | |
| // 唤醒操作可能会失败,但不影响流程 | |
| let _ = Futex::futex_wake(futex_uaddr, flags, 1, FUTEX_BITSET_MATCH_ANY); |
| /// 检查是否到达链表末尾(entry 指回 head.list) | ||
| fn is_sentinel(&self) -> bool { | ||
| // 链表的哨兵是 &head.list,其地址就是 head.uaddr | ||
| // 因为 list 是 head 结构的第一个字段 |
There was a problem hiding this comment.
[nitpick] The comment states "链表的哨兵是 &head.list,其地址就是 head.uaddr,因为 list 是 head 结构的第一个字段" (The sentinel of the linked list is &head.list, its address is head.uaddr, because list is the first field of the head structure).
This assumption is only valid in Rust with #[repr(C)] and when list is indeed the first field. Looking at PosixRobustListHead definition, it has #[repr(C)] and list: PosixRobustList is the first field, so this is correct. However, the comment could be clearer about the dependency on this representation.
| // 因为 list 是 head 结构的第一个字段 | |
| // 这个假设依赖于 head 结构体使用 #[repr(C)] 并且 list 是第一个字段 |
- 修复在get_robust_list中正确处理未设置robust list的情况 - 在exit_robust_list中重新从用户空间读取最新robust list内容 - 使用原子操作替换原有的非原子读写操作 - 修复FutexIterator中的链表遍历逻辑 Signed-off-by: longjin <longjin@DragonOS.org>
bd46d3f to
0c4a584
Compare