Skip to content

fix(futex): 修复robust futex实现中的竞态条件和逻辑错误#1372

Merged
fslongjin merged 1 commit intoDragonOS-Community:masterfrom
fslongjin:fix-robust-futex-handling
Nov 13, 2025
Merged

fix(futex): 修复robust futex实现中的竞态条件和逻辑错误#1372
fslongjin merged 1 commit intoDragonOS-Community:masterfrom
fslongjin:fix-robust-futex-handling

Conversation

@fslongjin
Copy link
Member

  • 修复在get_robust_list中正确处理未设置robust list的情况
  • 在exit_robust_list中重新从用户空间读取最新robust list内容
  • 使用原子操作替换原有的非原子读写操作
  • 修复FutexIterator中的链表遍历逻辑

Copilot AI review requested due to automatic review settings November 13, 2025 07:12
@github-actions github-actions bot added the Bug fix A bug is fixed in this pull request label Nov 13, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes critical race conditions and logic errors in the robust futex implementation to improve thread safety and correctness.

  • Replaces non-atomic read-write operations with true atomic CAS operations in handle_futex_death
  • Fixes get_robust_list to return success with NULL head when no robust list is set, aligning with Linux behavior
  • Re-reads robust list head from user space in exit_robust_list to capture any user-space modifications
  • Corrects FutexIterator traversal logic to properly detect empty lists and sentinel nodes

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
user/apps/tests/syscall/gvisor/blocklists/futex_test Removes blocklists for RobustFutexTest and PrivateFutexTest tests, indicating these tests should now pass with the fixes
kernel/src/libs/futex/syscall/sys_robust_futex.rs Refactors return statement with intermediate variable (no functional change)
kernel/src/libs/futex/futex.rs Core changes: adds atomic operations support, fixes get_robust_list to handle unset list case, re-reads robust list head in exit_robust_list, replaces manual read-CAS-write with atomic compare_exchange, and improves FutexIterator logic

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

}

while !self.is_sentinel() {
if self.count >= ROBUST_LIST_LIMIT {
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check self.count == ROBUST_LIST_LIMIT has been changed to self.count >= ROBUST_LIST_LIMIT. This means that when count reaches exactly ROBUST_LIST_LIMIT, the loop will break, but count will still be ROBUST_LIST_LIMIT (not -1).

On the next call to next(), the check if self.is_end() (which checks self.count < 0) will fail, and the code will continue processing even though the limit was reached. This could lead to processing entries beyond the limit.

Consider setting self.count = -1 before breaking when the limit is reached.

Suggested change
if self.count >= ROBUST_LIST_LIMIT {
if self.count >= ROBUST_LIST_LIMIT {
self.count = -1;

Copilot uses AI. Check for mistakes.
Comment on lines +44 to +46
let result = crate::libs::futex::futex::RobustListHead::set_robust_list(head, len);

result
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] This refactoring is unnecessary and reduces code clarity. The original return statement was more concise and idiomatic in Rust. The intermediate result variable adds no value since it's immediately returned without any additional processing or logging.

Suggested change
let result = crate::libs::futex::futex::RobustListHead::set_robust_list(head, len);
result
crate::libs::futex::futex::RobustListHead::set_robust_list(head, len)

Copilot uses AI. Check for mistakes.
Comment on lines +955 to +960
if mval & FUTEX_WAITERS != 0 {
let mut flags = FutexFlag::FLAGS_MATCH_NONE;
flags.insert(FutexFlag::FLAGS_SHARED);
// 唤醒操作可能会失败,但不影响流程
let _ = Futex::futex_wake(futex_uaddr, flags, 1, FUTEX_BITSET_MATCH_ANY);
}
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a successful CAS operation, the code checks if mval & FUTEX_WAITERS != 0 to determine whether to wake waiters. However, if the original uval did not have the FUTEX_WAITERS bit set, then mval won't have it either (since mval = (uval & FUTEX_WAITERS) | FUTEX_OWNER_DIED).

This means the wake operation will only happen if the dying owner's futex had waiters marked at the time of death. While this may be correct behavior, consider if this handles the case where waiters are added between reading uval and the CAS succeeding.

In Linux's implementation, after setting FUTEX_OWNER_DIED, it performs a wake if there might be waiters to handle this race. Consider checking the updated value or always attempting a wake (which is cheap if no waiters exist).

Suggested change
if mval & FUTEX_WAITERS != 0 {
let mut flags = FutexFlag::FLAGS_MATCH_NONE;
flags.insert(FutexFlag::FLAGS_SHARED);
// 唤醒操作可能会失败,但不影响流程
let _ = Futex::futex_wake(futex_uaddr, flags, 1, FUTEX_BITSET_MATCH_ANY);
}
let mut flags = FutexFlag::FLAGS_MATCH_NONE;
flags.insert(FutexFlag::FLAGS_SHARED);
// 唤醒操作可能会失败,但不影响流程
let _ = Futex::futex_wake(futex_uaddr, flags, 1, FUTEX_BITSET_MATCH_ANY);

Copilot uses AI. Check for mistakes.
/// 检查是否到达链表末尾(entry 指回 head.list)
fn is_sentinel(&self) -> bool {
// 链表的哨兵是 &head.list,其地址就是 head.uaddr
// 因为 list 是 head 结构的第一个字段
Copy link

Copilot AI Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The comment states "链表的哨兵是 &head.list,其地址就是 head.uaddr,因为 list 是 head 结构的第一个字段" (The sentinel of the linked list is &head.list, its address is head.uaddr, because list is the first field of the head structure).

This assumption is only valid in Rust with #[repr(C)] and when list is indeed the first field. Looking at PosixRobustListHead definition, it has #[repr(C)] and list: PosixRobustList is the first field, so this is correct. However, the comment could be clearer about the dependency on this representation.

Suggested change
// 因为 list 是 head 结构的第一个字段
// 这个假设依赖于 head 结构体使用 #[repr(C)] 并且 list 是第一个字段

Copilot uses AI. Check for mistakes.
- 修复在get_robust_list中正确处理未设置robust list的情况
- 在exit_robust_list中重新从用户空间读取最新robust list内容
- 使用原子操作替换原有的非原子读写操作
- 修复FutexIterator中的链表遍历逻辑

Signed-off-by: longjin <longjin@DragonOS.org>
@fslongjin fslongjin force-pushed the fix-robust-futex-handling branch from bd46d3f to 0c4a584 Compare November 13, 2025 07:57
@fslongjin fslongjin merged commit 405aed3 into DragonOS-Community:master Nov 13, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug fix A bug is fixed in this pull request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants