Fix race condition in prolog query subscriptions and resolve deadlock issues#607
Fix race condition in prolog query subscriptions and resolve deadlock issues#607
Conversation
WalkthroughThe update consolidates the setting of notification and subscription check flags within the asynchronous task that updates Prolog facts. Previously, these flags were set in multiple methods after publishing diffs or spawning commits. Now, they are set only after the Prolog update completes, reducing redundant flag assignments. Additionally, lock durations on shared state were minimized to improve concurrency. Separately, two ICE server entries in the WebRTC configuration were commented out without altering any logic. Changes
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Fix race condition in prolog query subscriptions and resolve deadlock issues
Problem
Original Issue: Race Condition in Prolog Query Subscriptions
Users reported a bug in Flux where new messages would sometimes not appear in the UI until a second message was sent by either party. Investigation revealed that while the message links were correctly stored in the perspective, the UI wasn't being notified of the changes due to missed prolog query subscription updates.
Root Cause: A race condition between two asynchronous operations:
spawn_prolog_facts_update()spawns an async task to update prolog factstrigger_prolog_subscription_checkwas set immediately after callingspawn_prolog_facts_update()The problematic sequence:
add_link_expression()called↓
spawn_prolog_facts_update()(async) ← starts updating prolog facts↓
trigger_prolog_subscription_check = true(immediate)↓
check_subscribed_queries()runs (before prolog facts are updated)↓
↓
Secondary Issue: Deadlock Conditions
After implementing the race condition fix, new deadlock issues emerged due to locks being held during long-running operations. Multiple background tasks and user operations could deadlock when:
Solution
Race Condition Fix
Approach: Move subscription and notification triggers into the async
spawn_prolog_facts_updatetask itself, ensuring they always execute after prolog facts are updated.Changes:
spawn_prolog_facts_update(): Added trigger setting at the end of the async tasktrigger_prolog_subscription_checkandtrigger_notification_checkfrom calling methodsspawn_prolog_facts_update()without coordinationDeadlock Prevention Fixes
Implemented comprehensive deadlock prevention by applying the principle of minimal lock scope:
1.
subscribed_queries_loop()check_subscribed_queries()operation2.
pubsub_publish_diff()persistedlock held during slow pubsub network operations3.
commit_batch()batch_storewrite lock held during entire operation including database calls4.
spawn_prolog_facts_update()prolog_update_mutexwrite lock held during entire operation including service calls5.
update_prolog_engine_facts()persistedlock acquisitions during service operations6.
calc_notification_trigger_matches()persistedlock held during prolog query operations7.
subscribe_and_query()subscribed_querieslock held during prolog query execution8.
no_link_language_error()persistedlock held during error message constructionKey Design Principles Applied
Testing
Impact