Skip to content

Fix race condition in prolog query subscriptions and resolve deadlock issues#607

Merged
lucksus merged 7 commits intodevfrom
fix-omitted-query-subscription-updates
Jun 11, 2025
Merged

Fix race condition in prolog query subscriptions and resolve deadlock issues#607
lucksus merged 7 commits intodevfrom
fix-omitted-query-subscription-updates

Conversation

@lucksus
Copy link
Member

@lucksus lucksus commented Jun 9, 2025

Fix race condition in prolog query subscriptions and resolve deadlock issues

Problem

Original Issue: Race Condition in Prolog Query Subscriptions

Users reported a bug in Flux where new messages would sometimes not appear in the UI until a second message was sent by either party. Investigation revealed that while the message links were correctly stored in the perspective, the UI wasn't being notified of the changes due to missed prolog query subscription updates.

Root Cause: A race condition between two asynchronous operations:

  1. Prolog facts update - spawn_prolog_facts_update() spawns an async task to update prolog facts
  2. Subscription checks - trigger_prolog_subscription_check was set immediately after calling spawn_prolog_facts_update()

The problematic sequence:

  • Message arrives → add_link_expression() called
  • spawn_prolog_facts_update() (async) ← starts updating prolog facts
  • trigger_prolog_subscription_check = true (immediate)
  • check_subscribed_queries() runs (before prolog facts are updated)
  • Query runs against OLD prolog facts → no change detected → no UI update
  • Prolog facts update completes (too late)

Secondary Issue: Deadlock Conditions

After implementing the race condition fix, new deadlock issues emerged due to locks being held during long-running operations. Multiple background tasks and user operations could deadlock when:

  • Locks were held during expensive database operations
  • Locks were held during network calls (pubsub, prolog service)
  • Multiple locks were acquired in inconsistent order
  • Trigger locks were held during potentially long processing operations

Solution

Race Condition Fix

Approach: Move subscription and notification triggers into the async spawn_prolog_facts_update task itself, ensuring they always execute after prolog facts are updated.

Changes:

  1. Modified spawn_prolog_facts_update(): Added trigger setting at the end of the async task
  2. Removed manual triggers: Eliminated all manual trigger_prolog_subscription_check and trigger_notification_check from calling methods
  3. Simplified calling code: All methods now just call spawn_prolog_facts_update() without coordination

Deadlock Prevention Fixes

Implemented comprehensive deadlock prevention by applying the principle of minimal lock scope:

1. subscribed_queries_loop()

  • Issue: Trigger lock held during potentially long check_subscribed_queries() operation
  • Fix: Check trigger value in scoped block, release lock before operation

2. pubsub_publish_diff()

  • Issue: persisted lock held during slow pubsub network operations
  • Fix: Extract handle data in scoped block, release lock before pubsub calls

3. commit_batch()

  • Issue: batch_store write lock held during entire operation including database calls
  • Fix: Extract diff in scoped block, release lock before processing

4. spawn_prolog_facts_update()

  • Issue: prolog_update_mutex write lock held during entire operation including service calls
  • Fix:
    • Get UUID before acquiring locks
    • For simple additions: hold write lock only during prolog service call
    • For fact rebuilds: maintain lock for consistency as needed

5. update_prolog_engine_facts()

  • Issue: Multiple persisted lock acquisitions during service operations
  • Fix: Collect all required data before making service calls

6. calc_notification_trigger_matches()

  • Issue: persisted lock held during prolog query operations
  • Fix: Extract UUID in scoped block before executing queries

7. subscribe_and_query()

  • Issue: subscribed_queries lock held during prolog query execution
  • Fix: Execute prolog query without locks, then insert subscription

8. no_link_language_error()

  • Issue: persisted lock held during error message construction
  • Fix: Extract required data before constructing error

Key Design Principles Applied

  1. Minimal Lock Scope: Hold locks only for the minimal time needed to access shared data
  2. Extract-Then-Process: Get required data from locked structures in scoped blocks, then release locks before expensive operations
  3. Avoid Nested Locks: Don't call methods that might acquire locks while already holding locks
  4. Separate Critical Sections: Split operations into critical sections (need locks) and non-critical sections (expensive operations)

Testing

  • Code compiles successfully
  • All existing tests pass
  • Race condition eliminated: subscription updates now consistently fire after prolog facts updates
  • Deadlock potential eliminated: locks released before expensive operations
  • Confirm subscriptions work without problems in Flux

Impact

  • Fixes the core bug: New messages in Flux now appear immediately in the UI
  • Improves reliability: Eliminates deadlock conditions that could freeze the application
  • Performance improvement: Reduced lock contention and better concurrency
  • Maintainability: Clearer separation of concerns and more predictable execution order

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jun 9, 2025

Walkthrough

The update consolidates the setting of notification and subscription check flags within the asynchronous task that updates Prolog facts. Previously, these flags were set in multiple methods after publishing diffs or spawning commits. Now, they are set only after the Prolog update completes, reducing redundant flag assignments. Additionally, lock durations on shared state were minimized to improve concurrency. Separately, two ICE server entries in the WebRTC configuration were commented out without altering any logic.

Changes

File(s) Change Summary
rust-executor/src/perspectives/perspective_instance.rs Centralized notification/subscription flag setting to post-Prolog update async task; reduced lock contention by minimizing lock durations and cloning handles outside locks.
rust-executor/src/holochain_service/mod.rs Commented out two ICE server entries in the WebRTC configuration, disabling those STUN servers without other logic changes.

Poem

A bunny hopped through code so neat,
Tidying flags from every seat.
No more toggles here and there—
Now they're set with thoughtful care.
When Prolog facts are fresh and done,
The checks begin, all tasks as one!
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ac1786c and 16ec610.

📒 Files selected for processing (1)
  • rust-executor/src/holochain_service/mod.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • rust-executor/src/holochain_service/mod.rs
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@lucksus lucksus changed the title Fix race condition in prolog query subscriptions causing missed UI updates Fix race condition in prolog query subscriptions and resolve deadlock issues Jun 10, 2025
@lucksus lucksus merged commit 5511a43 into dev Jun 11, 2025
3 checks passed
@lucksus lucksus deleted the fix-omitted-query-subscription-updates branch August 22, 2025 12:18
@coderabbitai coderabbitai bot mentioned this pull request Feb 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants