# Thread safety per request only #70

cavusmustafa wants to merge 321 commits into `ravi9:dev_backend_openvino`
Conversation
- Co-authored-by: Yamini Nimmagadda <yamini.nimmagadda@intel.com>
- …d node retrieval inside guarded block to prevent missing-key access
- …embedding: fix VIEW op, which slices the input node
- …miss: fix missing issue key handling
- Fix for stateful execution bug in llama-bench
Pull request overview
This PR adjusts the OpenVINO backend’s decoder caching and locking strategy to reduce cross-request contention by moving from a single global mutex to per-decoder (per-cache-entry) mutexes, and removes the global lock previously used during weight-node creation.
Changes:
- Introduces `decoder_runtime_ctx` to pair each cached decoder with its own mutex, and updates `decoder_cache` to store this context.
- Updates the dynamic/static graph compute paths to lock per-decoder mutexes instead of a single runtime mutex.
- Removes the static `weights_mutex` from `GgmlOvDecoder::create_weight_nodes()`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `ggml/src/ggml-openvino/utils.h` | Changes decoder cache value type to include per-entry mutex + decoder pointer. |
| `ggml/src/ggml-openvino/utils.cpp` | Reworks cache access/locking to use per-entry mutexes and updates cache writes accordingly. |
| `ggml/src/ggml-openvino/ggml-decoder.cpp` | Removes the global mutex guarding weight-node creation. |
```cpp
struct decoder_runtime_ctx {
    decoder_runtime_ctx(std::shared_ptr<std::mutex> mutex) :
        mutex(mutex) {}
    std::shared_ptr<std::mutex> mutex;
    std::shared_ptr<GgmlOvDecoder> ptr;
};
```
```cpp
auto it = r_ctx->decoder_cache.find(key);

cache_hit = it != r_ctx->decoder_cache.end();
ModelParams old_m_params;
if (cache_hit) {
    mutex = it->second->mutex;
    std::lock_guard<std::mutex> lock(*(mutex));
    ggml_decoder = it->second->ptr;
    old_m_params = ggml_decoder->get_model_params();
    cache_hit = old_m_params.can_reuse_dynamically(m_params);
} else {
    mutex = std::make_shared<std::mutex>();
    r_ctx->decoder_cache[key] = std::make_shared<decoder_runtime_ctx>(mutex);
}
std::lock_guard<std::mutex> lock(*(mutex));
```
```cpp
if (cache_hit) {
    ggml_decoder = it->second->ptr;
    mutex = it->second->mutex;
    old_m_params = ggml_decoder->get_model_params();
    cache_hit = old_m_params.can_reuse_statically(m_params);
} else {
    mutex = std::make_shared<std::mutex>();
    r_ctx->decoder_cache[key] = std::make_shared<decoder_runtime_ctx>(mutex);
}
std::lock_guard<std::mutex> lock(*(mutex));
```
```cpp
        }
    }
}
infer_request->wait();
```
```diff
 std::map<std::string, std::shared_ptr<ov::Node>> GgmlOvDecoder::create_weight_nodes(ggml_cgraph * cgraph, bool naive) {
-    static std::mutex weights_mutex;
-    std::lock_guard<std::mutex> lock(weights_mutex);
-
     std::map<std::string, std::shared_ptr<ov::Node>> model_weights;
     auto * nodes = cgraph->nodes;
```
Force-pushed from 996b739 to b6c83aa.
LGTM. Let's wait for the internal CI to be set up and see the CI result.

It seems like one thread safety test is failing. I will check and update.
Too many conflicts to resolve, so it seems safer to create a clean PR. I will close this one and check whether CI passes for the new one: #73

…`create_weight_nodes` function. The thread safety test seems to be passing, but we still need to see results from the full CI run.