Thread safety per request only #70

Closed
cavusmustafa wants to merge 321 commits into ravi9:dev_backend_openvino from cavusmustafa:cached_thread_safety

Conversation

@cavusmustafa
Collaborator

  • Creates a unique mutex for each cached decoder.
  • Fixes the issue where multithreaded runs could create two decoder cache entries even when both threads generate the same cache key.
  • Removes the mutex lock in the create_weight_nodes function. The thread safety test passes, but we still need the results of the full CI run.


Copilot AI left a comment


Pull request overview

This PR adjusts the OpenVINO backend’s decoder caching and locking strategy to reduce cross-request contention by moving from a single global mutex to per-decoder (per-cache-entry) mutexes, and removes the global lock previously used during weight-node creation.

Changes:

  • Introduces decoder_runtime_ctx to pair each cached decoder with its own mutex and updates decoder_cache to store this context.
  • Updates dynamic/static graph compute paths to lock per-decoder mutexes instead of a single runtime mutex.
  • Removes the static weights_mutex from GgmlOvDecoder::create_weight_nodes().

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File descriptions:

  • ggml/src/ggml-openvino/utils.h: Changes the decoder cache value type to include a per-entry mutex and decoder pointer.
  • ggml/src/ggml-openvino/utils.cpp: Reworks cache access/locking to use per-entry mutexes and updates cache writes accordingly.
  • ggml/src/ggml-openvino/ggml-decoder.cpp: Removes the global mutex guarding weight-node creation.



Comment on lines +43 to +47
+struct decoder_runtime_ctx {
+    decoder_runtime_ctx(std::shared_ptr<std::mutex> mutex) :
+        mutex(mutex) {}
+    std::shared_ptr<std::mutex> mutex;
+    std::shared_ptr<GgmlOvDecoder> ptr;
Comment on lines +109 to 124
+    std::shared_ptr<std::mutex> mutex;
+
     auto it = r_ctx->decoder_cache.find(key);

     cache_hit = it != r_ctx->decoder_cache.end();
     ModelParams old_m_params;
     if (cache_hit) {
-        ggml_decoder = it->second;
+        mutex = it->second->mutex;
+        std::lock_guard<std::mutex> lock(*(mutex));
+        ggml_decoder = it->second->ptr;
         old_m_params = ggml_decoder->get_model_params();
         cache_hit = old_m_params.can_reuse_dynamically(m_params);
+    } else {
+        mutex = std::make_shared<std::mutex>();
+        r_ctx->decoder_cache[key] = std::make_shared<decoder_runtime_ctx>(mutex);
     }
Comment on lines 115 to 126
     if (cache_hit) {
-        ggml_decoder = it->second;
+        mutex = it->second->mutex;
+        std::lock_guard<std::mutex> lock(*(mutex));
+        ggml_decoder = it->second->ptr;
         old_m_params = ggml_decoder->get_model_params();
         cache_hit = old_m_params.can_reuse_dynamically(m_params);
+    } else {
+        mutex = std::make_shared<std::mutex>();
+        r_ctx->decoder_cache[key] = std::make_shared<decoder_runtime_ctx>(mutex);
     }
+    std::lock_guard<std::mutex> lock(*(mutex));

Comment on lines 322 to +331
             if (cache_hit) {
-                ggml_decoder = it->second;
+                ggml_decoder = it->second->ptr;
+                mutex = it->second->mutex;
                 old_m_params = ggml_decoder->get_model_params();
                 cache_hit = old_m_params.can_reuse_statically(m_params);
+            } else {
+                mutex = std::make_shared<std::mutex>();
+                r_ctx->decoder_cache[key] = std::make_shared<decoder_runtime_ctx>(mutex);
             }
+            std::lock_guard<std::mutex> lock(*(mutex));
         }
     }
 }
 infer_request->wait();
Comment on lines 574 to 576
 std::map<std::string, std::shared_ptr<ov::Node>> GgmlOvDecoder::create_weight_nodes(ggml_cgraph * cgraph, bool naive) {
-    static std::mutex weights_mutex;
-    std::lock_guard<std::mutex> lock(weights_mutex);

     std::map<std::string, std::shared_ptr<ov::Node>> model_weights;
     auto * nodes = cgraph->nodes;
@wine99 force-pushed the dev_backend_openvino branch from 996b739 to b6c83aa on March 17, 2026 02:25
@wine99
Collaborator

wine99 commented Mar 17, 2026

LGTM. Let's wait until the internal CI is set up and see the CI results.

@cavusmustafa
Collaborator Author

It seems like one thread safety test is failing. I will check and update.

@cavusmustafa
Collaborator Author

There are too many conflicts to resolve, so it seems safer to create a clean PR. I will close this one and check whether CI passes for the new one: #73
