# Thread safety per request only #70

cavusmustafa wants to merge 321 commits into `ravi9:dev_backend_openvino`
Conversation
- Co-authored-by: Yamini Nimmagadda <yamini.nimmagadda@intel.com>
- …d node retrieval inside guarded block to prevent missing-key access
- …embedding: fix VIEW op, which slices the input node
- …miss: fix missing issue key handling
- Fix for stateful execution bug in llama-bench
Pull request overview
This PR adjusts the OpenVINO backend’s decoder caching and locking strategy to reduce cross-request contention by moving from a single global mutex to per-decoder (per-cache-entry) mutexes, and removes the global lock previously used during weight-node creation.
Changes:
- Introduces `decoder_runtime_ctx` to pair each cached decoder with its own mutex, and updates `decoder_cache` to store this context.
- Updates the dynamic/static graph compute paths to lock per-decoder mutexes instead of a single runtime mutex.
- Removes the static `weights_mutex` from `GgmlOvDecoder::create_weight_nodes()`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `ggml/src/ggml-openvino/utils.h` | Changes decoder cache value type to include per-entry mutex + decoder pointer. |
| `ggml/src/ggml-openvino/utils.cpp` | Reworks cache access/locking to use per-entry mutexes and updates cache writes accordingly. |
| `ggml/src/ggml-openvino/ggml-decoder.cpp` | Removes the global mutex guarding weight-node creation. |
```cpp
struct decoder_runtime_ctx {
    decoder_runtime_ctx(std::shared_ptr<std::mutex> mutex) :
        mutex(mutex) {}
    std::shared_ptr<std::mutex> mutex;
    std::shared_ptr<GgmlOvDecoder> ptr;
};
```
```cpp
auto it = r_ctx->decoder_cache.find(key);

cache_hit = it != r_ctx->decoder_cache.end();
ModelParams old_m_params;
if (cache_hit) {
    mutex = it->second->mutex;
    std::lock_guard<std::mutex> lock(*(mutex));
    ggml_decoder = it->second->ptr;
    old_m_params = ggml_decoder->get_model_params();
    cache_hit = old_m_params.can_reuse_dynamically(m_params);
} else {
    mutex = std::make_shared<std::mutex>();
    r_ctx->decoder_cache[key] = std::make_shared<decoder_runtime_ctx>(mutex);
}
std::lock_guard<std::mutex> lock(*(mutex));
```
```cpp
if (cache_hit) {
    ggml_decoder = it->second->ptr;
    mutex = it->second->mutex;
    old_m_params = ggml_decoder->get_model_params();
    cache_hit = old_m_params.can_reuse_statically(m_params);
} else {
    mutex = std::make_shared<std::mutex>();
    r_ctx->decoder_cache[key] = std::make_shared<decoder_runtime_ctx>(mutex);
}
std::lock_guard<std::mutex> lock(*(mutex));
```
```cpp
        }
    }
}
infer_request->wait();
```
```diff
 std::map<std::string, std::shared_ptr<ov::Node>> GgmlOvDecoder::create_weight_nodes(ggml_cgraph * cgraph, bool naive) {
-    static std::mutex weights_mutex;
-    std::lock_guard<std::mutex> lock(weights_mutex);
-
     std::map<std::string, std::shared_ptr<ov::Node>> model_weights;
     auto * nodes = cgraph->nodes;
```
Force-pushed from 996b739 to b6c83aa.
LGTM. Let's wait for the internal CI to be set up and see the CI result.

It seems like one thread safety test is failing. I will check and update.
Too many conflicts to resolve, so it seems safer to create a clean PR. I will close this one and check whether CI passes for the new one: #73

…`create_weight_nodes` function. The thread safety test seems to be passing, but we still need to see results from the full CI run.