
[BUG] ShardNotFoundException during IndicesRequestCache clean up #14190

@sgup432

Description


Describe the bug

We observed a bug in OpenSearch 2.13/2.14 where the following stack trace appears during request cache cleanup:

Exception during periodic indices request cache cleanup:
<shard_name> ShardNotFoundException[no such shard]
        at org.opensearch.index.IndexService.getShard(IndexService.java:351)
        at org.opensearch.indices.IndicesService.lambda$new$0(IndicesService.java:431)
        at org.opensearch.indices.IndicesRequestCache$IndicesRequestCacheCleanupManager.cleanCache(IndicesRequestCache.java:658)
        at org.opensearch.indices.IndicesRequestCache$IndicesRequestCacheCleanupManager.cleanCache(IndicesRequestCache.java:609)
        at org.opensearch.indices.IndicesRequestCache$IndicesRequestCacheCleanupManager$IndicesRequestCacheCleaner.run(IndicesRequestCache.java:737)
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:863)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)

This happens when we try to clean up the cached entries for an index shard that has since been relocated to another node or deleted from this node. The exception above is thrown from here: https://github.com/opensearch-project/OpenSearch/blame/main/server/src/main/java/org/opensearch/indices/IndicesService.java#L407
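The failure mode can be sketched with a simplified, hypothetical model. `ThrowingCleanupSketch`, `localShards`, `getShard` and the local `ShardNotFoundException` class are illustrative stand-ins, not OpenSearch's actual `IndicesRequestCache` or `IndexService` code. The sketch assumes the cleaner resolves the owning shard for each cached key before evicting it, and that this lookup throws when the shard has left the node, so a single relocated shard aborts the whole pass and its entries stay cached.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the failure; none of these types are OpenSearch classes.
public class ThrowingCleanupSketch {

    // Stand-in for the "no such shard" error seen in the log above.
    static class ShardNotFoundException extends RuntimeException {
        ShardNotFoundException(String shardId) {
            super("[" + shardId + "] no such shard");
        }
    }

    // Shard IDs currently allocated to this node.
    static final Set<String> localShards = new HashSet<>();

    // Cached responses keyed by "shardId/query".
    static final Map<String, String> cache = new HashMap<>();

    // Strict lookup: throws if the shard is no longer on this node.
    static String getShard(String shardId) {
        if (!localShards.contains(shardId)) {
            throw new ShardNotFoundException(shardId);
        }
        return shardId;
    }

    // One cleanup pass; for brevity every cached entry is treated as stale.
    // Each key's owning shard is resolved first, so the first missing shard
    // throws and aborts the rest of the pass.
    static void cleanCache() {
        Iterator<String> it = cache.keySet().iterator();
        while (it.hasNext()) {
            String key = it.next();
            getShard(key.substring(0, key.indexOf('/')));
            it.remove();
        }
    }

    // Replays the reproduction steps and returns how many entries survive the pass.
    public static int runScenario() {
        localShards.clear();
        cache.clear();
        localShards.add("A");          // shard A lives on node-1
        cache.put("A/q1", "r1");       // entries cached for shard A
        cache.put("A/q2", "r2");
        localShards.remove("A");       // shard A is relocated to node-2
        try {
            cleanCache();
        } catch (ShardNotFoundException e) {
            System.out.println("Exception during periodic cleanup: " + e.getMessage());
        }
        return cache.size();           // both stale entries are still cached
    }
}
```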

Related component

Search:Performance

To Reproduce

  • Cache entries in the request cache for index shard A on node-1
  • Move index shard A to another node, node-2
  • Trigger request cache cleanup on node-1

Expected behavior

These exceptions should not occur during cache cleanup. When one is thrown, the cleanup fails to clear the stale entries from the cache, which prevents new entries from being cached and indirectly degrades performance.
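One way to get the expected behavior, sketched under stated assumptions, is a lenient lookup that returns `Optional.empty()` for a shard that is no longer on the node, combined with a cleanup pass that treats that shard's entries as stale and evicts them instead of throwing. All names below (`TolerantCleanupSketch`, `findShard`, `localShards`) are hypothetical, and the model is far simpler than the real request cache; it is not the actual fix applied in OpenSearch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified model of a tolerant cleanup; not OpenSearch code.
public class TolerantCleanupSketch {

    // Shard IDs currently allocated to this node.
    static final Set<String> localShards = new HashSet<>();

    // Cached responses keyed by "shardId/query".
    static final Map<String, String> cache = new HashMap<>();

    // Lenient lookup: a shard that left the node is a normal outcome, not an error.
    static Optional<String> findShard(String shardId) {
        return localShards.contains(shardId) ? Optional.of(shardId) : Optional.empty();
    }

    // Evicts every entry whose shard is no longer on this node and keeps going.
    // Entries of shards still present are left alone here; a real cleaner would
    // apply its own staleness rules to them.
    static int cleanCache() {
        int evicted = 0;
        Iterator<String> it = cache.keySet().iterator();
        while (it.hasNext()) {
            String key = it.next();
            String shardId = key.substring(0, key.indexOf('/'));
            if (!findShard(shardId).isPresent()) {
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }

    // Same steps as the reproduction, plus a shard B that stays on the node.
    public static int runScenario() {
        localShards.clear();
        cache.clear();
        localShards.add("A");
        localShards.add("B");
        cache.put("A/q1", "r1");
        cache.put("A/q2", "r2");
        cache.put("B/q1", "r3");
        localShards.remove("A");       // shard A is relocated to node-2
        return cleanCache();           // no exception; returns the number evicted
    }
}
```

Evicting, rather than merely skipping, the missing shard's entries is a deliberate choice in this sketch: skipping would silence the exception but still leave the stale entries occupying cache space, which is the performance impact described above.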


Metadata

Labels: bug (Something isn't working), v2.15.0 (Issues and PRs related to version 2.15.0)


Status: ✅ Done
