Performance improvements to use RWLock to access LRUQueryCache #13306

benwtrent merged 15 commits into apache:main
Conversation
This optimization also benefits high-cost queries, such as a terms query with 10,000 terms, by reading the cache more often instead of searching the inverted index.
```diff
  LeafCache(Object key) {
    this.key = key;
-   cache = new IdentityHashMap<>();
+   cache = Collections.synchronizedMap(new IdentityHashMap<>());
```
I don't understand this. Aren't all accesses to LeafCache protected by the LRUQueryCache read/write locks?
Since LeafCache isn't a static class, it should have access to the enclosing class's lock.
For testing safety, putIfAbsent, remove, onDocIdSetCache, and onDocIdSetEviction should all assert writeLock.isHeldByCurrentThread();
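The suggested invariant can be sketched as follows. This is an illustrative standalone class, not Lucene's actual LeafCache code; the method and field names here are assumptions chosen for the example:

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: mutating methods assert that the enclosing cache's write lock
// is held, catching callers (under -ea / test runs) that forgot to lock.
class CacheSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<Object, Object> cache = new IdentityHashMap<>();

  void putIfAbsent(Object key, Object value) {
    // Guard against callers that mutate without holding the write lock.
    assert lock.writeLock().isHeldByCurrentThread();
    cache.putIfAbsent(key, value);
  }

  void put(Object key, Object value) {
    lock.writeLock().lock();
    try {
      putIfAbsent(key, value);
    } finally {
      lock.writeLock().unlock();
    }
  }

  Object get(Object key) {
    lock.readLock().lock(); // shared: many readers may hold this at once
    try {
      return cache.get(key);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

The assert costs nothing in production (assertions disabled) but makes lock-discipline violations fail loudly in tests.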
Sorry, I misunderstood this before. Since reads are guarded by the read lock, concurrent access to the IdentityHashMap does not require an additional synchronized wrapper. I optimized the code as per your suggestion.
@boicehuang what are the new numbers for your benchmarks for the current iteration? Indeed we will be synchronizing more, so I wonder if we will still see improvement.
It still has benefits for high-frequency queries on low-cardinality fields.
jpountz left a comment
Query caching in Lucene is designed to update the content of the cache infrequently (e.g. because we wait until we've seen a query multiple times before caching it), so it makes sense that switching to a RWLock reduced contention.
The change looks correct; I just left minor comments.
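The admission behavior described above (waiting until a query has been seen several times before caching it) can be sketched with a simple frequency gate. This is an illustrative toy, not Lucene's actual UsageTrackingQueryCachingPolicy; the class and method names are assumptions for the example:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: only admit a query to the cache once it has been observed
// minFrequency times, so cache writes (and write-lock acquisitions)
// stay rare compared to reads.
class AdmissionPolicy {
  private final int minFrequency;
  private final Map<String, Integer> counts = new HashMap<>();

  AdmissionPolicy(int minFrequency) {
    this.minFrequency = minFrequency;
  }

  boolean shouldCache(String query) {
    int seen = counts.merge(query, 1, Integer::sum); // increment count
    return seen >= minFrequency;
  }
}
```

Because admission is rare, the write lock is contended far less often than the read lock, which is exactly the workload a ReentrantReadWriteLock favors.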
```diff
- private volatile long hitCount;
- private volatile long missCount;
+ private volatile LongAdder hitCount;
+ private volatile LongAdder missCount;
```
These fields could be made final and non-volatile now.
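The suggestion above can be sketched as follows: the LongAdder objects are mutated in place rather than reassigned, so the fields can be final, and final publication plus LongAdder's own thread safety makes volatile unnecessary. Illustrative class and method names, not the actual LRUQueryCache fields:

```java
import java.util.concurrent.atomic.LongAdder;

// Sketch: counters as final LongAdder fields. LongAdder handles
// concurrent increments internally; the references never change.
class CacheStats {
  private final LongAdder hitCount = new LongAdder();
  private final LongAdder missCount = new LongAdder();

  void onHit() { hitCount.increment(); }
  void onMiss() { missCount.increment(); }

  long hits() { return hitCount.sum(); }
  long misses() { return missCount.sum(); }
}
```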
```diff
  // If the lock is already busy, prefer using the uncached version than waiting
- if (lock.tryLock() == false) {
+ if (writeLock.tryLock() == false) {
```
Why are we using the write lock here? It looks like we're only reading so we could use the read lock?
Yes, the read lock can also be used here.
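A minimal sketch of this tryLock-on-the-read-lock fallback, using illustrative names rather than the actual LRUQueryCache code:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: a cache lookup only reads, so it takes the read lock, and if
// even that is busy it bails out so the caller uses the uncached path.
class TryLockLookup {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private Object cached; // guarded by lock

  Object getOrBypass() {
    // If the lock is already busy, prefer the uncached version to waiting.
    if (lock.readLock().tryLock() == false) {
      return null; // caller falls back to an uncached scorer
    }
    try {
      return cached;
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Using the read lock here matters: multiple concurrent lookups can all succeed at once, whereas tryLock on the write lock would make them fail each other spuriously.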
Here are the improvement numbers for the currently submitted version of the code.
benwtrent left a comment
I think this is ready for merging. I can do the merging, but won't backport to 9x until we see nightlies. They might catch something we missed.
@boicehuang could you add a changes entry to 9.11 under optimizations?
Finished.
Elasticsearch (which is based on Lucene) can automatically infer types for users with its dynamic mapping feature. When users index low-cardinality fields such as gender, age, or status, they often use numbers to represent the values, so ES infers these fields as long, and ES uses the BKD tree as the index for long fields.

Just as #541 said, when the data volume grows, building the result set of low-cardinality fields makes CPU usage and load very high, even if we use a boolean query with filter clauses for the low-cardinality fields.

One reason is that LRUQueryCache uses a ReentrantLock to limit access. The QPS and cost of these queries are often high, which frequently causes tryLock failures when obtaining the cache, resulting in low concurrency in accessing the cache.

So I replaced the ReentrantLock with a ReentrantReadWriteLock, taking only the read lock when I need to get the cache for a query.
I benchmarked this optimization by mocking some random LongPoint values and querying them with a PointInSetQuery inside a boolean filter.
I think this change can help filter queries that need to query low-cardinality fields.
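The ReentrantReadWriteLock scheme described above can be sketched with a small standalone LRU cache. This is illustrative, not the actual LRUQueryCache; note that the LinkedHashMap uses insertion order (not access order) so that get() never mutates the map and is therefore safe under a shared read lock:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: readers share the read lock for lookups; only insertions
// (which may also evict the eldest entry) take the exclusive write lock.
class RWLockLRUCache<K, V> {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<K, V> map;

  RWLockLRUCache(int maxEntries) {
    // accessOrder = false: get() must not reorder entries, because it
    // runs under the shared read lock and must not mutate the map.
    this.map = new LinkedHashMap<K, V>(16, 0.75f, false) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict oldest insertion on overflow
      }
    };
  }

  V get(K key) {
    lock.readLock().lock(); // many concurrent readers may hold this
    try {
      return map.get(key);
    } finally {
      lock.readLock().unlock();
    }
  }

  void put(K key, V value) {
    lock.writeLock().lock(); // exclusive: cache writes are expected to be rare
    try {
      map.put(key, value);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

With high-QPS filter queries on low-cardinality fields, almost every operation is a get(), so readers no longer serialize behind a single mutex.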