
Performance improvements to use RWLock to access LRUQueryCache#13306

Merged
benwtrent merged 15 commits into apache:main from boicehuang:readlock
May 10, 2024

Conversation

@boicehuang (Contributor) commented Apr 15, 2024

Elasticsearch (which is based on Lucene) can automatically infer field types for users with its dynamic mapping feature. When users index low-cardinality fields such as gender, age, or status, they often use numbers to represent the values; ES then infers these fields as long, and indexes long fields with BKD trees.

As #541 notes, when the data volume grows, building the result set for low-cardinality fields drives CPU usage and load very high, even when the query is a boolean query with filter clauses on those low-cardinality fields.

One reason is that LRUQueryCache uses a single ReentrantLock to guard all access. When the QPS and cost of these queries are both high, attempts to acquire the lock frequently fail when consulting the cache, resulting in low concurrency on cache access.

So I replaced the ReentrantLock with a ReentrantReadWriteLock, taking only the read lock when looking up the cached result for a query.
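The read/write split described above can be sketched as follows. This is a minimal hypothetical illustration of the pattern, not the actual LRUQueryCache code: readers share the read lock, so concurrent lookups no longer contend, while mutations still take the exclusive write lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of an RWLock-guarded cache (names are illustrative).
class RWLockCache<K, V> {
  private final Map<K, V> map = new HashMap<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  V get(K key) {
    lock.readLock().lock(); // many readers may hold this concurrently
    try {
      return map.get(key);
    } finally {
      lock.readLock().unlock();
    }
  }

  void put(K key, V value) {
    lock.writeLock().lock(); // exclusive while the cache is mutated
    try {
      map.put(key, value);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

Because cached queries are read far more often than the cache is updated, the read lock is the hot path and the write lock is rarely contended.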

I benchmarked this optimization by indexing random LongPoint values and querying them with a PointInSetQuery inside a boolean filter clause.

| doc count | field cardinality | query terms count | baseline QPS | candidate QPS | diff percentage |
|---|---|---|---|---|---|
| 30000000 | 10 | 1 | 2481 | 5102 | 105.6% |
| 30000000 | 1000000 | 1 | 6396 | 6596.48 | 3.1% |

I think this change can help filter queries on low-cardinality fields.

@boicehuang (Contributor, Author) commented:

This optimization also benefits high-cost queries, such as a terms query with 10000 terms, by serving results from the cache more often instead of searching the inverted index:

| doc count | field cardinality | query terms count | baseline QPS | candidate QPS | diff percentage |
|---|---|---|---|---|---|
| 30000000 | 1000000 | 10000 | 160 | 473 | 191.9% |

```diff
  LeafCache(Object key) {
    this.key = key;
-   cache = new IdentityHashMap<>();
+   cache = Collections.synchronizedMap(new IdentityHashMap<>());
```
Member:

I don't understand this. Aren't all accesses to LeafCache protected by the LRUQueryCache read/write locks?

Since LeafCache isn't a static class, it should have access to the enclosing class's lock.

For testing safety, putIfAbsent, remove, onDocIdSetCache, and onDocIdSetEviction should all do an assert writeLock.isHeldByCurrentThread();
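The suggested invariant check can be sketched like this (hypothetical class and method names, not the actual Lucene code): each mutator asserts that the caller already holds the enclosing write lock, so an unguarded call fails fast when assertions are enabled instead of silently corrupting the map.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: mutators assert the write lock is held by the caller.
class GuardedLeafCache {
  private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
  private final ReentrantReadWriteLock.WriteLock writeLock = rwl.writeLock();
  private final Map<Object, Object> cache = new IdentityHashMap<>();

  void putIfAbsent(Object query, Object docIdSet) {
    // Fails under -ea if a caller forgot to take the write lock first.
    assert writeLock.isHeldByCurrentThread();
    cache.putIfAbsent(query, docIdSet);
  }

  // Helper that runs a mutation under the write lock.
  void withWriteLock(Runnable r) {
    writeLock.lock();
    try {
      r.run();
    } finally {
      writeLock.unlock();
    }
  }

  int size() {
    return cache.size();
  }
}
```

ReentrantReadWriteLock.WriteLock exposes isHeldByCurrentThread(), which makes this kind of assertion cheap to add.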

@boicehuang (Contributor, Author) replied Apr 30, 2024:
Sorry, I misunderstood this before. Since reads are guarded by the read lock, concurrent access to the IdentityHashMap does not require an additional synchronized wrapper. I optimized the code per your suggestion.

@benwtrent (Member) commented:

@boicehuang what are the new numbers for your benchmarks for the current iteration? Indeed we will be synchronizing more, so I wonder if we will still see improvement.

@boicehuang (Contributor, Author) commented May 6, 2024:

> @boicehuang what are the new numbers for your benchmarks for the current iteration? Indeed we will be synchronizing more, so I wonder if we will still see improvement.

| doc count | field cardinality | query point | baseline QPS | candidate QPS | diff percentage | diff |
|---|---|---|---|---|---|---|
| 30000000 | 10 | 1 | 2481 | 4428 | 78% | using LongAdder; uniqueQueries as Collections.synchronizedMap; cache as IdentityHashMap |

It still has benefits for high-frequency queries with low cardinality fields.

@jpountz (Contributor) left a comment:

Query caching in Lucene is designed to update the content of the cache infrequently (e.g. because we wait until we've seen a query multiple times before caching it), so it makes sense that switching to a RWLock reduced contention.

The change looks correct, I just left minor comments.

```diff
- private volatile long hitCount;
- private volatile long missCount;
+ private volatile LongAdder hitCount;
+ private volatile LongAdder missCount;
```
Contributor:
These fields could be made final and non-volatile now.
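The point of this comment can be sketched as follows (an illustrative example, not the actual Lucene fields): once the counters are LongAdder objects, the reference itself never changes, so the field can be final and needs no volatile; the LongAdder handles concurrent increments internally without a lock.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: LongAdder counters replacing volatile long fields.
class CacheStats {
  // final (never reassigned), non-volatile: thread safety lives inside LongAdder.
  private final LongAdder hitCount = new LongAdder();
  private final LongAdder missCount = new LongAdder();

  void onHit()  { hitCount.increment(); }  // lock-free, contention-friendly
  void onMiss() { missCount.increment(); }

  long hits()   { return hitCount.sum(); }
  long misses() { return missCount.sum(); }
}
```

LongAdder is preferable to AtomicLong here because under heavy contention it stripes updates across cells and only combines them in sum(), which suits counters that are incremented often and read rarely.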

Contributor Author:
ok


```diff
  // If the lock is already busy, prefer using the uncached version than waiting
- if (lock.tryLock() == false) {
+ if (writeLock.tryLock() == false) {
```
Contributor:
Why are we using the write lock here? It looks like we're only reading so we could use the read lock?

Contributor Author:
Yes, the read lock can also be used here.
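The "skip the cache if the lock is busy" pattern under discussion can be sketched like this (hypothetical names, not the actual LRUQueryCache code): a pure lookup only needs the read lock, and tryLock on the read lock only fails while a writer holds or is acquiring the lock, so readers rarely fall back to the uncached path.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the busy-lock fallback for cache lookups.
class BusyAwareLookup {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  String lookup() {
    // If the lock is already busy, prefer the uncached path over waiting.
    if (lock.readLock().tryLock() == false) {
      return "uncached";
    }
    try {
      return "cached"; // consult the cache while holding the read lock
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

With the original single ReentrantLock, any concurrent reader made tryLock fail; with the read lock, only writers do, which is the source of the reduced contention.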

@boicehuang boicehuang changed the title Performance improvements to use read lock to access LRUQueryCache Performance improvements to use RWLock to access LRUQueryCache May 8, 2024
@boicehuang (Contributor, Author) commented May 8, 2024:

Improvement numbers for the currently submitted version of the code:

| doc count | field cardinality | query point | baseline QPS | candidate QPS | diff percentage | diff |
|---|---|---|---|---|---|---|
| 30000000 | 10 | 1 | 2481 | 4408 | 78% | using LongAdder; uniqueQueries as Collections.synchronizedMap; cache as IdentityHashMap |

@benwtrent (Member) left a comment:

I think this is ready for merging. I can do the merging, but won't backport to 9x until we see nightlies. They might catch something we missed.

@boicehuang could you add a changes entry to 9.11 under optimizations?

@boicehuang (Contributor, Author) replied:

> I think this is ready for merging. I can do the merging, but won't backport to 9x until we see nightlies. They might catch something we missed.
>
> @boicehuang could you add a changes entry to 9.11 under optimizations?

Finished

@benwtrent benwtrent merged commit 5ac88c7 into apache:main May 10, 2024
benwtrent pushed a commit that referenced this pull request May 14, 2024