[BUG] Hot Path Performance Regression #150
Description
What is the bug?
The current bufferpool implementation shows a regression of roughly 1.5x to 3x, especially for search requests that look up many docIds, e.g.
1. cardinality-agg-high: 316 ms → 914 ms (2.9x)
2. multi-terms: 9.7s → 24s (2.5x)
For full report see below.
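As a quick sanity check on the figures above, the slowdown ratio and the "Delta %" column can be recomputed from the baseline and challenger values. The helper name `slowdown` is made up for illustration; the input numbers are the cardinality-agg-high p50 service times from the full report.

```python
def slowdown(baseline_ms: float, challenger_ms: float) -> tuple[float, float]:
    """Return (ratio, delta_percent) of the challenger relative to the baseline."""
    ratio = challenger_ms / baseline_ms
    delta_pct = (ratio - 1.0) * 100.0
    return ratio, delta_pct


# cardinality-agg-high, p50 service time: mmap vs. bufferpool
ratio, delta = slowdown(316.37, 914.28)
print(f"cardinality-agg-high: {ratio:.2f}x, {delta:+.1f}%")
# prints "cardinality-agg-high: 2.89x, +189.0%"
```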
Using async-profiler, the culprit is pretty clear:
Profiling shows BlockSlotTinyCache.acquireRefCountedValue taking the majority of the time. This was added in this PR to solve a memory leak issue. It was also confirmed by not unpinning the memory segment (which leads to a memory leak).
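To illustrate why skipping the unpin trades the regression for a leak, here is a minimal sketch of a conventional pin/unpin reference-counting scheme. It is an assumption about the general pattern, not the plugin's actual code: `PinnedSegment`, `pin`, `unpin`, and the two reader functions are hypothetical names.

```python
import threading


class PinnedSegment:
    """Hypothetical ref-counted cache entry (not the plugin's real class)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._refs = 1          # the cache itself holds one reference
        self.released = False   # True once the memory could go back to the pool

    def pin(self) -> bool:
        # Every read on the hot path pays for this synchronized increment.
        with self._lock:
            if self._refs == 0:
                return False    # already released, caller must reload
            self._refs += 1
            return True

    def unpin(self) -> None:
        with self._lock:
            self._refs -= 1
            if self._refs == 0:
                self.released = True


def read_with_unpin(seg: PinnedSegment) -> None:
    if seg.pin():
        try:
            pass  # read bytes from the segment here
        finally:
            seg.unpin()


def read_without_unpin(seg: PinnedSegment) -> None:
    seg.pin()  # reference is never dropped


ok, leaky = PinnedSegment(), PinnedSegment()
read_with_unpin(ok)
read_without_unpin(leaky)
ok.unpin()      # cache evicts the entry and drops its own reference
leaky.unpin()
print(ok.released, leaky.released)  # prints "True False"
```

In this sketch the leaked reference keeps the count above zero after eviction, so the segment is never returned to the pool.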
How can one reproduce the bug?
Run any OSB (OpenSearch Benchmark) workload with storage type cryptofs.
What is the expected behavior?
Same performance as mmap, or better.
What is your host/environment?
- OS: Linux
- Version 3.3.1
- Plugins opensearch-storage-encryption
INSTANCE_TYPE="r5.2xlarge"
VOLUME_SIZE=500
VOLUME_TYPE="gp3"
IOPS=16000
THROUGHPUT=1000
Do you have any screenshots?
See above
Do you have any additional context?
Full big5 run, p50
| Operation | Metric | Baseline (mmap) | Challenger (bufferpool) | Delta % |
|---|---|---|---|---|
| index-append | Mean Throughput (docs/s) | 62,717 | 56,328 | -10.2% 🔴 |
| index-append | p50 Latency (ms) | 15.40 | 15.75 | +2.3% |
| match-all | p50 Service Time (ms) | 4.28 | 3.27 | -23.6% 🟢 |
| desc_sort_timestamp | p50 Service Time (ms) | 5.35 | 6.56 | +22.6% 🔴 |
| asc_sort_timestamp | p50 Service Time (ms) | 5.26 | 6.89 | +31.2% 🔴 |
| desc_sort_with_after_timestamp | p50 Service Time (ms) | 3.72 | 5.00 | +34.2% 🔴 |
| asc_sort_with_after_timestamp | p50 Service Time (ms) | 4.33 | 4.59 | +6.1% |
| desc_sort_timestamp_can_match_shortcut | p50 Service Time (ms) | 8.96 | 5.97 | -33.4% 🟢 |
| desc_sort_timestamp_no_can_match_shortcut | p50 Service Time (ms) | 8.98 | 5.65 | -37.1% 🟢 |
| asc_sort_timestamp_can_match_shortcut | p50 Service Time (ms) | 6.54 | 5.85 | -10.5% 🟢 |
| asc_sort_timestamp_no_can_match_shortcut | p50 Service Time (ms) | 6.48 | 5.83 | -10.0% 🟢 |
| term | p50 Service Time (ms) | 3.26 | 3.57 | +9.6% |
| multi_terms-keyword | p50 Service Time (ms) | 141.51 | 263.74 | +86.3% 🔴 |
| keyword-terms | p50 Service Time (ms) | 28.59 | 34.09 | +19.3% 🟡 |
| keyword-terms-low-cardinality | p50 Service Time (ms) | 23.28 | 27.95 | +20.1% 🔴 |
| composite-terms | p50 Service Time (ms) | 38.84 | 99.05 | +155.1% 🔴 |
| composite_terms-keyword | p50 Service Time (ms) | 63.99 | 169.39 | +164.7% 🔴 |
| composite-date_histogram-daily | p50 Service Time (ms) | 3.47 | 3.95 | +13.8% 🟡 |
| range | p50 Service Time (ms) | 4.14 | 4.57 | +10.4% 🟡 |
| range-numeric | p50 Service Time (ms) | 1.47 | 1.39 | -5.3% |
| keyword-in-range | p50 Service Time (ms) | 10.97 | 31.37 | +186.1% 🔴 |
| date_histogram_hourly_agg | p50 Service Time (ms) | 5.69 | 7.65 | +34.3% 🔴 |
| date_histogram_hourly_with_filter_agg | p50 Service Time (ms) | 76.51 | 186.90 | +144.3% 🔴 |
| date_histogram_minute_agg | p50 Service Time (ms) | 32.65 | 51.66 | +58.2% 🔴 |
| scroll | p50 Service Time (ms) | 376.41 | 372.70 | -1.0% |
| query-string-on-message | p50 Service Time (ms) | 4.42 | 4.75 | +7.5% |
| query-string-on-message-filtered | p50 Service Time (ms) | 12.03 | 29.69 | +146.8% 🔴 |
| query-string-on-message-filtered-sorted-num | p50 Service Time (ms) | 13.17 | 26.93 | +104.5% 🔴 |
| sort_keyword_can_match_shortcut | p50 Service Time (ms) | 3.42 | 3.69 | +7.8% |
| sort_keyword_no_can_match_shortcut | p50 Service Time (ms) | 3.23 | 3.66 | +13.4% 🟡 |
| sort_numeric_desc | p50 Service Time (ms) | 3.63 | 4.51 | +24.3% 🔴 |
| sort_numeric_asc | p50 Service Time (ms) | 3.35 | 3.56 | +6.2% |
| sort_numeric_desc_with_match | p50 Service Time (ms) | 1.63 | 1.60 | -2.0% |
| sort_numeric_asc_with_match | p50 Service Time (ms) | 1.67 | 1.60 | -3.9% |
| range_field_conjunction_big_range_big_term | p50 Service Time (ms) | 1.36 | 1.31 | -3.9% |
| range_field_disjunction_big_range_small_term | p50 Service Time (ms) | 1.58 | 1.54 | -2.6% |
| range_field_conjunction_small_range_small_term | p50 Service Time (ms) | 1.45 | 1.60 | +9.9% |
| range_field_conjunction_small_range_big_term | p50 Service Time (ms) | 1.24 | 1.22 | -1.2% |
| range-auto-date-histo | p50 Service Time (ms) | 482.53 | 1,385.98 | +187.2% 🔴 |
| range-with-metrics | p50 Service Time (ms) | 1,755.76 | 4,898.02 | +178.9% 🔴 |
| range-auto-date-histo-with-metrics | p50 Service Time (ms) | 2,011.74 | 4,911.37 | +144.1% 🔴 |
| range-agg-1 | p50 Service Time (ms) | 1.75 | 1.77 | +1.1% |
| range-agg-2 | p50 Service Time (ms) | 1.78 | 1.86 | +4.8% |
| cardinality-agg-low | p50 Service Time (ms) | 4.13 | 3.52 | -14.9% 🟢 |
| cardinality-agg-high | p50 Service Time (ms) | 316.37 | 914.28 | +189.0% 🔴 |