
[BUG] Hot Path Performance Regression #150

@asimmahmood1

Description

What is the bug?

The current bufferpool implementation shows a 1.5x to 2x (and in some cases larger) regression, especially for search requests that look up many docIds, e.g.

1. cardinality-agg-high: 316 ms → 914 ms (~2.9x, +189%)
2. multi-terms: 9.7s → 24s (~2.5x)

For the full report, see below.

Profiling with async-profiler makes the culprit pretty clear:

[Image: async-profiler output]

Profiling shows BlockSlotTinyCache.acquireRefCountedValue taking the majority of the time. This ref-counting path was added in this PR to fix a memory leak. The diagnosis was also confirmed by skipping the unpinning of the memory segment (which, however, leads to a memory leak).
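To make the suspected cost concrete, here is a minimal Java sketch of a ref-counted pin/unpin pattern. It assumes, without confirmation from the profile, that acquireRefCountedValue pins a cached block with an atomic reference count and that the caller unpins it afterward. The RefCountedBlock and RefCountDemo classes are hypothetical and are not the plugin's actual BlockSlotTinyCache code. The point is that every doc ID lookup pays at least one atomic read-modify-write to pin and another to unpin, which may add up on hot paths that visit many doc IDs.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical ref-counted cache block, for illustration only.
 * Not the plugin's actual BlockSlotTinyCache implementation.
 */
final class RefCountedBlock {
    // Starts at 1: the cache itself holds one reference.
    private final AtomicInteger refCount = new AtomicInteger(1);

    /** Pins the block; returns false if it was already released (count reached 0). */
    boolean tryAcquire() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                return false; // block was evicted; the caller would have to reload it
            }
            // At least one CAS per lookup (more under contention): an atomic
            // write to a shared counter on every access.
            if (refCount.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Unpins the block; returns true when the last reference is dropped. */
    boolean release() {
        int remaining = refCount.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("release() called more times than acquire()");
        }
        return remaining == 0; // a real cache would free the memory segment here
    }

    int refCount() {
        return refCount.get();
    }
}

public class RefCountDemo {
    public static void main(String[] args) {
        RefCountedBlock block = new RefCountedBlock();
        // Every simulated doc ID lookup pays an acquire/release pair.
        for (int docId = 0; docId < 1_000; docId++) {
            if (block.tryAcquire()) {
                try {
                    // ... read from the pinned memory segment ...
                } finally {
                    block.release();
                }
            }
        }
        System.out.println(block.refCount()); // prints 1 (only the cache's own reference)
    }
}
```

Skipping release() in this sketch would avoid the second atomic write per lookup, but the count would then never drop back to zero, so the segment could never be freed. That is the same speed-versus-leak trade-off described above.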

How can one reproduce the bug?
Run any OpenSearch Benchmark (OSB) workload with the storage type set to cryptofs.

What is the expected behavior?
The same performance as mmap, or better.

What is your host/environment?

  • OS: Linux
  • Version: 3.3.1
  • Plugins: opensearch-storage-encryption

INSTANCE_TYPE="r5.2xlarge"
VOLUME_SIZE=500
VOLUME_TYPE="gp3"
IOPS=16000
THROUGHPUT=1000

Do you have any screenshots?
See above

Do you have any additional context?
Full big5 run (p50 unless noted):

+------------------------------------------------+--------------------------+-----------------+-------------------------+------------+
| Operation                                      | Metric                   | Baseline (mmap) | Challenger (bufferpool) | Delta %    |
+------------------------------------------------+--------------------------+-----------------+-------------------------+------------+
| index-append                                   | Mean Throughput (docs/s) | 62,717          | 56,328                  | -10.2% 🔴  |
| index-append                                   | p50 Latency (ms)         | 15.40           | 15.75                   | +2.3%      |
| match-all                                      | p50 Service Time (ms)    | 4.28            | 3.27                    | -23.6% 🟢  |
| desc_sort_timestamp                            | p50 Service Time (ms)    | 5.35            | 6.56                    | +22.6% 🔴  |
| asc_sort_timestamp                             | p50 Service Time (ms)    | 5.26            | 6.89                    | +31.2% 🔴  |
| desc_sort_with_after_timestamp                 | p50 Service Time (ms)    | 3.72            | 5.00                    | +34.2% 🔴  |
| asc_sort_with_after_timestamp                  | p50 Service Time (ms)    | 4.33            | 4.59                    | +6.1%      |
| desc_sort_timestamp_can_match_shortcut         | p50 Service Time (ms)    | 8.96            | 5.97                    | -33.4% 🟢  |
| desc_sort_timestamp_no_can_match_shortcut      | p50 Service Time (ms)    | 8.98            | 5.65                    | -37.1% 🟢  |
| asc_sort_timestamp_can_match_shortcut          | p50 Service Time (ms)    | 6.54            | 5.85                    | -10.5% 🟢  |
| asc_sort_timestamp_no_can_match_shortcut       | p50 Service Time (ms)    | 6.48            | 5.83                    | -10.0% 🟢  |
| term                                           | p50 Service Time (ms)    | 3.26            | 3.57                    | +9.6%      |
| multi_terms-keyword                            | p50 Service Time (ms)    | 141.51          | 263.74                  | +86.3% 🔴  |
| keyword-terms                                  | p50 Service Time (ms)    | 28.59           | 34.09                   | +19.3% 🟡  |
| keyword-terms-low-cardinality                  | p50 Service Time (ms)    | 23.28           | 27.95                   | +20.1% 🔴  |
| composite-terms                                | p50 Service Time (ms)    | 38.84           | 99.05                   | +155.1% 🔴 |
| composite_terms-keyword                        | p50 Service Time (ms)    | 63.99           | 169.39                  | +164.7% 🔴 |
| composite-date_histogram-daily                 | p50 Service Time (ms)    | 3.47            | 3.95                    | +13.8% 🟡  |
| range                                          | p50 Service Time (ms)    | 4.14            | 4.57                    | +10.4% 🟡  |
| range-numeric                                  | p50 Service Time (ms)    | 1.47            | 1.39                    | -5.3%      |
| keyword-in-range                               | p50 Service Time (ms)    | 10.97           | 31.37                   | +186.1% 🔴 |
| date_histogram_hourly_agg                      | p50 Service Time (ms)    | 5.69            | 7.65                    | +34.3% 🔴  |
| date_histogram_hourly_with_filter_agg          | p50 Service Time (ms)    | 76.51           | 186.90                  | +144.3% 🔴 |
| date_histogram_minute_agg                      | p50 Service Time (ms)    | 32.65           | 51.66                   | +58.2% 🔴  |
| scroll                                         | p50 Service Time (ms)    | 376.41          | 372.70                  | -1.0%      |
| query-string-on-message                        | p50 Service Time (ms)    | 4.42            | 4.75                    | +7.5%      |
| query-string-on-message-filtered               | p50 Service Time (ms)    | 12.03           | 29.69                   | +146.8% 🔴 |
| query-string-on-message-filtered-sorted-num    | p50 Service Time (ms)    | 13.17           | 26.93                   | +104.5% 🔴 |
| sort_keyword_can_match_shortcut                | p50 Service Time (ms)    | 3.42            | 3.69                    | +7.8%      |
| sort_keyword_no_can_match_shortcut             | p50 Service Time (ms)    | 3.23            | 3.66                    | +13.4% 🟡  |
| sort_numeric_desc                              | p50 Service Time (ms)    | 3.63            | 4.51                    | +24.3% 🔴  |
| sort_numeric_asc                               | p50 Service Time (ms)    | 3.35            | 3.56                    | +6.2%      |
| sort_numeric_desc_with_match                   | p50 Service Time (ms)    | 1.63            | 1.60                    | -2.0%      |
| sort_numeric_asc_with_match                    | p50 Service Time (ms)    | 1.67            | 1.60                    | -3.9%      |
| range_field_conjunction_big_range_big_term     | p50 Service Time (ms)    | 1.36            | 1.31                    | -3.9%      |
| range_field_disjunction_big_range_small_term   | p50 Service Time (ms)    | 1.58            | 1.54                    | -2.6%      |
| range_field_conjunction_small_range_small_term | p50 Service Time (ms)    | 1.45            | 1.60                    | +9.9%      |
| range_field_conjunction_small_range_big_term   | p50 Service Time (ms)    | 1.24            | 1.22                    | -1.2%      |
| range-auto-date-histo                          | p50 Service Time (ms)    | 482.53          | 1,385.98                | +187.2% 🔴 |
| range-with-metrics                             | p50 Service Time (ms)    | 1,755.76        | 4,898.02                | +178.9% 🔴 |
| range-auto-date-histo-with-metrics             | p50 Service Time (ms)    | 2,011.74        | 4,911.37                | +144.1% 🔴 |
| range-agg-1                                    | p50 Service Time (ms)    | 1.75            | 1.77                    | +1.1%      |
| range-agg-2                                    | p50 Service Time (ms)    | 1.78            | 1.86                    | +4.8%      |
| cardinality-agg-low                            | p50 Service Time (ms)    | 4.13            | 3.52                    | -14.9% 🟢  |
| cardinality-agg-high                           | p50 Service Time (ms)    | 316.37          | 914.28                  | +189.0% 🔴 |
+------------------------------------------------+--------------------------+-----------------+-------------------------+------------+
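The Delta % column appears to follow (challenger - baseline) / baseline × 100, which reproduces the index-append throughput, match-all, and cardinality-agg-high rows. Below is a minimal Java sketch of that arithmetic. DeltaCheck and its helper names are hypothetical. It also shows why a delta of +189% means a slowdown of roughly 2.9x, not 1.9x. For the throughput row a negative delta is the regression; for latency and service time rows a positive delta is.

```java
public class DeltaCheck {
    /** Delta % as the table appears to compute it: (challenger - baseline) / baseline * 100. */
    static double deltaPercent(double baseline, double challenger) {
        return (challenger - baseline) / baseline * 100.0;
    }

    /** Slowdown factor for time metrics: challenger / baseline (2.0 means twice as slow). */
    static double slowdown(double baseline, double challenger) {
        return challenger / baseline;
    }

    public static void main(String[] args) {
        // cardinality-agg-high, p50 service time (ms), from the table above
        double baseline = 316.37;
        double challenger = 914.28;
        System.out.printf("delta=%+.1f%% slowdown=%.2fx%n",
                deltaPercent(baseline, challenger), slowdown(baseline, challenger));
        // prints: delta=+189.0% slowdown=2.89x
    }
}
```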
