-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Description
Describe the bug
Through different runs, we have noticed that force merge operation takes more time and when we deep dived, stored fields especially takes more time in 3.0 OS compared to 2.19 OS.
Other than stored fields, postings format too shows regression ~20%.
Related component
No response
To Reproduce
Ingest data via big5 workload via OSB.
Force merge to 10 segments initially and force merge to 1 segment and enable the trace logs of "SM" and "MS" to enable Lucene infostream logs to get an idea on how much time each format takes.
curl --location --request PUT 'localhost:9200/_cluster/settings' \
--header 'Content-Type: application/json' \
--data-raw '{
"transient": {
"logger.org.opensearch.index.engine.Engine.SM": "TRACE",
"logger.org.opensearch.index.engine.Engine.MS": "TRACE",
"logger.org.opensearch.index.engine.Engine.IW": "TRACE"
}
}'
Do the same in 2.19 OS and 3.0 OS.
Benchmark runs
Merging the following segments to 1 ( forceMerge to 1 )took same amount of time [ except postings ] as they're similar size and all are non compound files
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound
big5 0 p 127.0.0.1 _3r 135 23285805 0 4.8gb 0 true true 10.1.0 false
big5 0 p 127.0.0.1 _aj 379 23121472 0 4.8gb 0 true true 10.1.0 false
big5 0 p 127.0.0.1 _cc 444 23137636 0 4.8gb 0 true true 10.1.0 false
big5 0 p 127.0.0.1 _f3 543 23290908 0 4.8gb 0 true true 10.1.0 false
big5 0 p 127.0.0.1 _f5 545 23164179 0 4.8gb 0 true true 10.1.0 false
2.19
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound
big5 0 p 127.0.0.1 _3r 135 23313500 0 4.8gb 0 true true 9.12.1 false
big5 0 p 127.0.0.1 _aj 379 22650054 0 4.7gb 0 true true 9.12.1 false
big5 0 p 127.0.0.1 _cc 444 23130046 0 4.8gb 0 true true 9.12.1 false
big5 0 p 127.0.0.1 _f1 541 23544447 0 4.8gb 0 true true 9.12.1 false
big5 0 p 127.0.0.1 _f2 542 23361953 0 4.8gb 0 true true 9.12.1 false
| Format / Version | 2.19 (9.12) | 3.0 (10.1) | Difference (3.0 - 2.19) | % Change |
|---|---|---|---|---|
| Stored Fields | 297,278 ms | 280,421 ms | -16,857 ms | -5.67% |
| Norms | 6,497 ms | 6,388 ms | -109 ms | -1.68% |
| Postings | 447,418 ms | 575,100 ms | +127,682 ms | 28.54% |
| Doc Values | 223,610 ms | 219,824 ms | -3,786 ms | -1.69% |
| Points | 72,534 ms | 75,501 ms | +2,967 ms | 4.09% |
| Field Infos | 0 ms | 0 ms | 0 ms | 0% |
When segments are of same size but some segments are part of compound file, we notice a big drop in time taken for stored fields
3.0
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound
big5 0 p 127.0.0.1 _b 11 21474616 0 4.4gb 0 true true 10.1.0 false
big5 0 p 127.0.0.1 _h 17 21556006 0 4.4gb 0 true true 10.1.0 false
big5 0 p 127.0.0.1 _p 25 21447620 0 4.4gb 0 true true 10.1.0 false
big5 0 p 127.0.0.1 _x 33 21544817 0 4.4gb 0 true true 10.1.0 false
big5 0 p 127.0.0.1 _12 38 3582255 0 754.9mb 0 true true 10.1.0 true
big5 0 p 127.0.0.1 _13 39 3579532 0 754.1mb 0 true true 10.1.0 true
big5 0 p 127.0.0.1 _14 40 1071872 0 228.9mb 0 true true 10.1.0 true
big5 0 p 127.0.0.1 _15 41 21743282 0 4.5gb 0 true true 10.1.0 false
2.19
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound
big5 0 p 127.0.0.1 _b 11 21474616 0 4.4gb 0 true true 9.12.1 false
big5 0 p 127.0.0.1 _h 17 21427535 0 4.4gb 0 true true 9.12.1 false
big5 0 p 127.0.0.1 _p 25 21549736 0 4.4gb 0 true true 9.12.1 false
big5 0 p 127.0.0.1 _y 34 21568569 0 4.4gb 0 true true 9.12.1 false
big5 0 p 127.0.0.1 _15 41 3581274 0 754.6mb 0 true true 9.12.1 true
big5 0 p 127.0.0.1 _16 42 3578536 0 753.9mb 0 true true 9.12.1 true
big5 0 p 127.0.0.1 _17 43 21889332 0 4.5gb 0 true true 9.12.1 false
big5 0 p 127.0.0.1 _18 44 930402 0 199.1mb
| Format / Version | 2.19 (9.12) | 3.0 (10.1) | Difference (3.0 - 2.19) | % Change |
|---|---|---|---|---|
| Stored Fields | 282,483 ms | 402,009 ms | +119,526 ms | 42.31% |
| Norms | 7,993 ms | 6,186 ms | -1,807 ms | -22.61% |
| Postings | 474,146 ms | 571,138 ms | +96,992 ms | 20.46% |
| Doc Values | 206,165 ms | 203,756 ms | -2,409 ms | -1.17% |
| Points | 87,173 ms | 78,683 ms | -8,490 ms | -9.74% |
| Field Infos | 0 ms | 0 ms | 0 ms | 0% |
| Total Runtime | 1,058.0s | 1,261.8s | +203.8s | 19.26% |
postings format observation
We see consistent ~20% regression for postings format. And this is a hot path as we had lots of changes in Lucene 10.
stored fields observation
But surprisingly stored fields also show regression with not many direct changes to the format itself and looks like the compound files merges have slowed down the stored fields based on the following data. Note how the segments where compound = true, stored fields is taking 5x - 10x more time.
3.0
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound StoredFieldsTime
big5 0 p 127.0.0.1 _3r 135 22939930 0 4.7gb 0 true true 10.1.0 false 21984
big5 0 p 127.0.0.1 _54 184 14294416 0 2.9gb 0 true true 10.1.0 false 13314
big5 0 p 127.0.0.1 _8k 308 23703067 0 4.9gb 0 true true 10.1.0 false 22745
big5 0 p 127.0.0.1 _af 375 18639586 0 3.8gb 0 true true 10.1.0 false 40955 <-outlier
big5 0 p 127.0.0.1 _dq 494 2432310 0 513.1mb 0 true true 10.1.0 true 43307
big5 0 p 127.0.0.1 _dv 499 23959061 0 4.9gb 0 true true 10.1.0 false 22997
big5 0 p 127.0.0.1 _e1 505 2301159 0 485.6mb 0 true true 10.1.0 true 40955
big5 0 p 127.0.0.1 _ec 516 2714350 0 572.6mb 0 true true 10.1.0 true 48479
big5 0 p 127.0.0.1 _en 527 2432908 0 514mb 0 true true 10.1.0 true 43387
big5 0 p 127.0.0.1 _ey 538 2583213 0 545mb 0 true true 10.1.0 true 46936
2.19
index shard prirep ip segment generation docs.count docs.deleted size size.memory committed searchable version compound StoredFieldsTime
big5 0 p 127.0.0.1 _3r 135 23357403 0 4.8gb 0 true true 9.12.1 false 23406
big5 0 p 127.0.0.1 _aj 379 23018556 0 4.7gb 0 true true 9.12.1 false 23035
big5 0 p 127.0.0.1 _ca 442 23341816 0 4.8gb 0 true true 9.12.1 false 23382
big5 0 p 127.0.0.1 _cp 457 4417375 0 929.9mb 0 true true 9.12.1 true 4419
big5 0 p 127.0.0.1 _d9 477 4839945 0 1017.1mb 0 true true 9.12.1 true 4842
big5 0 p 127.0.0.1 _dv 499 4268096 0 897.9mb 0 true true 9.12.1 true 4625
big5 0 p 127.0.0.1 _ef 519 4686246 0 986.2mb 0 true true 9.12.1 true 4692
big5 0 p 127.0.0.1 _eq 530 2672224 0 563.9mb 0 true true 9.12.1 true 2671
big5 0 p 127.0.0.1 _f0 540 2143226 0 453.4mb 0 true true 9.12.1 true 2143
big5 0 p 127.0.0.1 _f1 541 23255113 0 4.8gb 0 true true 9.12.1 false 22303
Expected behavior
The force merge times of various file formats must be same for 3.0 OS and 2.19 OS.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status