🐛 fix(search-api): stable result order and configurable per-file hit limit #4935

Merged
vladak merged 3 commits into oracle:master from gaborbernat:fix/api-search-line-numbers
Apr 22, 2026

@gaborbernat (Contributor) commented Apr 10, 2026

Two bugs in /api/v1/search produce results that disagree with the HTML search page for identical queries, making the API unsuitable as a drop-in replacement for clients migrating off HTML scraping.

Search results change between calls with identical parameters

Running the same search twice with maxresults=20 can return different files each time, even when the index has not changed. A pipeline paging through results risks missing files or processing duplicates. This happens because the API returns files in an arbitrary order that varies between JVM runs, so trimming at maxresults picks a different subset on each call. After this fix, files always appear in the order determined by the sort parameter, making repeated calls with the same parameters return the same results.
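
The commit message on this PR attributes the instability to Collectors.groupingBy() defaulting to a HashMap. A minimal sketch of the remedy follows, assuming hits arrive already ranked; FileHit and StableGrouping are hypothetical names for illustration, not OpenGrok classes. Passing LinkedHashMap::new as the map factory keeps files in the order in which their first hit is encountered:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a search hit; not OpenGrok's actual hit type. */
final class FileHit {
    final String path;
    final int lineNumber;

    FileHit(String path, int lineNumber) {
        this.path = path;
        this.lineNumber = lineNumber;
    }

    String path() {
        return path;
    }
}

final class StableGrouping {
    /** Groups hits by file path, keeping paths in the order they first appear in the input. */
    static Map<String, List<FileHit>> groupInEncounterOrder(List<FileHit> rankedHits) {
        return rankedHits.stream().collect(
                Collectors.groupingBy(FileHit::path, LinkedHashMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<FileHit> ranked = List.of(
                new FileHit("/src/zeta.c", 10),
                new FileHit("/src/alpha.c", 3),
                new FileHit("/src/zeta.c", 42));
        // Keys follow the ranking, not hash order: [/src/zeta.c, /src/alpha.c]
        System.out.println(groupInEncounterOrder(ranked).keySet());
    }
}
```

With a plain groupingBy(FileHit::path) the key order would depend on hashing, so trimming the map to the first N entries could select a different subset of files.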

Searches return fewer matching lines per file than expected

When a symbol is referenced many times in a file, the API returns only some of those references. A symbol used 25 times in a file shows up as 10 hits via the API but all 25 via the HTML page, with no indication in the response that lines were dropped. This happens because an internal server-side limit kicks in whenever a query matches more than 100 documents in total, silently capping the per-file hit count. After this fix, the default behavior returns all matching lines per file, consistent with the HTML page.
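
To illustrate the intended semantics of a per-file limit where 0 means unlimited, here is a minimal sketch. PerFileCap and capHits are hypothetical helpers, not the actual OpenGrok implementation, and treating negative values as unlimited is an assumption made here for simplicity:

```java
import java.util.List;

final class PerFileCap {
    /**
     * Returns at most maxHitsPerFile entries from the hits of one file.
     * A limit of 0 (or, by assumption in this sketch, any negative value) means no cap.
     */
    static <T> List<T> capHits(List<T> hitsInFile, int maxHitsPerFile) {
        if (maxHitsPerFile <= 0 || hitsInFile.size() <= maxHitsPerFile) {
            return hitsInFile;
        }
        return hitsInFile.subList(0, maxHitsPerFile);
    }

    public static void main(String[] args) {
        List<Integer> matchingLines = List.of(3, 8, 15, 16, 23, 42);
        System.out.println(capHits(matchingLines, 0).size()); // prints 6
        System.out.println(capHits(matchingLines, 4));        // prints [3, 8, 15, 16]
    }
}
```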

Performance implications. Removing the per-file cap can significantly increase response size for broad queries over large codebases — a query matching thousands of files, each with dozens of hits, will now return all of them. The new maxhitsperfile parameter (default 0 = unlimited) lets callers trade completeness for bounded response size: maxhitsperfile=10 caps each file at 10 hits regardless of total match count. The existing maxresults bounds the number of files returned; maxhitsperfile bounds the lines per file. Using both gives full control over response size with predictable upper bounds.
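
As a rough sketch of how a client might combine the two limits, the snippet below builds a request URL and computes the resulting upper bound on returned lines. The host name is a placeholder, BoundedSearchRequest is a hypothetical helper, and the use of full as the query parameter is an assumption about the search API; only maxresults and maxhitsperfile come from this PR's description.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class BoundedSearchRequest {
    /** Builds a search URL; "full" as the query parameter name is an assumption. */
    static URI build(String baseUrl, String fullText, int maxResults, int maxHitsPerFile) {
        String query = "full=" + URLEncoder.encode(fullText, StandardCharsets.UTF_8)
                + "&maxresults=" + maxResults
                + "&maxhitsperfile=" + maxHitsPerFile;
        return URI.create(baseUrl + "/api/v1/search?" + query);
    }

    /** Upper bound on matching lines in one response; only meaningful when both limits are positive. */
    static long maxLines(int maxResults, int maxHitsPerFile) {
        return (long) maxResults * maxHitsPerFile;
    }

    public static void main(String[] args) {
        // Placeholder host; substitute the base URL of a real deployment.
        URI uri = build("https://opengrok.example.org/source", "my symbol", 20, 10);
        System.out.println(uri);
        System.out.println(maxLines(20, 10)); // prints 200
    }
}
```

With maxresults=20 and maxhitsperfile=10, a response carries at most 200 matching lines. Leaving maxhitsperfile at 0 removes that bound.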

@oracle-contributor-agreement (bot) added the OCA Verified label Apr 10, 2026
@gaborbernat changed the title from "fix(api): preserve hit order and remove per-file context limit in search API" to "🐛 fix(search-api): stable result order and complete per-file hits" Apr 10, 2026
@gaborbernat force-pushed the fix/api-search-line-numbers branch 2 times, most recently from 7640a91 to bc9271a April 13, 2026 15:01
@gaborbernat changed the title from "🐛 fix(search-api): stable result order and complete per-file hits" to "🐛 fix(search-api): stable result order and configurable per-file hit limit" Apr 13, 2026
@gaborbernat force-pushed the fix/api-search-line-numbers branch from bc9271a to f566d92 April 13, 2026 15:10
@gaborbernat (Contributor, Author) commented

@vladak this should be ready for review now, thanks!

Comment thread apiary.apib Outdated
@vladak (Member) commented Apr 16, 2026

Is this possibly related to #3239 ?

Comment thread opengrok-indexer/src/test/java/org/opengrok/indexer/search/SearchEngineTest.java Outdated
Comment thread opengrok-indexer/src/test/java/org/opengrok/indexer/search/SearchEngineTest.java Outdated
Comment thread opengrok-indexer/src/test/java/org/opengrok/indexer/search/SearchEngineTest.java Outdated
Comment thread opengrok-indexer/src/main/java/org/opengrok/indexer/search/SearchEngine.java Outdated
@gaborbernat requested a review from vladak April 19, 2026 02:44

@vladak (Member) left a comment

looks good now

@vladak force-pushed the fix/api-search-line-numbers branch from b5985e9 to c3571c8 April 22, 2026 16:24
…limit

Two bugs caused /api/v1/search to return different results than the HTML
search page for the same query, breaking clients migrating from HTML
scraping to the REST API. Also adds maxhitsperfile parameter for callers
that need to bound response size.

Collectors.groupingBy() uses HashMap so the file order in the JSON results
object is non-deterministic. With any maxresults cap, different orderings
produce different result sets across calls. Switching to LinkedHashMap
preserves the Lucene scoring order consistent with the sort parameter.

SearchEngine applied limit = nhits > 100 to cap matching lines per file
when a query matches many documents — a cap the HTML page never applied.
For a heavily-referenced symbol this meant the API silently dropped most
matching lines per file. The fix replaces the boolean cap with a
maxhitsperfile query parameter (default 0 = unlimited, matching the HTML
page). Callers that need to bound per-file hits can pass a positive value.

The apiary documents both the result ordering guarantee and the new
maxhitsperfile parameter.

Move hit-per-file limiting from post-hoc trimming in SearchEngine
into Context.getContext() via a new maxHits parameter. Fix apiary
Note line that was parsed as a parameter. Use QueryParameters
constants in tests, add URL null check, extract variables, improve
test names and comments.

The note text between Parameters and Response sections was parsed
as an unrecognized block by drafter. Action descriptions must
precede the Parameters section in API Blueprint format.
@vladak force-pushed the fix/api-search-line-numbers branch from c3571c8 to ee83d2f April 22, 2026 16:45
@vladak merged commit fec1022 into oracle:master Apr 22, 2026
11 checks passed
@vladak (Member) commented Apr 22, 2026

Thanks !

Labels

OCA Verified All contributors have signed the Oracle Contributor Agreement.

2 participants