🐛 fix(search-api): stable result order and configurable per-file hit limit by gaborbernat · Pull Request #4935 · oracle/opengrok

gaborbernat · 2026-04-10T20:19:53Z

Two bugs in /api/v1/search produce results that disagree with the HTML search page for identical queries, making the API unsuitable as a drop-in replacement for clients migrating off HTML scraping.

Search results change between calls with identical parameters

Running the same search twice with maxresults=20 can return different files each time, even when the index has not changed. A pipeline paging through results risks missing files or processing duplicates. This happens because the API returns files in an arbitrary order that varies between JVM runs, so trimming at maxresults picks a different subset on each call. After this fix, files always appear in the order determined by the sort parameter, making repeated calls with the same parameters return the same results.

Searches return fewer matching lines per file than expected

When a symbol is referenced many times in a file, the API returns only some of those references. A symbol used 25 times in a file shows up as 10 hits via the API but all 25 via the HTML page, with no indication in the response that lines were dropped. This happens because an internal server-side limit kicks in whenever a query matches more than 100 documents in total, silently capping the per-file hit count. After this fix, the default behavior returns all matching lines per file, consistent with the HTML page.

Performance implications. Removing the per-file cap can significantly increase response size for broad queries over large codebases: a query matching thousands of files, each with dozens of hits, will now return every hit in every returned file. The new maxhitsperfile parameter (default 0 = unlimited) lets callers trade completeness for bounded response size: maxhitsperfile=10 caps each file at 10 hits regardless of total match count. The existing maxresults bounds the number of files returned; maxhitsperfile bounds the lines per file. Using both caps a response at maxresults × maxhitsperfile matching lines, for example at most 200 lines with maxresults=20 and maxhitsperfile=10.
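A minimal client-side sketch of combining the two limits. The host name, the class and its helper methods are hypothetical, and the use of the symbol query field is an assumption about how a caller would search; only the /api/v1/search path and the maxresults and maxhitsperfile parameter names come from this PR.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class BoundedSearchRequest {
    // Builds a search URL that bounds both the number of files and the hits per file.
    // baseUrl is a hypothetical OpenGrok deployment root.
    static URI build(String baseUrl, String symbol, int maxResults, int maxHitsPerFile) {
        String query = "symbol=" + URLEncoder.encode(symbol, StandardCharsets.UTF_8)
                + "&maxresults=" + maxResults
                + "&maxhitsperfile=" + maxHitsPerFile;
        return URI.create(baseUrl + "/api/v1/search?" + query);
    }

    // Upper bound on matching lines in one response.
    // Only meaningful when both limits are positive (0 means unlimited for maxhitsperfile).
    static long maxLines(int maxResults, int maxHitsPerFile) {
        return (long) maxResults * maxHitsPerFile;
    }

    public static void main(String[] args) {
        URI uri = build("https://opengrok.example.org/source", "my_func", 20, 10);
        System.out.println(uri);
        System.out.println("at most " + maxLines(20, 10) + " matching lines");
    }
}
```

Passing maxhitsperfile=0, or omitting it, would request all matching lines per file, so the product bound no longer applies in that case.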

gaborbernat · 2026-04-13T16:14:08Z

@vladak this should be ready for review now, thanks!

vladak · 2026-04-16T08:48:40Z

Is this possibly related to #3239 ?

vladak

looks good now

…limit

Two bugs caused /api/v1/search to return different results than the HTML search page for the same query, breaking clients migrating from HTML scraping to the REST API. Also adds maxhitsperfile parameter for callers that need to bound response size.

Collectors.groupingBy() uses HashMap so the file order in the JSON results object is non-deterministic. With any maxresults cap, different orderings produce different result sets across calls. Switching to LinkedHashMap preserves the Lucene scoring order consistent with the sort parameter.

SearchEngine applied limit = nhits > 100 to cap matching lines per file when a query matches many documents, a cap the HTML page never applied. For a heavily-referenced symbol this meant the API silently dropped most matching lines per file. The fix replaces the boolean cap with a maxhitsperfile query parameter (default 0 = unlimited, matching the HTML page). Callers that need to bound per-file hits can pass a positive value.

The apiary documents both the result ordering guarantee and the new maxhitsperfile parameter.
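The ordering fix described in this commit message can be illustrated with a small standalone sketch. The Hit class and the groupByPathInOrder helper are hypothetical stand-ins, not OpenGrok's actual types; the point is only that the three-argument form of Collectors.groupingBy accepts a map factory, and LinkedHashMap keeps keys in first-encounter order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingOrderSketch {
    // Hypothetical stand-in for a search hit; not OpenGrok's actual class.
    static final class Hit {
        final String path;
        final int lineNumber;

        Hit(String path, int lineNumber) {
            this.path = path;
            this.lineNumber = lineNumber;
        }

        String path() {
            return path;
        }
    }

    // Groups hits by file while keeping files in the order they first appear
    // in the already-ranked input list.
    static Map<String, List<Hit>> groupByPathInOrder(List<Hit> rankedHits) {
        return rankedHits.stream().collect(
                Collectors.groupingBy(Hit::path, LinkedHashMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<Hit> ranked = List.of(
                new Hit("src/zeta.c", 10),
                new Hit("src/alpha.c", 3),
                new Hit("src/zeta.c", 42),
                new Hit("src/mid.c", 7));
        // prints [src/zeta.c, src/alpha.c, src/mid.c]
        System.out.println(groupByPathInOrder(ranked).keySet());
    }
}
```

With the one-argument groupingBy, the resulting map makes no ordering guarantee, so any later truncation to the first N files could select a different subset from one run to the next.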

Move hit-per-file limiting from post-hoc trimming in SearchEngine into Context.getContext() via a new maxHits parameter. Fix apiary Note line that was parsed as a parameter. Use QueryParameters constants in tests, add URL null check, extract variables, improve test names and comments.
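The difference between post-hoc trimming and limiting during collection can be sketched as follows. This is not the signature or body of Context.getContext(); the class, method, and substring matching are hypothetical, and treating any non-positive maxHits as unlimited is an assumption (the PR only states that 0 means unlimited).

```java
import java.util.ArrayList;
import java.util.List;

public class HitLimitSketch {
    // Collects 1-based numbers of lines containing the token, stopping as soon as
    // maxHits lines were found. maxHits <= 0 is treated as "no limit" (assumption).
    static List<Integer> matchingLineNumbers(List<String> lines, String token, int maxHits) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (maxHits > 0 && hits.size() >= maxHits) {
                break; // limit reached: skip scanning the rest of the file
            }
            if (lines.get(i).contains(token)) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("foo()", "bar", "x = foo", "foo");
        System.out.println(matchingLineNumbers(lines, "foo", 0)); // prints [1, 3, 4]
        System.out.println(matchingLineNumbers(lines, "foo", 2)); // prints [1, 3]
    }
}
```

Stopping inside the collection loop avoids building the full hit list for a file only to discard most of it afterwards, which is one plausible motivation for moving the limit.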

The note text between Parameters and Response sections was parsed as an unrecognized block by drafter. Action descriptions must precede the Parameters section in API Blueprint format.

vladak · 2026-04-22T17:54:23Z

Thanks !

oracle-contributor-agreement bot added the "OCA Verified" label (All contributors have signed the Oracle Contributor Agreement.) Apr 10, 2026

gaborbernat changed the title ~~fix(api): preserve hit order and remove per-file context limit in search API~~ 🐛 fix(search-api): stable result order and complete per-file hits Apr 10, 2026

gaborbernat force-pushed the fix/api-search-line-numbers branch 2 times, most recently from 7640a91 to bc9271a April 13, 2026 15:01

gaborbernat changed the title ~~🐛 fix(search-api): stable result order and complete per-file hits~~ 🐛 fix(search-api): stable result order and configurable per-file hit limit Apr 13, 2026

gaborbernat force-pushed the fix/api-search-line-numbers branch from bc9271a to f566d92 April 13, 2026 15:10