Skip to content

#1645: add support for "search after" pagination#2298

Open
klimov-paul wants to merge 4 commits intoruflin:8.xfrom
klimov-paul:1645-search-after
Open

#1645: add support for "search after" pagination#2298
klimov-paul wants to merge 4 commits intoruflin:8.xfrom
klimov-paul:1645-search-after

Conversation

@klimov-paul
Copy link
Copy Markdown

@klimov-paul klimov-paul commented Mar 22, 2026

Resolves #1645.

Adds support for "search after" pagination and infinite documents scroll.

See: https://www.elastic.co/guide/en/elasticsearch/reference/8.18/paginate-search-results.html#search-after

Covers "8.x" branch.
Should be merged to "9.x" separately, if PR is accepted.

Summary by CodeRabbit

  • New Features
    • Introduced search-after pagination support for efficient cursor-based navigation through large result sets.
    • Added methods to iterate through results with automatic pagination handling.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 22, 2026

📝 Walkthrough

Walkthrough

The changes implement search-after based pagination support by adding a setSearchAfter() method to the Query class, a getSort() method to the Document class, and two new generator methods (each() and batch()) to the Index class for iterating through large result sets. Build configuration and documentation updates accompany these additions.

Changes

Cohort / File(s) Summary
Core Search-After API
src/Query.php, src/Document.php
Added setSearchAfter() method to Query for configuring pagination parameters and getSort() method to Document for retrieving sort values from previous results.
Pagination Iterators
src/Index.php
Added two generator methods: each() yields individual documents, batch() yields document arrays, both implementing search-after pagination by repeatedly executing searches with advanced cursor positions.
Functional Tests
tests/QueryTest.php, tests/IndexTest.php
Added test cases validating setSearchAfter() functionality and verifying each() and batch() iterators correctly paginate through result sets in expected order.
Build & Documentation
Makefile, CHANGELOG.md
Updated Makefile to enable symfony/flex plugin and documented search-after pagination support in CHANGELOG under "Backward Compatibility Breaks".

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Index
    participant Query
    participant Elasticsearch
    participant Document
    
    User->>Index: each(query, batchSize)
    activate Index
    Index->>Query: create(query)
    Index->>Query: setSize(batchSize)
    
    loop Until documents returned < batchSize
        Index->>Elasticsearch: search(query)
        Elasticsearch-->>Index: ResultSet with Documents
        activate Document
        Index->>Document: getSort()
        Document-->>Index: sort values array
        deactivate Document
        Index->>User: yield Document
        Index->>Query: setSearchAfter(sortValues)
    end
    deactivate Index
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hops through pages, no endless scroll,
Search-after magic makes pagination whole,
Each document yields in sorted delight,
Batches and iterations—pagination done right!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely identifies the main change: adding 'search after' pagination support with a specific reference to issue #1645.
Linked Issues check ✅ Passed The PR fully implements both objectives from issue #1645: a setter for search_after parameter in Query class and iterator methods (each/batch) for deep pagination.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing search_after pagination: Query setter, Index iterators, Document sort getter, tests, and Composer configuration update.
Docstring Coverage ✅ Passed Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ruflin
Copy link
Copy Markdown
Owner

ruflin commented Apr 5, 2026

@klimov-paul Sorry for the late review. Happy to get this improvement in but I would prefer that we target this first against 9.x and then backport to 8.x. With this we ensure all features that are available in 8.x are in 9.x.

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 5, 2026

@ruflin, I'll start reviewing the PR now!

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
tests/IndexTest.php (1)

934-995: Add a 3-page regression case.

These fixtures only create two pages (3 docs with batch size 2), so they never exercise a second search_after update. A 5-document case would catch cursor-handling bugs that only show up on the third request.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/IndexTest.php` around lines 934 - 995, The tests testIterateEach and
testIterateBatch only add 3 documents so with page size 2 they produce two pages
and never exercise a third search_after call; update the fixtures used in these
tests (the calls to $index->addDocuments in testIterateEach and
testIterateBatch) to add 5 documents (e.g. ids '1'..'5' with appropriate
country_id/region_id values) so the iteration/batch logic must request three
pages and will exercise the search_after cursor update path; keep the same Query
size (setSize(2)) and sorting (region_id desc) so asserted order expectations
are adjusted accordingly if necessary.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Index.php`:
- Around line 553-575: The issue is that Query::create($query) can return the
same Query instance passed by the caller and each() (and batch()) mutates it via
setSize and setSearchAfter; clone the Query before mutating to avoid changing
the caller's object: in each() (and batch()) call Query::create($query) into a
local variable and if the original $query is an instance of Query, replace it
with a cloned copy (e.g., $query = clone $query) before calling setSize() and
setSearchAfter(); then proceed to use that cloned local query with search() so
the caller's Query remains unchanged.
- Around line 553-554: Validate the $batchSize argument at the start of the
each() method and fail fast: if $batchSize is less than or equal to 0 throw an
InvalidArgumentException with a clear message (e.g., "batchSize must be a
positive integer"); do this before entering the loop or binding $document so you
never forward an invalid size or iterate with zero-size batches. Apply the same
check to the other method in this class that accepts a $batchSize parameter (the
similar batched method around the adjacent block) so both paths validate input
consistently.

In `@src/Query.php`:
- Around line 489-495: setSearchAfter currently calls addParam('search_after',
$value) in a loop which appends values across pages and breaks pagination when
Index::each()/batch() call it repeatedly; change setSearchAfter to replace the
entire search_after parameter instead of appending by setting the full tuple at
once (e.g. use a single setParam/setParameter-like call or assign
$this->params['search_after'] = $searchAfter) so the 'search_after' key is
overwritten with the provided array rather than accumulating values.

---

Nitpick comments:
In `@tests/IndexTest.php`:
- Around line 934-995: The tests testIterateEach and testIterateBatch only add 3
documents so with page size 2 they produce two pages and never exercise a third
search_after call; update the fixtures used in these tests (the calls to
$index->addDocuments in testIterateEach and testIterateBatch) to add 5 documents
(e.g. ids '1'..'5' with appropriate country_id/region_id values) so the
iteration/batch logic must request three pages and will exercise the
search_after cursor update path; keep the same Query size (setSize(2)) and
sorting (region_id desc) so asserted order expectations are adjusted accordingly
if necessary.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a4b11fa5-c756-43db-af01-4eb7949355d6

📥 Commits

Reviewing files that changed from the base of the PR and between cd0a5fa and 369d823.

📒 Files selected for processing (7)
  • CHANGELOG.md
  • Makefile
  • src/Document.php
  • src/Index.php
  • src/Query.php
  • tests/IndexTest.php
  • tests/QueryTest.php

Comment on lines +553 to +575
public function each($query = '', int $batchSize = 100, ?array $options = null, string $method = Request::POST): \Generator
{
$query = Query::create($query);

if (!$query->hasParam('sort')) {
throw new \LogicException('Query must have "sort" parameter in order to use "search after" based iteration.');
}

$query->setSize($batchSize);

while (true) {
$resultSet = $this->search($query, $options, $method);
foreach ($resultSet->getDocuments() as $document) {
yield $document;
}

if (count($resultSet->getDocuments()) < $batchSize) {
break;
}

$query->setSearchAfter($document->getSort());
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid mutating the caller's Query instance.

Query::create() returns the same object when $query is already a Query, so Lines 561/573 and 596/614 permanently overwrite the caller's size and search_after. Reusing that query after each() or batch() will resume from the last page instead of the original request.

♻️ Proposed fix
-        $query = Query::create($query);
+        $query = clone Query::create($query);
@@
-        $query = Query::create($query);
+        $query = clone Query::create($query);

Also applies to: 588-615

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Index.php` around lines 553 - 575, The issue is that
Query::create($query) can return the same Query instance passed by the caller
and each() (and batch()) mutates it via setSize and setSearchAfter; clone the
Query before mutating to avoid changing the caller's object: in each() (and
batch()) call Query::create($query) into a local variable and if the original
$query is an instance of Query, replace it with a cloned copy (e.g., $query =
clone $query) before calling setSize() and setSearchAfter(); then proceed to use
that cloned local query with search() so the caller's Query remains unchanged.

Comment on lines +553 to +554
public function each($query = '', int $batchSize = 100, ?array $options = null, string $method = Request::POST): \Generator
{
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Validate $batchSize before entering the loop.

With 0, each() reaches Line 573 without ever binding $document; with negative values, both methods forward an invalid size downstream. This should fail fast.

🛡️ Proposed fix
     public function each($query = '', int $batchSize = 100, ?array $options = null, string $method = Request::POST): \Generator
     {
+        if ($batchSize < 1) {
+            throw new InvalidException('Batch size must be greater than 0.');
+        }
+
         $query = Query::create($query);
@@
     public function batch($query = '', int $batchSize = 100, ?array $options = null, string $method = Request::POST): \Generator
     {
+        if ($batchSize < 1) {
+            throw new InvalidException('Batch size must be greater than 0.');
+        }
+
         $query = Query::create($query);

Also applies to: 588-589

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Index.php` around lines 553 - 554, Validate the $batchSize argument at
the start of the each() method and fail fast: if $batchSize is less than or
equal to 0 throw an InvalidArgumentException with a clear message (e.g.,
"batchSize must be a positive integer"); do this before entering the loop or
binding $document so you never forward an invalid size or iterate with zero-size
batches. Apply the same check to the other method in this class that accepts a
$batchSize parameter (the similar batched method around the adjacent block) so
both paths validate input consistently.

Comment on lines +489 to +495
public function setSearchAfter(array $searchAfter): self
{
foreach ($searchAfter as $value) {
$this->addParam('search_after', $value);
}

return $this;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Replace search_after instead of appending it.

Line 492 uses addParam(), so every new page keeps the previous cursor values and appends the new sort tuple. Index::each() / batch() call this repeatedly, which breaks pagination as soon as a result set spans more than two pages.

🐛 Proposed fix
     public function setSearchAfter(array $searchAfter): self
     {
-        foreach ($searchAfter as $value) {
-            $this->addParam('search_after', $value);
-        }
-
-        return $this;
+        return $this->setParam('search_after', \array_values($searchAfter));
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public function setSearchAfter(array $searchAfter): self
{
foreach ($searchAfter as $value) {
$this->addParam('search_after', $value);
}
return $this;
public function setSearchAfter(array $searchAfter): self
{
return $this->setParam('search_after', \array_values($searchAfter));
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Query.php` around lines 489 - 495, setSearchAfter currently calls
addParam('search_after', $value) in a loop which appends values across pages and
breaks pagination when Index::each()/batch() call it repeatedly; change
setSearchAfter to replace the entire search_after parameter instead of appending
by setting the full tuple at once (e.g. use a single setParam/setParameter-like
call or assign $this->params['search_after'] = $searchAfter) so the
'search_after' key is overwritten with the provided array rather than
accumulating values.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants