#1645: add support for "search after" pagination#2298
#1645: add support for "search after" pagination#2298klimov-paul wants to merge 4 commits intoruflin:8.xfrom
Conversation
📝 WalkthroughWalkthroughThe changes implement search-after based pagination support by adding a Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Index
participant Query
participant Elasticsearch
participant Document
User->>Index: each(query, batchSize)
activate Index
Index->>Query: create(query)
Index->>Query: setSize(batchSize)
loop Until documents returned < batchSize
Index->>Elasticsearch: search(query)
Elasticsearch-->>Index: ResultSet with Documents
activate Document
Index->>Document: getSort()
Document-->>Index: sort values array
deactivate Document
Index->>User: yield Document
Index->>Query: setSearchAfter(sortValues)
end
deactivate Index
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@klimov-paul Sorry for the late review. Happy to get this improvement in but I would prefer that we target this first against 9.x and then backport to 8.x. With this we ensure all features that are available in 8.x are in 9.x. @coderabbitai review |
|
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/IndexTest.php (1)
934-995: Add a 3-page regression case.These fixtures only create two pages (3 docs with batch size 2), so they never exercise a second
search_afterupdate. A 5-document case would catch cursor-handling bugs that only show up on the third request.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/IndexTest.php` around lines 934 - 995, The tests testIterateEach and testIterateBatch only add 3 documents so with page size 2 they produce two pages and never exercise a third search_after call; update the fixtures used in these tests (the calls to $index->addDocuments in testIterateEach and testIterateBatch) to add 5 documents (e.g. ids '1'..'5' with appropriate country_id/region_id values) so the iteration/batch logic must request three pages and will exercise the search_after cursor update path; keep the same Query size (setSize(2)) and sorting (region_id desc) so asserted order expectations are adjusted accordingly if necessary.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Index.php`:
- Around line 553-575: The issue is that Query::create($query) can return the
same Query instance passed by the caller and each() (and batch()) mutates it via
setSize and setSearchAfter; clone the Query before mutating to avoid changing
the caller's object: in each() (and batch()) call Query::create($query) into a
local variable and if the original $query is an instance of Query, replace it
with a cloned copy (e.g., $query = clone $query) before calling setSize() and
setSearchAfter(); then proceed to use that cloned local query with search() so
the caller's Query remains unchanged.
- Around line 553-554: Validate the $batchSize argument at the start of the
each() method and fail fast: if $batchSize is less than or equal to 0 throw an
InvalidArgumentException with a clear message (e.g., "batchSize must be a
positive integer"); do this before entering the loop or binding $document so you
never forward an invalid size or iterate with zero-size batches. Apply the same
check to the other method in this class that accepts a $batchSize parameter (the
similar batched method around the adjacent block) so both paths validate input
consistently.
In `@src/Query.php`:
- Around line 489-495: setSearchAfter currently calls addParam('search_after',
$value) in a loop which appends values across pages and breaks pagination when
Index::each()/batch() call it repeatedly; change setSearchAfter to replace the
entire search_after parameter instead of appending by setting the full tuple at
once (e.g. use a single setParam/setParameter-like call or assign
$this->params['search_after'] = $searchAfter) so the 'search_after' key is
overwritten with the provided array rather than accumulating values.
---
Nitpick comments:
In `@tests/IndexTest.php`:
- Around line 934-995: The tests testIterateEach and testIterateBatch only add 3
documents so with page size 2 they produce two pages and never exercise a third
search_after call; update the fixtures used in these tests (the calls to
$index->addDocuments in testIterateEach and testIterateBatch) to add 5 documents
(e.g. ids '1'..'5' with appropriate country_id/region_id values) so the
iteration/batch logic must request three pages and will exercise the
search_after cursor update path; keep the same Query size (setSize(2)) and
sorting (region_id desc) so asserted order expectations are adjusted accordingly
if necessary.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a4b11fa5-c756-43db-af01-4eb7949355d6
📒 Files selected for processing (7)
CHANGELOG.mdMakefilesrc/Document.phpsrc/Index.phpsrc/Query.phptests/IndexTest.phptests/QueryTest.php
| public function each($query = '', int $batchSize = 100, ?array $options = null, string $method = Request::POST): \Generator | ||
| { | ||
| $query = Query::create($query); | ||
|
|
||
| if (!$query->hasParam('sort')) { | ||
| throw new \LogicException('Query must have "sort" parameter in order to use "search after" based iteration.'); | ||
| } | ||
|
|
||
| $query->setSize($batchSize); | ||
|
|
||
| while (true) { | ||
| $resultSet = $this->search($query, $options, $method); | ||
| foreach ($resultSet->getDocuments() as $document) { | ||
| yield $document; | ||
| } | ||
|
|
||
| if (count($resultSet->getDocuments()) < $batchSize) { | ||
| break; | ||
| } | ||
|
|
||
| $query->setSearchAfter($document->getSort()); | ||
| } | ||
| } |
There was a problem hiding this comment.
Avoid mutating the caller's Query instance.
Query::create() returns the same object when $query is already a Query, so Lines 561/573 and 596/614 permanently overwrite the caller's size and search_after. Reusing that query after each() or batch() will resume from the last page instead of the original request.
♻️ Proposed fix
- $query = Query::create($query);
+ $query = clone Query::create($query);
@@
- $query = Query::create($query);
+ $query = clone Query::create($query);Also applies to: 588-615
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/Index.php` around lines 553 - 575, The issue is that
Query::create($query) can return the same Query instance passed by the caller
and each() (and batch()) mutates it via setSize and setSearchAfter; clone the
Query before mutating to avoid changing the caller's object: in each() (and
batch()) call Query::create($query) into a local variable and if the original
$query is an instance of Query, replace it with a cloned copy (e.g., $query =
clone $query) before calling setSize() and setSearchAfter(); then proceed to use
that cloned local query with search() so the caller's Query remains unchanged.
| public function each($query = '', int $batchSize = 100, ?array $options = null, string $method = Request::POST): \Generator | ||
| { |
There was a problem hiding this comment.
Validate $batchSize before entering the loop.
With 0, each() reaches Line 573 without ever binding $document; with negative values, both methods forward an invalid size downstream. This should fail fast.
🛡️ Proposed fix
public function each($query = '', int $batchSize = 100, ?array $options = null, string $method = Request::POST): \Generator
{
+ if ($batchSize < 1) {
+ throw new InvalidException('Batch size must be greater than 0.');
+ }
+
$query = Query::create($query);
@@
public function batch($query = '', int $batchSize = 100, ?array $options = null, string $method = Request::POST): \Generator
{
+ if ($batchSize < 1) {
+ throw new InvalidException('Batch size must be greater than 0.');
+ }
+
$query = Query::create($query);Also applies to: 588-589
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/Index.php` around lines 553 - 554, Validate the $batchSize argument at
the start of the each() method and fail fast: if $batchSize is less than or
equal to 0 throw an InvalidArgumentException with a clear message (e.g.,
"batchSize must be a positive integer"); do this before entering the loop or
binding $document so you never forward an invalid size or iterate with zero-size
batches. Apply the same check to the other method in this class that accepts a
$batchSize parameter (the similar batched method around the adjacent block) so
both paths validate input consistently.
| public function setSearchAfter(array $searchAfter): self | ||
| { | ||
| foreach ($searchAfter as $value) { | ||
| $this->addParam('search_after', $value); | ||
| } | ||
|
|
||
| return $this; |
There was a problem hiding this comment.
Replace search_after instead of appending it.
Line 492 uses addParam(), so every new page keeps the previous cursor values and appends the new sort tuple. Index::each() / batch() call this repeatedly, which breaks pagination as soon as a result set spans more than two pages.
🐛 Proposed fix
public function setSearchAfter(array $searchAfter): self
{
- foreach ($searchAfter as $value) {
- $this->addParam('search_after', $value);
- }
-
- return $this;
+ return $this->setParam('search_after', \array_values($searchAfter));
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| public function setSearchAfter(array $searchAfter): self | |
| { | |
| foreach ($searchAfter as $value) { | |
| $this->addParam('search_after', $value); | |
| } | |
| return $this; | |
| public function setSearchAfter(array $searchAfter): self | |
| { | |
| return $this->setParam('search_after', \array_values($searchAfter)); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/Query.php` around lines 489 - 495, setSearchAfter currently calls
addParam('search_after', $value) in a loop which appends values across pages and
breaks pagination when Index::each()/batch() call it repeatedly; change
setSearchAfter to replace the entire search_after parameter instead of appending
by setting the full tuple at once (e.g. use a single setParam/setParameter-like
call or assign $this->params['search_after'] = $searchAfter) so the
'search_after' key is overwritten with the provided array rather than
accumulating values.
Resolves #1645.
Adds support for "search after" pagination and infinite documents scroll.
See: https://www.elastic.co/guide/en/elasticsearch/reference/8.18/paginate-search-results.html#search-after
Covers "8.x" branch.
Should be merged to "9.x" separately, if PR is accepted.
Summary by CodeRabbit