[FC-0118] docs: add ADR for standardizing pagination across APIs #38300
- Proposes DefaultPagination from edx-drf-extensions as platform-wide standard
- Documents migration path for LimitOffsetPagination and unpaginated endpoints
- Includes code examples for ListAPIView, APIView, and mobile pagination
- Outlines rollout plan and alternatives considered
* **User Accounts API** (``/api/user/v1/accounts/``): pagination behavior differs from other user-related APIs, making it difficult for consumers to use a single data-loading pattern.
* **Course Members API** (``/api/courses/v1/.../members/``): returns all enrollments without pagination, relying on a ``COURSE_MEMBER_API_ENROLLMENT_LIMIT`` setting (default 1000) to cap results and raising ``OverEnrollmentLimitException`` instead of paginating.
* **Enrollment API** (``/api/enrollment/v1/``): some list endpoints return full result sets without pagination support.
* **Course Blocks API** (``/api/courses/v2/blocks/``): intentionally returns unpaginated data for the entire course structure, which can result in very large response payloads.
In general, pagination of tree structures is complicated, to say the least.
Does a "page" size of 10 refer to 10 top-level items, which may potentially have hundreds of children included? Or a "varying shape" response with 1 top-level item + 9 children, or 8 top-level items + 2 children, and even more complexities with grandchildren and great-grandchildren? Or do we limit to returning 1 depth level at a time to avoid this?
Claude suggests the following:
The most principled approach distinguishes between two different questions clients are asking:
"What is the shape of this tree?" — This is a structural query. The answer (IDs, types, parent-child relationships, display names) is typically small and bounded even for large courses. It should be returned in full, without pagination, at controlled depth. A course with 500 blocks has maybe 5–15KB of structure. Trying to paginate this creates more problems than it solves.
"What is the full data for these nodes?" — This is a content query. Node content (student view data, completion state, grade details) can be large per node. This is where you paginate — but over a flat list of node IDs, not the tree itself.
Specifically, for the ADR, that would mean stating something like this:
Tree-shaped endpoints must not apply standard item-count pagination to the full node set. Instead, they must choose one of:
- Return the complete structural representation (IDs, types, relationships) and paginate separately over node content when requested, or
- Return the tree to a fixed maximum depth and provide explicit child-fetch URLs for any subtrees beyond that depth.
CC @jesperhodge re taxonomy pagination.
Note: Claude also said:
The course blocks API is actually a reasonable example of getting this mostly right already: requested_fields lets you strip the response down to structural metadata, and you can fetch full block detail separately. Its main gap is that the approach isn't documented as an explicit standard, so other tree-shaped APIs have reinvented things differently. ADR 0036 should probably make this the pattern explicitly.
Hmm, I guess this is actually explored in #38305 - why not just combine that ADR into this one?
Agreed, this is a real gap. A "page size of 10" doesn't have a well-defined meaning when items can contain arbitrary subtrees, and silently applying DefaultPagination to a tree endpoint would produce oddly shaped responses that no one could consume reliably.
I'm going to add a "Scope and Tree-Shaped Endpoints" section to the ADR that:
- explicitly holds tree endpoints (Course Blocks, Taxonomy, OLX structure, progress trees) out of standard item-count pagination, and
- requires them to either (a) return the full structural representation unpaginated at a controlled depth and paginate content separately, or (b) cap the tree at a max depth and expose child-fetch URLs for deeper subtrees.
Course Blocks already does roughly the right thing with requested_fields. Your point that it just hasn't been written down as the platform standard is fair, so I'm calling it out as the reference implementation. Response-shape details (minimal vs. full, field selection, flattening) belong in ADR-0036 (#38305) rather than here, so the new section defers to it instead of duplicating it.
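As a rough illustration of option (a), the sketch below returns a depth-limited structural map in one unpaginated response and paginates content separately over a flat list of node IDs. Everything here is hypothetical and only meant to make the shape concrete: the Block class, the flatten / structure / content_page helpers, the has_more_children flag, and the use of plain page numbers (rather than URLs) for next / previous. It is not how the Course Blocks API is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical stand-in for a course block; not the real Course Blocks model."""
    id: str
    type: str
    children: list["Block"] = field(default_factory=list)
    content: str = ""  # stands in for large per-node data (student view, grades, ...)


def flatten(root):
    """Index every node in the tree by ID."""
    index, stack = {}, [root]
    while stack:
        node = stack.pop()
        index[node.id] = node
        stack.extend(node.children)
    return index


def structure(root, max_depth):
    """Structural query: IDs, types, and child IDs down to max_depth, unpaginated."""
    out, stack = {}, [(root, 0)]
    while stack:
        node, depth = stack.pop()
        # Nodes at the depth limit keep their own entry but do not list children.
        truncated = depth >= max_depth and bool(node.children)
        out[node.id] = {
            "type": node.type,
            "children": [] if truncated else [c.id for c in node.children],
            "has_more_children": truncated,  # hypothetical flag for "fetch deeper"
        }
        if not truncated:
            stack.extend((c, depth + 1) for c in node.children)
    return out


def content_page(index, node_ids, page, page_size):
    """Content query: paginate over a flat list of node IDs, not over the tree."""
    start = (page - 1) * page_size
    chunk = node_ids[start:start + page_size]
    return {
        "count": len(node_ids),
        # Page numbers stand in for the next/previous URLs of a real envelope.
        "next": page + 1 if start + page_size < len(node_ids) else None,
        "previous": page - 1 if page > 1 else None,
        "results": [{"id": i, "content": index[i].content} for i in chunk],
    }


course = Block("course", "course", [
    Block("ch1", "chapter", [
        Block("seq1", "sequential", [Block("v1", "vertical", content="unit body")]),
    ]),
    Block("ch2", "chapter"),
])
index = flatten(course)
shape = structure(course, max_depth=1)
first = content_page(index, ["ch1", "seq1", "v1"], page=1, page_size=2)
second = content_page(index, ["ch1", "seq1", "v1"], page=2, page_size=2)
```

The point of the split is that page_size only ever counts flat node IDs, so "10 items" has one meaning regardless of how deep the tree is.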
I think they're really two different decisions that only overlap at tree endpoints, and I think keeping them separate reads better for future contributors.
#38305 is about how individual resources are shaped: minimal views, ?fields=..., and flattening nested JSON. This ADR is about how list responses are enveloped: count / next / previous / results, and page / page_size.
For most endpoints only one of the two applies:
- A flat enrollments list needs this ADR and is indifferent to #38305 ([FC-0118] docs: ADR for normalizing nested json apis).
- A single GET /courses/{id}/ detail response needs #38305 and is indifferent to this one.
- Only tree-shaped list endpoints sit in the intersection.
Combining them would force flat-list and single-resource endpoints to reason about concerns that don't apply to them, and it would couple two decisions we'd probably want to be able to revise independently later. The tree-endpoint overlap is the real concern, and I'm handling it explicitly in the new scope section: response shape defers to ADR-0036, and list enveloping stays here.
@bradenmacdonald Let me know your thoughts on this approach.
Alternatives Considered
-----------------------

* **Standardize on LimitOffsetPagination instead of PageNumberPagination**: Rejected because ``edx-drf-extensions`` already ships ``DefaultPagination`` based on ``PageNumberPagination``, and a significant portion of the platform already uses it. Additionally, ``limit``/``offset`` pagination degrades in performance with large offsets because the database must scan and skip all preceding rows, making it unsuitable for large Open edX datasets such as enrollments and completions.
> Additionally, limit/offset pagination degrades in performance with large offsets because the database must scan and skip all preceding rows, making it unsuitable for large Open edX datasets such as enrollments and completions.
This doesn't make any sense. limit/offset pagination and page number pagination have exactly the same database performance characteristics if implemented naively. But this is just the client-facing API shape; technically, there are ways to implement either page number pagination or limit/offset pagination using a cursor internally to improve performance.
The main reasons to prefer page number pagination are that it's already widely used, and it's much easier for humans to understand than limit/offset.
You're right, removing this. DRF's PageNumberPagination wraps django.core.paginator.Paginator, which emits the same LIMIT ... OFFSET ... SQL that LimitOffsetPagination does. page=5&page_size=10 and limit=10&offset=40 are identical at the database.
Rewriting the bullet around the reasons that actually hold, which are the ones you pointed out:
- DefaultPagination is already the de facto default in edx-drf-extensions and in use across a lot of existing endpoints, so standardizing on it minimizes migration churn.
- Numbered pages are easier for humans to reason about, bookmark, and share, and they line up with existing MFE numbered-page controls.
Updated version is in the next commit.
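To make the equivalence concrete, here is a small check that page=5&page_size=10 and limit=10&offset=40 select the same rows. It uses an in-memory SQLite table as a stand-in for a real queryset. The enrollment table, the fetch helper, and page_number_to_limit_offset are made up for this example; the only thing borrowed from the discussion is the page-to-offset arithmetic.

```python
import sqlite3


def page_number_to_limit_offset(page, page_size):
    """Translate page-number parameters into the LIMIT/OFFSET pair they imply."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return page_size, (page - 1) * page_size


# Hypothetical table standing in for a large list endpoint's queryset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollment (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO enrollment (id) VALUES (?)", [(i,) for i in range(1, 101)])


def fetch(limit, offset):
    """Run the one query shape that both pagination styles reduce to."""
    rows = conn.execute(
        "SELECT id FROM enrollment ORDER BY id LIMIT ? OFFSET ?", (limit, offset)
    )
    return [row[0] for row in rows]


by_page = fetch(*page_number_to_limit_offset(page=5, page_size=10))
by_offset = fetch(10, 40)
```

Both calls return the same ten rows, so the choice between the two styles is about the client-facing shape rather than database cost.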
* **Adopt CursorPagination as the platform standard**: Rejected because cursor-based pagination, while performant for large and frequently-changing datasets, does not support random page access (jumping to page N). This would break existing MFE patterns that display numbered page controls. Cursor pagination also requires a stable, unique, sequential sort key on every queryset, which not all Open edX models guarantee today.
Cursor pagination does not require sort keys to be sequential or unique. It just requires that you can define a deterministic ORDER BY on every QuerySet, and that the sort key is indexed (for performance).
While "basic" cursor-based pagination works like WHERE id > :last_seen_id ORDER BY id LIMIT :page_size, you could instead use WHERE (sort_key, id) > (last_value, last_id) ORDER BY sort_key, id to make cursor-based pagination work for any comparable, indexed type — timestamps, strings, UUIDs, whatever.
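A minimal sketch of the compound-key form, using SQLite as a stand-in database (row-value comparisons like (a, b) > (?, ?) need SQLite 3.15 or newer). The completion table, its columns, and the keyset_page helper are hypothetical. The modified column is deliberately non-unique, and the id tiebreaker is what keeps the ordering deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE completion (id INTEGER PRIMARY KEY, modified TEXT NOT NULL)")
# Index matches the ORDER BY so the database can seek instead of scanning.
conn.execute("CREATE INDEX completion_modified_id ON completion (modified, id)")

# Ten rows spread over only three distinct timestamps (non-unique sort key).
rows = [(i, f"2024-01-{(i % 3) + 1:02d}") for i in range(1, 11)]
conn.executemany("INSERT INTO completion (id, modified) VALUES (?, ?)", rows)


def keyset_page(cursor, page_size):
    """Fetch one page; cursor is None for the first page, else (modified, id)."""
    if cursor is None:
        sql = "SELECT modified, id FROM completion ORDER BY modified, id LIMIT ?"
        params = (page_size,)
    else:
        sql = (
            "SELECT modified, id FROM completion "
            "WHERE (modified, id) > (?, ?) "
            "ORDER BY modified, id LIMIT ?"
        )
        params = (*cursor, page_size)
    page = conn.execute(sql, params).fetchall()
    # A short page means we reached the end. A full final page yields one
    # extra (empty) fetch, which also ends the walk.
    next_cursor = page[-1] if len(page) == page_size else None
    return page, next_cursor


seen, cursor = [], None
while True:
    page, cursor = keyset_page(cursor, page_size=4)
    seen.extend(page)
    if cursor is None:
        break
```

Each page starts strictly after the last (modified, id) pair seen, so rows that share a timestamp are neither skipped nor repeated.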
Thanks for pointing this out. Cursor pagination only needs a deterministic, indexed ORDER BY, not a unique or sequential key. Compound orderings like ORDER BY sort_key, id handle the non-unique case, and DRF's own CursorPagination already supports "nearly unique" fields via the position-plus-offset approach its docs describe.
Rewriting around the reasons that actually apply:
- It doesn't support random page access (jumping to page N), which breaks existing MFE numbered-page controls and bookmarkable deep links.
- The response envelope (opaque next / previous cursors, no count) differs enough from what existing Open edX consumers expect that adoption would need coordinated client-side changes rather than a gradual per-endpoint rollout.
Currently, Open edX REST APIs implement pagination inconsistently across endpoints — some use page/page_size, others use limit/offset, and several return full unbounded result sets entirely. This forces every API consumer, whether an MFE, mobile client, or AI agent, to implement custom data-loading logic per endpoint, and risks overloading clients with large unpaginated payloads. This ADR proposes standardizing all list-type endpoints on the existing DefaultPagination class from edx-drf-extensions, enforcing a consistent response envelope across the platform and enabling consumers to implement a single reusable pagination loop for all Open edX APIs.
Issue: http://github.com/openedx/openedx-platform/issues/38266
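For illustration, the "single reusable pagination loop" could look roughly like the sketch below. It assumes only the count / next / previous / results envelope discussed in this PR. The iter_results helper, the injected fetch_json callable, and the /api/example/v1/items/ URLs are hypothetical, and a real client would plug in its own HTTP layer and authentication.

```python
from typing import Any, Callable, Iterator


def iter_results(first_url: str, fetch_json: Callable[[str], dict]) -> Iterator[Any]:
    """Yield every item from a paginated list endpoint by following `next`."""
    url = first_url
    while url:
        body = fetch_json(url)
        yield from body["results"]
        url = body.get("next")  # None on the last page ends the loop


# Canned responses standing in for a real endpoint (hypothetical URLs).
FAKE_PAGES = {
    "/api/example/v1/items/?page=1": {
        "count": 3,
        "next": "/api/example/v1/items/?page=2",
        "previous": None,
        "results": ["a", "b"],
    },
    "/api/example/v1/items/?page=2": {
        "count": 3,
        "next": None,
        "previous": "/api/example/v1/items/?page=1",
        "results": ["c"],
    },
}

items = list(iter_results("/api/example/v1/items/?page=1", FAKE_PAGES.__getitem__))
```

Because the loop depends only on the envelope, the same function would work against any endpoint that adopts the standard.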