Add unit learning objectives report API endpoint#14422
Add unit learning objectives report API endpoint#14422rtibbles merged 25 commits intolearningequality:release-v0.19.xfrom
Conversation
879c740 to
96ca797
Compare
96ca797 to
1b17485
Compare
Build Artifacts
Smoke test screenshot |
rtibblesbot
left a comment
There was a problem hiding this comment.
New read-only API endpoint for unit pre/post test report with per-learner, per-LO scoring.
CI passing (only Handle pull request events check present). No UI files — visual review skipped.
- blocking: PR description is empty — summary, references, reviewer guidance, and AI usage sections are all unfilled placeholders. No linked issues either, so there are no acceptance criteria to review against. Please fill in the description so reviewers understand the intent and can verify completeness.
- blocking: PR targets
release-v0.19.xbut the repo default branch isdevelop. If this feature is meant for the release branch, please confirm. Otherwise, retarget todevelop. - suggestion:
unit_contentnode_idis not validated as an actual unit — any ContentNode pk would be accepted, returning an empty-ish response with a misleadingunit_title. See inline comment. - praise: Thorough test suite — permissions, response shape, scoring logic, edge cases (null options, partial credit, retakes, duplicate attempts, learner groups, dual-role exclusion). See inline comment.
- praise: SQLite chunking with
_IN_CHUNK_SIZEand clear rationale. See inline comment.
@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly
How was this generated?
Reviewed the pull request diff checking for:
- Correctness: bugs, edge cases, undocumented behavior, resource leaks, hardcoded values
- Design: unnecessary complexity, naming, readability, comment accuracy, redundant state
- Architecture: duplicated concerns, minimal interfaces, composition over inheritance
- Testing: behavior-based assertions, mocks only at hard boundaries, accurate coverage
- Completeness: missing dependencies, unupdated usages, i18n, accessibility, security
- Principles: DRY (same reason to change), SRP, Rule of Three (no premature abstraction)
- Checked CI status and linked issue acceptance criteria
- For UI changes: inspected screenshots for layout, visual completeness, and consistency
| course_session = getattr(self, "_course_session", None) or get_object_or_404( | ||
| CourseSession, pk=course_session_id | ||
| ) | ||
| unit = get_object_or_404(ContentNode, pk=unit_contentnode_id) |
There was a problem hiding this comment.
suggestion: No validation that unit_contentnode_id refers to a content node with modality == UNIT. Passing any arbitrary ContentNode pk would return a 200 with empty learning objectives and a potentially confusing unit_title. Consider adding a filter like get_object_or_404(ContentNode, pk=unit_contentnode_id, modality=modalities.UNIT) to return 404 for non-unit nodes — or accept this as intentional if the frontend is the only caller and always passes valid unit IDs.
| # Conservative upper bound for SQL IN-clause list lengths. SQLite's default | ||
| # SQLITE_MAX_VARIABLE_NUMBER is 999; staying under 900 leaves headroom for | ||
| # other bind parameters in the same query. | ||
| _IN_CHUNK_SIZE = 900 |
There was a problem hiding this comment.
praise: Good defensive coding for SQLite's variable limit. The constant, chunk size rationale, and the generic _chunked helper are all clear and well-documented.
| self.assertIn(lid, result) | ||
| self.assertEqual(result[lid], {}) | ||
|
|
||
|
|
There was a problem hiding this comment.
praise: Excellent test coverage. The test_only_most_recent_complete_mastery_log_used test carefully constructs two mastery logs sharing one ContentSummaryLog (respecting the morango UNIQUE constraint) with opposite correctness outcomes — this is a high-value edge case that's easy to get wrong. The overall suite covers pure functions, API integration, permissions, response shape, scoring, learner groups, and dual-role exclusion.
| course_session = CourseSession.objects.get(pk=course_session_id) | ||
| if request.user.has_role_for(allowed_roles, course_session.collection): | ||
| # Cache so the view can reuse it without a second DB query. | ||
| view._course_session = course_session |
There was a problem hiding this comment.
suggestion: The _course_session side-effect cache on the view instance is pragmatic and documented, but couples the permission class to the view's internal expectations. A small comment on the view side (near line 289) noting this dependency would help — the getattr fallback is good, but a future refactor that removes UnitReportPermissions would silently add an extra query without any visible indication that the optimization was lost.
| course_session = CourseSession.objects.get(pk=course_session_id) | ||
| if request.user.has_role_for(allowed_roles, course_session.collection): | ||
| # Cache so the view can reuse it without a second DB query. | ||
| view._course_session = course_session |
There was a problem hiding this comment.
@rtibbles we are caching _course_session on the view to avoid a second DB query in retrieve. This creates coupling between the permission class and the viewset. would this be OK! or should we either (a) accept the second query and remove the side effect, or (b) introduce a shared helper (e.g. get_course_session_403) that both call?
There was a problem hiding this comment.
I think let's just skip this for now - it's an optimization. We can focus on getting the core logic correct and then optimize.
rtibblesbot
left a comment
There was a problem hiding this comment.
Delta re-review of the unit learning objectives report API.
Prior findings
Resolved:
- PR description empty (blocking) — now filled in with summary, references, reviewer guidance, AI usage, and closes #14165
- PR targets
release-v0.19.x(blocking) — confirmed intentional; multiple merged PRs target this branch as part of the v0.19.x release workflow unit_contentnode_idnot validated as a unit (suggestion) —modality=modalities.UNITadded toget_object_or_404_course_sessioncache comment (suggestion) — expanded comment at lines 288-290 documents the dependency and fallback rationale
4/4 prior findings resolved.
CI passing (only Handle pull request events check). No UI files — visual review skipped. All acceptance criteria from #14165 verified.
@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly
How was this generated?
Compared the current PR state against findings from a prior review:
- Retrieved prior bot reviews deterministically via the GitHub API
- Classified each prior finding as RESOLVED, UNADDRESSED, ACKNOWLEDGED, or CONTESTED
- Only raised NEW findings for newly introduced code
- Reviewed the pull request diff checking for correctness, design, architecture, testing, completeness, and adherence to DRY/SRP principles
- Checked CI status and linked issue acceptance criteria
| ) | ||
| unit = get_object_or_404( | ||
| ContentNode, pk=unit_contentnode_id, modality=modalities.UNIT | ||
| ) |
There was a problem hiding this comment.
praise: Clean fix for the prior suggestion — modality=modalities.UNIT on the get_object_or_404 call returns a proper 404 for non-unit content nodes, and the expanded comment above (lines 288-290) documents the _course_session cache dependency for future maintainers.
rtibbles
left a comment
There was a problem hiding this comment.
There's an error in the spec that we need to rectify, but even if it wasn't there, that wouldn't have helped because of the parallel (and divergent) implementation of content_id generation for the tests.
I would have liked to see either more reuse of existing code, and/or more copying of existing patterns for the queries.
| ): | ||
| """ | ||
| Generate a deterministic UUID5 content_id for MasteryLog/AttemptLog tracking. | ||
| Must match the generation logic used on the learner side (issue #14133). |
There was a problem hiding this comment.
This says it must match the generation logic used on the learner side - it absolutely does not.
We should be using exactly the same id here.
However, I suspect we have made a mistake there - there we are including the user_id in the synthetic content_id, but that makes the querying much more complicated, what would be more helpful is to user the same synthetic content_id per user.
We should update the logger api.py synthetic content_id generation logic to not include the user_id (so separating it slightly from the deterministic hash used for A and B types), and make it reusable here, rather than implementing a completely parallel id generation strategy.
There was a problem hiding this comment.
Okay, I will just extract in to pre_post_test.py without user_id in the key. so that both logger and unit report api can now import from there
| TEST_STATUS_CLOSED = "closed" | ||
|
|
||
|
|
||
| def get_test_version(learner_id, course_session_id, unit_contentnode_id): |
There was a problem hiding this comment.
We shouldn't be reimplementing this - we should move the api.py logic in the logger app to a place where it is reusable here.
| Returns (mastery_log_ids, mastery_log_to_meta) where mastery_log_to_meta | ||
| maps mastery log id → {learner_id, test_type}. | ||
| """ | ||
| # Correlated subquery: picks the latest completed MasteryLog per |
There was a problem hiding this comment.
I am not entirely sure what, but this feels more complicated than it needs to be. We have a lot of the logic for querying these kinds of values in bulk in the coach_summary API endpoint, so I think we could look there for examples of how to simplify somewhat.
| course_session = CourseSession.objects.get(pk=course_session_id) | ||
| if request.user.has_role_for(allowed_roles, course_session.collection): | ||
| # Cache so the view can reuse it without a second DB query. | ||
| view._course_session = course_session |
There was a problem hiding this comment.
I think let's just skip this for now - it's an optimization. We can focus on getting the core logic correct and then optimize.
| course_session=course_session | ||
| ).values_list("collection_id", flat=True) | ||
| ) | ||
| # Exclude users who hold a coach or admin role in the assigned |
There was a problem hiding this comment.
This was never asked for in the issue, and adds unnecessary complexity here.
| ) | ||
| learner_ids = [lr["id"] for lr in learners] | ||
|
|
||
| # Determine test state from UnitTestAssignment records. |
There was a problem hiding this comment.
I think there may be some existing logic in the course app that we could potentially reuse here.
| learner_ids = [lr["id"] for lr in learners] | ||
|
|
||
| # Determine test state from UnitTestAssignment records. | ||
| all_assignments = list( |
There was a problem hiding this comment.
Not sure there's any need to coerce these to a list and manipulate them in Python?
The logic below could be handled with some fairly quick exists checks, or at least a minimal fetch if you're worried about doing too many queries - but here you're loading entire models for every assignment and checking them, so that has its own performance implications.
There was a problem hiding this comment.
Fair point on removing the list coercion. I've updated get_test_status to accept the queryset directly. But my thought was that the exists() path fires up to 3 queries per test type to check for any assignments, then filters if the first returns True, whereas fetching .values("test_type", "closed"). Since there can only ever be one pre- and one post assignment per course_session, unit, loading them is essentially free. I could also go this way to avoids full model instantiation
def get_test_status(assignments_qs):
rows = list(assignments_qs.values("closed"))
if not rows:
return TEST_STATUS_NOT_ACTIVATED
if any(not r["closed"] for r in rows):
return TEST_STATUS_OPEN
return TEST_STATUS_CLOSED
In case it's relevant.
There was a problem hiding this comment.
The Django Silk page at /profile/ on your devserver could help show what is actually faster here - as we're probably not talking a lot of assignments at once ever, it may well be that doing the values query once is faster.
rtibbles
left a comment
There was a problem hiding this comment.
A couple more questions, but I think this is close to merge.
| """ | ||
| import uuid | ||
|
|
||
| _SYNTHETIC_CONTENT_ID_NAMESPACE = uuid.UUID("7c9e4b1a-3d5f-4a8e-9c2b-6d0e1f2a3b4c") |
There was a problem hiding this comment.
Let's call this something more specific, as we're using it for _PRE_POST_TEST_SYNTHETIC_CONTENT_ID_NAMESPACE?
There was a problem hiding this comment.
Good point!_SYNTHETIC_CONTENT_ID_NAMESPACEis ambiguous.
| action makes this a read-only endpoint in practice. | ||
| """ | ||
|
|
||
| permission_classes = (UnitReportPermissions,) |
There was a problem hiding this comment.
I think you should just be able to use the KolibriAuthPermissions class here - did that not work?
There was a problem hiding this comment.
CourseSession has RoleBasedPermissions(can_be_read_by=(ADMIN, COACH)) so KolibriAuthPermissions should work, has_object_permissiondelegates to request.user.can_read(obj) which will use those. So since UnitReportViewSet is a plain ViewSet, we'd need to call self.check_object_permissions(request, course_session) explicitly inretrieve()after fetching the session. But that would also change the nonexistent session response from 403 to 404? I am happy to make the change if you're okay with 404 there.
There was a problem hiding this comment.
So since UnitReportViewSet is a plain ViewSet, we'd need to call self.check_object_permissions(request, course_session) explicitly inretrieve()after fetching the session.
We would? Why? Do we do that in any other ViewSet?
But that would also change the nonexistent session response from 403 to 404?
This is fine.
There was a problem hiding this comment.
Okay, looking atClassSummaryViewSet, the existing pattern is a custom permission class that does the full check in has_permission that returns True for all GET. ama update it shortly!
rtibbles
left a comment
There was a problem hiding this comment.
Good to merge - one minor thing to consider is whether the independent tests of the compute_test_scores give any additional useful coverage over covering the same scenarios in API tests - it's only used in that context, so independently testing it may not be necessary.
Summary
This PR adds a new endpoint that returns aggregated
pre/posttest scores for every learner assigned to a unit, broken down by learning objective. Learner test attempts are located via UUID5 synthetic content IDs (derived from learner, course session, and unit IDs) that encode the A/B item-set split.The endpoint is restricted to coach/admin roles and is accompanied by a comprehensive test suite covering permissions, response shape, scoring logic, and learner-group filtering.
closes #14165
References
#14165
Reviewer guidance
GET /api/coach/coursesession/<id>/unit/<id>/report/returns 403 for an unauthenticated request and 200 for the coach."pre_test"and "post_test"keys.pre_test. status / post_test.statusfields reflect the closed state of the correspondingUnitTestAssignmentrecordsAI usage
Gemini and Cloud AI helped in refining my thinking and testing my hypothesis