Skip to content

Add unit learning objectives report API endpoint#14422

Merged
rtibbles merged 25 commits intolearningequality:release-v0.19.xfrom
AllanOXDi:learning_objectives
Mar 23, 2026
Merged

Add unit learning objectives report API endpoint#14422
rtibbles merged 25 commits intolearningequality:release-v0.19.xfrom
AllanOXDi:learning_objectives

Conversation

@AllanOXDi
Copy link
Member

@AllanOXDi AllanOXDi commented Mar 19, 2026

Summary

This PR adds a new endpoint that returns aggregatedpre/posttest scores for every learner assigned to a unit, broken down by learning objective. Learner test attempts are located via UUID5 synthetic content IDs (derived from learner, course session, and unit IDs) that encode the A/B item-set split.

The endpoint is restricted to coach/admin roles and is accompanied by a comprehensive test suite covering permissions, response shape, scoring logic, and learner-group filtering.
closes #14165

References

#14165

Reviewer guidance

  1. Confirm that GET /api/coach/coursesession/<id>/unit/<id>/report/ returns 403 for an unauthenticated request and 200 for the coach.
  2. Verify the response contains per-learner, per-LO correct-answer counts under both "pre_test" and "post_test" keys.
  3. Confirm a learner who has not attempted either test is present in learners but absent from the score maps (not present as a null entry).
  4. Confirm the pre_test. status / post_test.status fields reflect the closed state of the corresponding UnitTestAssignment records

AI usage

Gemini and Cloud AI helped in refining my thinking and testing my hypothesis

@github-actions github-actions bot added DEV: backend Python, databases, networking, filesystem... APP: Coach Re: Coach App (lessons, quizzes, groups, reports, etc.) SIZE: very large labels Mar 19, 2026
@AllanOXDi AllanOXDi force-pushed the learning_objectives branch 3 times, most recently from 879c740 to 96ca797 Compare March 19, 2026 16:27
@AllanOXDi AllanOXDi force-pushed the learning_objectives branch from 96ca797 to 1b17485 Compare March 19, 2026 16:30
@github-actions
Copy link
Contributor

github-actions bot commented Mar 19, 2026

@AllanOXDi AllanOXDi requested a review from rtibblesbot March 19, 2026 23:20
Copy link
Contributor

@rtibblesbot rtibblesbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New read-only API endpoint for unit pre/post test report with per-learner, per-LO scoring.

CI passing (only Handle pull request events check present). No UI files — visual review skipped.

  • blocking: PR description is empty — summary, references, reviewer guidance, and AI usage sections are all unfilled placeholders. No linked issues either, so there are no acceptance criteria to review against. Please fill in the description so reviewers understand the intent and can verify completeness.
  • blocking: PR targets release-v0.19.x but the repo default branch is develop. If this feature is meant for the release branch, please confirm. Otherwise, retarget to develop.
  • suggestion: unit_contentnode_id is not validated as an actual unit — any ContentNode pk would be accepted, returning an empty-ish response with a misleading unit_title. See inline comment.
  • praise: Thorough test suite — permissions, response shape, scoring logic, edge cases (null options, partial credit, retakes, duplicate attempts, learner groups, dual-role exclusion). See inline comment.
  • praise: SQLite chunking with _IN_CHUNK_SIZE and clear rationale. See inline comment.

@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly

How was this generated?

Reviewed the pull request diff checking for:

  • Correctness: bugs, edge cases, undocumented behavior, resource leaks, hardcoded values
  • Design: unnecessary complexity, naming, readability, comment accuracy, redundant state
  • Architecture: duplicated concerns, minimal interfaces, composition over inheritance
  • Testing: behavior-based assertions, mocks only at hard boundaries, accurate coverage
  • Completeness: missing dependencies, unupdated usages, i18n, accessibility, security
  • Principles: DRY (same reason to change), SRP, Rule of Three (no premature abstraction)
  • Checked CI status and linked issue acceptance criteria
  • For UI changes: inspected screenshots for layout, visual completeness, and consistency

course_session = getattr(self, "_course_session", None) or get_object_or_404(
CourseSession, pk=course_session_id
)
unit = get_object_or_404(ContentNode, pk=unit_contentnode_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: No validation that unit_contentnode_id refers to a content node with modality == UNIT. Passing any arbitrary ContentNode pk would return a 200 with empty learning objectives and a potentially confusing unit_title. Consider adding a filter like get_object_or_404(ContentNode, pk=unit_contentnode_id, modality=modalities.UNIT) to return 404 for non-unit nodes — or accept this as intentional if the frontend is the only caller and always passes valid unit IDs.

# Conservative upper bound for SQL IN-clause list lengths. SQLite's default
# SQLITE_MAX_VARIABLE_NUMBER is 999; staying under 900 leaves headroom for
# other bind parameters in the same query.
_IN_CHUNK_SIZE = 900
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Good defensive coding for SQLite's variable limit. The constant, chunk size rationale, and the generic _chunked helper are all clear and well-documented.

self.assertIn(lid, result)
self.assertEqual(result[lid], {})


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Excellent test coverage. The test_only_most_recent_complete_mastery_log_used test carefully constructs two mastery logs sharing one ContentSummaryLog (respecting the morango UNIQUE constraint) with opposite correctness outcomes — this is a high-value edge case that's easy to get wrong. The overall suite covers pure functions, API integration, permissions, response shape, scoring, learner groups, and dual-role exclusion.

course_session = CourseSession.objects.get(pk=course_session_id)
if request.user.has_role_for(allowed_roles, course_session.collection):
# Cache so the view can reuse it without a second DB query.
view._course_session = course_session
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: The _course_session side-effect cache on the view instance is pragmatic and documented, but couples the permission class to the view's internal expectations. A small comment on the view side (near line 289) noting this dependency would help — the getattr fallback is good, but a future refactor that removes UnitReportPermissions would silently add an extra query without any visible indication that the optimization was lost.

@AllanOXDi AllanOXDi requested review from akolson and rtibbles March 19, 2026 23:41
@AllanOXDi AllanOXDi marked this pull request as ready for review March 19, 2026 23:41
course_session = CourseSession.objects.get(pk=course_session_id)
if request.user.has_role_for(allowed_roles, course_session.collection):
# Cache so the view can reuse it without a second DB query.
view._course_session = course_session
Copy link
Member Author

@AllanOXDi AllanOXDi Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rtibbles we are caching _course_session on the view to avoid a second DB query in retrieve. This creates coupling between the permission class and the viewset. would this be OK! or should we either (a) accept the second query and remove the side effect, or (b) introduce a shared helper (e.g. get_course_session_403) that both call?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think let's just skip this for now - it's an optimization. We can focus on getting the core logic correct and then optimize.

Copy link
Contributor

@rtibblesbot rtibblesbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delta re-review of the unit learning objectives report API.

Prior findings

Resolved:

  • PR description empty (blocking) — now filled in with summary, references, reviewer guidance, AI usage, and closes #14165
  • PR targets release-v0.19.x (blocking) — confirmed intentional; multiple merged PRs target this branch as part of the v0.19.x release workflow
  • unit_contentnode_id not validated as a unit (suggestion) — modality=modalities.UNIT added to get_object_or_404
  • _course_session cache comment (suggestion) — expanded comment at lines 288-290 documents the dependency and fallback rationale

4/4 prior findings resolved.

CI passing (only Handle pull request events check). No UI files — visual review skipped. All acceptance criteria from #14165 verified.


@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly

How was this generated?

Compared the current PR state against findings from a prior review:

  • Retrieved prior bot reviews deterministically via the GitHub API
  • Classified each prior finding as RESOLVED, UNADDRESSED, ACKNOWLEDGED, or CONTESTED
  • Only raised NEW findings for newly introduced code
  • Reviewed the pull request diff checking for correctness, design, architecture, testing, completeness, and adherence to DRY/SRP principles
  • Checked CI status and linked issue acceptance criteria

)
unit = get_object_or_404(
ContentNode, pk=unit_contentnode_id, modality=modalities.UNIT
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Clean fix for the prior suggestion — modality=modalities.UNIT on the get_object_or_404 call returns a proper 404 for non-unit content nodes, and the expanded comment above (lines 288-290) documents the _course_session cache dependency for future maintainers.

@AllanOXDi AllanOXDi changed the title Learning objectives Add unit learning objectives report API endpoint Mar 20, 2026
Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an error in the spec that we need to rectify, but even if it wasn't there, that wouldn't have helped because of the parallel (and divergent) implementation of content_id generation for the tests.

I would have liked to see either more reuse of existing code, and/or more copying of existing patterns for the queries.

):
"""
Generate a deterministic UUID5 content_id for MasteryLog/AttemptLog tracking.
Must match the generation logic used on the learner side (issue #14133).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says it must match the generation logic used on the learner side - it absolutely does not.

https://github.com/learningequality/kolibri/blob/release-v0.19.x/kolibri/core/logger/api.py#L393

We should be using exactly the same id here.

However, I suspect we have made a mistake there - there we are including the user_id in the synthetic content_id, but that makes the querying much more complicated, what would be more helpful is to user the same synthetic content_id per user.

We should update the logger api.py synthetic content_id generation logic to not include the user_id (so separating it slightly from the deterministic hash used for A and B types), and make it reusable here, rather than implementing a completely parallel id generation strategy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will just extract in to pre_post_test.py without user_id in the key. so that both logger and unit report api can now import from there

TEST_STATUS_CLOSED = "closed"


def get_test_version(learner_id, course_session_id, unit_contentnode_id):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be reimplementing this - we should move the api.py logic in the logger app to a place where it is reusable here.

Returns (mastery_log_ids, mastery_log_to_meta) where mastery_log_to_meta
maps mastery log id → {learner_id, test_type}.
"""
# Correlated subquery: picks the latest completed MasteryLog per
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not entirely sure what, but this feels more complicated than it needs to be. We have a lot of the logic for querying these kinds of values in bulk in the coach_summary API endpoint, so I think we could look there for examples of how to simplify somewhat.

course_session = CourseSession.objects.get(pk=course_session_id)
if request.user.has_role_for(allowed_roles, course_session.collection):
# Cache so the view can reuse it without a second DB query.
view._course_session = course_session
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think let's just skip this for now - it's an optimization. We can focus on getting the core logic correct and then optimize.

course_session=course_session
).values_list("collection_id", flat=True)
)
# Exclude users who hold a coach or admin role in the assigned
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was never asked for in the issue, and adds unnecessary complexity here.

)
learner_ids = [lr["id"] for lr in learners]

# Determine test state from UnitTestAssignment records.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there may be some existing logic in the course app that we could potentially reuse here.

learner_ids = [lr["id"] for lr in learners]

# Determine test state from UnitTestAssignment records.
all_assignments = list(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure there's any need to coerce these to a list and manipulate them in Python?

The logic below could be handled with some fairly quick exists checks, or at least a minimal fetch if you're worried about doing too many queries - but here you're loading entire models for every assignment and checking them, so that has its own performance implications.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point on removing the list coercion. I've updated get_test_status to accept the queryset directly. But my thought was that the exists() path fires up to 3 queries per test type to check for any assignments, then filters if the first returns True, whereas fetching .values("test_type", "closed"). Since there can only ever be one pre- and one post assignment per course_session, unit, loading them is essentially free. I could also go this way to avoids full model instantiation

def get_test_status(assignments_qs):
    rows = list(assignments_qs.values("closed"))
    if not rows:
        return TEST_STATUS_NOT_ACTIVATED
    if any(not r["closed"] for r in rows):
        return TEST_STATUS_OPEN
    return TEST_STATUS_CLOSED

In case it's relevant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Django Silk page at /profile/ on your devserver could help show what is actually faster here - as we're probably not talking a lot of assignments at once ever, it may well be that doing the values query once is faster.

Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more questions, but I think this is close to merge.

"""
import uuid

_SYNTHETIC_CONTENT_ID_NAMESPACE = uuid.UUID("7c9e4b1a-3d5f-4a8e-9c2b-6d0e1f2a3b4c")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call this something more specific, as we're using it for _PRE_POST_TEST_SYNTHETIC_CONTENT_ID_NAMESPACE?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!_SYNTHETIC_CONTENT_ID_NAMESPACEis ambiguous.

action makes this a read-only endpoint in practice.
"""

permission_classes = (UnitReportPermissions,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should just be able to use the KolibriAuthPermissions class here - did that not work?

Copy link
Member Author

@AllanOXDi AllanOXDi Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CourseSession has RoleBasedPermissions(can_be_read_by=(ADMIN, COACH)) so KolibriAuthPermissions should work, has_object_permissiondelegates to request.user.can_read(obj) which will use those. So since UnitReportViewSet is a plain ViewSet, we'd need to call self.check_object_permissions(request, course_session) explicitly inretrieve()after fetching the session. But that would also change the nonexistent session response from 403 to 404? I am happy to make the change if you're okay with 404 there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So since UnitReportViewSet is a plain ViewSet, we'd need to call self.check_object_permissions(request, course_session) explicitly inretrieve()after fetching the session.

We would? Why? Do we do that in any other ViewSet?

But that would also change the nonexistent session response from 403 to 404?

This is fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, looking atClassSummaryViewSet, the existing pattern is a custom permission class that does the full check in has_permission that returns True for all GET. ama update it shortly!

Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to merge - one minor thing to consider is whether the independent tests of the compute_test_scores give any additional useful coverage over covering the same scenarios in API tests - it's only used in that context, so independently testing it may not be necessary.

@rtibbles rtibbles merged commit 30bc47b into learningequality:release-v0.19.x Mar 23, 2026
55 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

APP: Coach Re: Coach App (lessons, quizzes, groups, reports, etc.) DEV: backend Python, databases, networking, filesystem... SIZE: very large

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants