Skip to content

Optimize Role and Membership bulk creation with bulk_create#14341

Open
rtibblesbot wants to merge 1 commit intolearningequality:developfrom
rtibblesbot:issue-13657-af5a49
Open

Optimize Role and Membership bulk creation with bulk_create#14341
rtibblesbot wants to merge 1 commit intolearningequality:developfrom
rtibblesbot:issue-13657-af5a49

Conversation

@rtibblesbot
Copy link
Contributor

Summary

Replaces individual get_or_create loops in RoleListSerializer and MembershipListSerializer with Django's bulk_create for significantly better performance when assigning coaches to multiple classes or enrolling multiple users.

Morango SyncableModel instances need manual preparation before bulk_create since save() is bypassed — a _prepare_for_bulk_create helper calls pre_save(), calculate_uuid(), and sets _morango_dirty_bit. Both serializers filter out already-existing records by their deterministic morango UUID before calling bulk_create, so the response accurately reflects only newly created objects.

Classroom coach roles still use individual get_or_create because Role.save() has a side effect of creating an ASSIGNABLE_COACH role at facility level. All other roles (facility-level admin/coach) and all memberships use the optimized bulk path.

Validation is preserved:

  • Roles for learner groups / ad-hoc groups are rejected
  • Memberships for facilities are rejected
  • Learner group memberships require parent classroom membership
  • Dataset integrity is enforced via pre_save()ensure_dataset()
Test evidence
kolibri/core/auth/test/test_api.py::RoleBulkCreateAPITestCase::test_bulk_create_classroom_coach_creates_assignable_coach_role PASSED
kolibri/core/auth/test/test_api.py::RoleBulkCreateAPITestCase::test_bulk_create_classroom_coach_no_duplicate_assignable_coach PASSED
kolibri/core/auth/test/test_api.py::RoleBulkCreateAPITestCase::test_bulk_create_duplicate_roles_idempotent PASSED
kolibri/core/auth/test/test_api.py::RoleBulkCreateAPITestCase::test_bulk_create_facility_admin_roles PASSED
kolibri/core/auth/test/test_api.py::RoleBulkCreateAPITestCase::test_bulk_create_learnergroup_role_rejected PASSED
kolibri/core/auth/test/test_api.py::RoleBulkCreateAPITestCase::test_bulk_create_multiple_roles PASSED
kolibri/core/auth/test/test_api.py::RoleBulkCreateAPITestCase::test_bulk_create_roles_have_valid_morango_fields PASSED
kolibri/core/auth/test/test_api.py::PrepareForBulkCreateTestCase::test_sets_morango_fields_on_membership PASSED
kolibri/core/auth/test/test_api.py::PrepareForBulkCreateTestCase::test_sets_morango_fields_on_role PASSED
kolibri/core/auth/test/test_api.py::PrepareForBulkCreateTestCase::test_uuid_is_deterministic PASSED
kolibri/core/auth/test/test_api.py::MembershipBulkCreateAPITestCase::test_bulk_create_facility_membership_rejected PASSED
kolibri/core/auth/test/test_api.py::MembershipBulkCreateAPITestCase::test_bulk_create_learnergroup_membership_requires_classroom_membership PASSED
kolibri/core/auth/test/test_api.py::MembershipBulkCreateAPITestCase::test_bulk_create_memberships PASSED
kolibri/core/auth/test/test_api.py::MembershipBulkCreateAPITestCase::test_bulk_create_memberships_have_valid_morango_fields PASSED
kolibri/core/auth/test/test_api.py::MembershipBulkCreateAPITestCase::test_bulk_create_memberships_idempotent PASSED

15 passed, 0 failed

References

Closes #13657
Related: #13614 (original bulk coach assignment PR that identified this optimization)

Reviewer guidance

  • The key design decision is separating classroom roles (individual save for ASSIGNABLE_COACH side effect) from other roles (bulk create). See RoleListSerializer.create() for the branching logic.
  • _prepare_for_bulk_create accesses morango internals (pre_save, calculate_uuid, _morango_dirty_bit) — there's no public morango API for bulk operations on SyncableModel. The PrepareForBulkCreateTestCase tests verify these fields are set correctly and that UUIDs are deterministic.
  • _get_batch_size follows the same pattern as kolibri.core.auth.utils.migrate._batch_save to avoid SQLite variable limits.
  • Both serializers are wrapped in transaction.atomic() for consistency.
  • To test: In Facility > Users, select multiple users and bulk-assign them as coaches to classes. Verify roles are created and the assignable coach role appears at facility level.

AI usage

Claude Code implemented this feature following the approach outlined in the issue thread. The implementation was reviewed for code reuse (checked against existing _batch_save patterns), code quality (validation coverage, transaction consistency), and efficiency (batch_size protection, justified pre-filter query). Tests were written to cover all paths including morango field integrity, idempotency, and validation edge cases.


@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly

How was this generated?
  • Ran pre-flight CI checks (lint, format, tests) and verified all pass
  • Rebased onto the target branch and resolved any conflicts
  • Reorganized commit history into clean, logical commits
  • Audited the diff to ensure only issue-relevant files are changed
  • Built PR body from the repository's PR template with evidence blocks

Closes #13657

@github-actions github-actions bot added DEV: backend Python, databases, networking, filesystem... SIZE: medium labels Mar 7, 2026
Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good start, but I think we can make a cleaner approach.

Let's keep the focus on efficient bulk creates - so a small number of efficient queries for validation and creation.

@rtibblesbot
Copy link
Contributor Author

Thanks for the review! I've addressed all the feedback:

  • Extracted validate_role(), ensure_coach_role_at_facility(), and validate_membership() from the model save() methods so they can be called outside of save
  • Moved all validation from create() to validate() in both list serializers
  • Batched the learnergroup parent membership check with a single query instead of per-item N+1 queries
  • All roles (including classroom) now use bulk_create, with ASSIGNABLE_COACH side effects handled in batch afterward
  • Reorganized tests per your suggestions

@rtibblesbot
Copy link
Contributor Author

Responses to review feedback:

  • Done — extracted validate_role() and ensure_coach_role_at_facility() from Role.save(). save() now calls both, and the serializer calls them at the batch level. The bulk path handles ASSIGNABLE_COACH creation with a single query to find users who already have facility roles, then bulk creates the missing ones.

  • Done — moved all validation to RoleListSerializer.validate(). Collection kind checks and classroom-coach validation happen during DRF validation now, before create() is called.

  • Moved to MembershipListSerializer.validate(). The facility check and learnergroup parent membership check both happen during validation now.

  • Consolidated in MembershipListSerializer.validate(). The learnergroup parent membership check now uses a single batch query — collecting all (parent_id, user_id) pairs and checking them in one Membership.objects.filter() call instead of per-item .exists() queries.

  • Renamed to RoleAPITestCase.

  • Moved the bulk create tests into MembershipAPITestCase. Added bulk_user1/bulk_user2/bulk_user3 fixtures (without pre-existing memberships) to avoid conflicts with existing test data.

  • Moved to a top-level import.

  • Moved to a top-level import.

  • Moved to a top-level import.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 7, 2026

Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a significant improvement already, great work!

I realize this wasn't explicitly in the acceptance criteria, but we should be thinking about the overall performance here for the bulk create - it doesn't help to make the writes in bulk, if we're still doing hundreds of individual read queries (this is OK on SQLite, but it will be slow in postgres). So making sure we're not scaling queries as O(n) would be great to see in the tests.

@rtibblesbot
Copy link
Contributor Author

Thanks for the thorough review! Here's a summary of changes:

  • Validation consolidation: RoleListSerializer.validate() now delegates to Role.validate_role() instead of duplicating the logic. MembershipListSerializer.validate() keeps its batched approach since validate_membership() does per-instance DB queries for the learnergroup parent check, which would create N+1 issues in bulk.
  • ignore_conflicts + pre-filter: Both are kept. The pre-filter determines which objects are new (for the response), while ignore_conflicts is a database-level safety net against race conditions between the SELECT and INSERT.
  • Query count tests: Added scaling tests that compare query counts between batch-of-1 and batch-of-3, asserting the difference is bounded by DRF's per-item PK field resolution overhead. Our serializer's validate() and create() use O(1) queries.
  • Test reorganization: Merged PrepareForBulkCreateTestCase into RoleAPITestCase and MembershipAPITestCase with inline imports.

@rtibblesbot
Copy link
Contributor Author

Responses to review feedback:

  • Done - moved to inline import within the test method.

  • Done - moved to inline import within the test method.

  • Done - moved to inline import within the test method.

  • Done - merged the membership morango fields test into MembershipAPITestCase and the role/UUID tests into RoleAPITestCase. Removed the separate PrepareForBulkCreateTestCase class.

  • It's already named RoleAPITestCase - confirmed this is the right name.

  • Fixed - RoleListSerializer.validate() now delegates to Role.validate_role() (the model method extracted from save()), so validation logic lives in one place and is called by both the serializer and save().

  • Addressed in two ways:

  1. Validation consolidation: RoleListSerializer.validate() now calls Role.validate_role() directly instead of duplicating the checks. For MembershipListSerializer, the batched approach is kept because validate_membership() does a per-instance DB query for the learnergroup parent check - calling it per-item would create the N+1 problem we're trying to avoid.

  2. Query count tests: Added test_bulk_create_query_count_does_not_scale tests that compare batch-of-1 vs batch-of-3 and assert the query count difference is bounded. This proves our serializer code uses O(1) queries. The only O(N) queries come from DRF's PrimaryKeyRelatedField resolution (2 per item for collection + user), which is outside our serializer code.

  • Done - RoleListSerializer.validate() now delegates to Role.validate_role() instead of reimplementing the checks.

  • This was already done in the models.py changes - validate_role(), validate_membership(), and ensure_coach_role_at_facility() are all extracted from save() as independent methods. The serializer now calls validate_role() directly. The ASSIGNABLE_COACH side effect uses a batched approach in the serializer for performance, but the business rule is defined by ensure_coach_role_at_facility() on the model.

  • They serve complementary purposes, so both are kept:

  • Pre-filter: Needed to determine which objects are newly created so we can return only those in the response (tests assert idempotency behavior).

  • ignore_conflicts: Database-level safety net against race conditions. Between the pre-filter SELECT and the bulk_create INSERT, a concurrent request could create the same object - ignore_conflicts prevents an IntegrityError in that case.

Dropping either one loses something: without the pre-filter we can't tell what's new; without ignore_conflicts we're vulnerable to races.

  • Added test_bulk_create_query_count_does_not_scale tests for both Role and Membership bulk creation. They compare query counts between batch-of-1 and batch-of-3 and assert the difference is bounded by DRF's per-item PK field resolution (2 queries per item for collection + user). This proves our serializer's validate() and create() use a fixed number of queries regardless of batch size.

Note: DRF's PrimaryKeyRelatedField resolution is inherently O(N) (one query per field per item) and is outside our serializer code. Addressing that would require overriding to_internal_value on the list serializer to batch-resolve PKs, which could be a follow-up optimization.

@rtibblesbot rtibblesbot force-pushed the issue-13657-af5a49 branch 2 times, most recently from 692ce4e to 72accb7 Compare March 7, 2026 04:43
@rtibbles rtibbles self-assigned this Mar 10, 2026
Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to clean up inline imports.

@rtibblesbot
Copy link
Contributor Author

Moved all inline imports to file-level. Thanks for catching these.

@rtibbles
Copy link
Member

Not sure why your moving of imports into the module scope warrants its own commit when you've been happy to rebase and force push. Please clean up.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@rtibblesbot
Copy link
Contributor Author

Squashed the import cleanup commit into the main commit. Now a single commit on the branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

DEV: backend Python, databases, networking, filesystem... SIZE: large SIZE: medium

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bulk Coach Assignment: Optimize creation w/ bulk_create if possible

3 participants