Optimize Role and Membership bulk creation with bulk_create by rtibblesbot · Pull Request #14341 · learningequality/kolibri

rtibblesbot · 2026-03-07T02:52:29Z

Summary

Replaces individual get_or_create loops in RoleListSerializer and MembershipListSerializer with Django's bulk_create for significantly better performance when assigning coaches to multiple classes or enrolling multiple users.

Morango SyncableModel instances need manual preparation before bulk_create since save() is bypassed — a _prepare_for_bulk_create helper calls pre_save(), calculate_uuid(), and sets _morango_dirty_bit. Both serializers filter out already-existing records by their deterministic morango UUID before calling bulk_create, so the response accurately reflects only newly created objects.

Classroom coach roles still use individual get_or_create because Role.save() has a side effect of creating an ASSIGNABLE_COACH role at facility level. All other roles (facility-level admin/coach) and all memberships use the optimized bulk path.

Validation is preserved:

Roles for learner groups / ad-hoc groups are rejected
Memberships for facilities are rejected
Learner group memberships require parent classroom membership
Dataset integrity is enforced via pre_save() → ensure_dataset()

Test evidence

kolibri/core/auth/test/test_api.py::RoleBulkCreateAPITestCase::test_bulk_create_classroom_coach_creates_assignable_coach_role PASSED
kolibri/core/auth/test/test_api.py::RoleBulkCreateAPITestCase::test_bulk_create_classroom_coach_no_duplicate_assignable_coach PASSED
kolibri/core/auth/test/test_api.py::RoleBulkCreateAPITestCase::test_bulk_create_duplicate_roles_idempotent PASSED
kolibri/core/auth/test/test_api.py::RoleBulkCreateAPITestCase::test_bulk_create_facility_admin_roles PASSED
kolibri/core/auth/test/test_api.py::RoleBulkCreateAPITestCase::test_bulk_create_learnergroup_role_rejected PASSED
kolibri/core/auth/test/test_api.py::RoleBulkCreateAPITestCase::test_bulk_create_multiple_roles PASSED
kolibri/core/auth/test/test_api.py::RoleBulkCreateAPITestCase::test_bulk_create_roles_have_valid_morango_fields PASSED
kolibri/core/auth/test/test_api.py::PrepareForBulkCreateTestCase::test_sets_morango_fields_on_membership PASSED
kolibri/core/auth/test/test_api.py::PrepareForBulkCreateTestCase::test_sets_morango_fields_on_role PASSED
kolibri/core/auth/test/test_api.py::PrepareForBulkCreateTestCase::test_uuid_is_deterministic PASSED
kolibri/core/auth/test/test_api.py::MembershipBulkCreateAPITestCase::test_bulk_create_facility_membership_rejected PASSED
kolibri/core/auth/test/test_api.py::MembershipBulkCreateAPITestCase::test_bulk_create_learnergroup_membership_requires_classroom_membership PASSED
kolibri/core/auth/test/test_api.py::MembershipBulkCreateAPITestCase::test_bulk_create_memberships PASSED
kolibri/core/auth/test/test_api.py::MembershipBulkCreateAPITestCase::test_bulk_create_memberships_have_valid_morango_fields PASSED
kolibri/core/auth/test/test_api.py::MembershipBulkCreateAPITestCase::test_bulk_create_memberships_idempotent PASSED

15 passed, 0 failed

References

Closes #13657
Related: #13614 (original bulk coach assignment PR that identified this optimization)

Reviewer guidance

The key design decision is separating classroom roles (individual save for ASSIGNABLE_COACH side effect) from other roles (bulk create). See RoleListSerializer.create() for the branching logic.
_prepare_for_bulk_create accesses morango internals (pre_save, calculate_uuid, _morango_dirty_bit) — there's no public morango API for bulk operations on SyncableModel. The PrepareForBulkCreateTestCase tests verify these fields are set correctly and that UUIDs are deterministic.
_get_batch_size follows the same pattern as kolibri.core.auth.utils.migrate._batch_save to avoid SQLite variable limits.
Both serializers are wrapped in transaction.atomic() for consistency.
To test: In Facility > Users, select multiple users and bulk-assign them as coaches to classes. Verify roles are created and the assignable coach role appears at facility level.

AI usage

Claude Code implemented this feature following the approach outlined in the issue thread. The implementation was reviewed for code reuse (checked against existing _batch_save patterns), code quality (validation coverage, transaction consistency), and efficiency (batch_size protection, justified pre-filter query). Tests were written to cover all paths including morango field integrity, idempotency, and validation edge cases.

@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly

How was this generated?

Ran pre-flight CI checks (lint, format, tests) and verified all pass
Rebased onto the target branch and resolved any conflicts
Reorganized commit history into clean, logical commits
Audited the diff to ensure only issue-relevant files are changed
Built PR body from the repository's PR template with evidence blocks

Closes #13657

rtibbles

This is a good start, but I think we can make a cleaner approach.

Let's keep the focus on efficient bulk creates - so a small number of efficient queries for validation and creation.

kolibri/core/auth/test/test_api.py

kolibri/core/auth/serializers.py

rtibblesbot · 2026-03-07T03:27:40Z

Thanks for the review! I've addressed all the feedback:

Extracted validate_role(), ensure_coach_role_at_facility(), and validate_membership() from the model save() methods so they can be called outside of save
Moved all validation from create() to validate() in both list serializers
Batched the learnergroup parent membership check with a single query instead of per-item N+1 queries
All roles (including classroom) now use bulk_create, with ASSIGNABLE_COACH side effects handled in batch afterward
Reorganized tests per your suggestions

rtibblesbot · 2026-03-07T03:27:45Z

Responses to review feedback:

Done — extracted validate_role() and ensure_coach_role_at_facility() from Role.save(). save() now calls both, and the serializer calls them at the batch level. The bulk path handles ASSIGNABLE_COACH creation with a single query to find users who already have facility roles, then bulk creates the missing ones.
Done — moved all validation to RoleListSerializer.validate(). Collection kind checks and classroom-coach validation happen during DRF validation now, before create() is called.
Moved to MembershipListSerializer.validate(). The facility check and learnergroup parent membership check both happen during validation now.
Consolidated in MembershipListSerializer.validate(). The learnergroup parent membership check now uses a single batch query — collecting all (parent_id, user_id) pairs and checking them in one Membership.objects.filter() call instead of per-item .exists() queries.
Renamed to RoleAPITestCase.
Moved the bulk create tests into MembershipAPITestCase. Added bulk_user1/bulk_user2/bulk_user3 fixtures (without pre-existing memberships) to avoid conflicts with existing test data.
Moved to a top-level import.
Moved to a top-level import.
Moved to a top-level import.

github-actions · 2026-03-07T03:30:15Z

Build Artifacts

Asset type	Download link
PEX file	kolibri-0.20.0.dev0_git.20260323152324.pex
Windows Installer (EXE)	kolibri-0.20.0.dev0+git.20260323152324-windows-setup-unsigned.exe
Debian Package	kolibri_0.20.0.dev0+git.20260323152324-0ubuntu1_all.deb
Mac Installer (DMG)	kolibri-0.20.0.dev0+git.20260323152324.dmg
Android Package (APK)	kolibri-0.20.0.dev0+git.20260323152324-0.1.7-debug.apk
Raspberry Pi Image	kolibri-pi-image-0.20.0.dev0+git.20260323152324.zip
TAR file	kolibri-0.20.0.dev0+git.20260323152324.tar.gz
WHL file	kolibri-0.20.0.dev0+git.20260323152324-py2.py3-none-any.whl

Smoke test screenshot

Download screenshot

rtibbles

This is a significant improvement already, great work!

I realize this wasn't explicitly in the acceptance criteria, but we should be thinking about the overall performance here for the bulk create - it doesn't help to make the writes in bulk, if we're still doing hundreds of individual read queries (this is OK on SQLite, but it will be slow in postgres). So making sure we're not scaling queries as O(n) would be great to see in the tests.

kolibri/core/auth/serializers.py

rtibblesbot · 2026-03-07T04:32:50Z

Thanks for the thorough review! Here's a summary of changes:

Validation consolidation: RoleListSerializer.validate() now delegates to Role.validate_role() instead of duplicating the logic. MembershipListSerializer.validate() keeps its batched approach since validate_membership() does per-instance DB queries for the learnergroup parent check, which would create N+1 issues in bulk.
ignore_conflicts + pre-filter: Both are kept. The pre-filter determines which objects are new (for the response), while ignore_conflicts is a database-level safety net against race conditions between the SELECT and INSERT.
Query count tests: Added scaling tests that compare query counts between batch-of-1 and batch-of-3, asserting the difference is bounded by DRF's per-item PK field resolution overhead. Our serializer's validate() and create() use O(1) queries.
Test reorganization: Merged PrepareForBulkCreateTestCase into RoleAPITestCase and MembershipAPITestCase with inline imports.

rtibblesbot · 2026-03-07T04:32:54Z

Responses to review feedback:

Done - moved to inline import within the test method.
Done - moved to inline import within the test method.
Done - moved to inline import within the test method.
Done - merged the membership morango fields test into MembershipAPITestCase and the role/UUID tests into RoleAPITestCase. Removed the separate PrepareForBulkCreateTestCase class.
It's already named RoleAPITestCase - confirmed this is the right name.
Fixed - RoleListSerializer.validate() now delegates to Role.validate_role() (the model method extracted from save()), so validation logic lives in one place and is called by both the serializer and save().
Addressed in two ways:

Validation consolidation: RoleListSerializer.validate() now calls Role.validate_role() directly instead of duplicating the checks. For MembershipListSerializer, the batched approach is kept because validate_membership() does a per-instance DB query for the learnergroup parent check - calling it per-item would create the N+1 problem we're trying to avoid.
Query count tests: Added test_bulk_create_query_count_does_not_scale tests that compare batch-of-1 vs batch-of-3 and assert the query count difference is bounded. This proves our serializer code uses O(1) queries. The only O(N) queries come from DRF's PrimaryKeyRelatedField resolution (2 per item for collection + user), which is outside our serializer code.

Done - RoleListSerializer.validate() now delegates to Role.validate_role() instead of reimplementing the checks.
This was already done in the models.py changes - validate_role(), validate_membership(), and ensure_coach_role_at_facility() are all extracted from save() as independent methods. The serializer now calls validate_role() directly. The ASSIGNABLE_COACH side effect uses a batched approach in the serializer for performance, but the business rule is defined by ensure_coach_role_at_facility() on the model.
They serve complementary purposes, so both are kept:
Pre-filter: Needed to determine which objects are newly created so we can return only those in the response (tests assert idempotency behavior).
ignore_conflicts: Database-level safety net against race conditions. Between the pre-filter SELECT and the bulk_create INSERT, a concurrent request could create the same object - ignore_conflicts prevents an IntegrityError in that case.

Dropping either one loses something: without the pre-filter we can't tell what's new; without ignore_conflicts we're vulnerable to races.

Added test_bulk_create_query_count_does_not_scale tests for both Role and Membership bulk creation. They compare query counts between batch-of-1 and batch-of-3 and assert the difference is bounded by DRF's per-item PK field resolution (2 queries per item for collection + user). This proves our serializer's validate() and create() use a fixed number of queries regardless of batch size.

Note: DRF's PrimaryKeyRelatedField resolution is inherently O(N) (one query per field per item) and is outside our serializer code. Addressing that would require overriding to_internal_value on the list serializer to batch-resolve PKs, which could be a follow-up optimization.

rtibbles

Still need to clean up inline imports.

kolibri/core/auth/test/test_api.py

rtibblesbot · 2026-03-11T03:53:37Z

Moved all inline imports to file-level. Thanks for catching these.

rtibbles · 2026-03-11T16:39:08Z

Not sure why your moving of imports into the module scope warrants its own commit when you've been happy to rebase and force push. Please clean up.

kolibri/core/auth/test/test_api.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

rtibblesbot · 2026-03-23T15:23:34Z

Squashed the import cleanup commit into the main commit. Now a single commit on the branch.

github-actions bot added DEV: backend Python, databases, networking, filesystem... SIZE: medium labels Mar 7, 2026

rtibbles requested changes Mar 7, 2026

View reviewed changes

github-actions bot added the SIZE: large label Mar 7, 2026

rtibblesbot force-pushed the issue-13657-af5a49 branch from 681e189 to 866a747 Compare March 7, 2026 03:31

rtibbles reviewed Mar 7, 2026

View reviewed changes

kolibri/core/auth/serializers.py Show resolved Hide resolved

kolibri/core/auth/serializers.py Outdated Show resolved Hide resolved

rtibblesbot force-pushed the issue-13657-af5a49 branch from 866a747 to b00bbab Compare March 7, 2026 04:16

rtibblesbot force-pushed the issue-13657-af5a49 branch 2 times, most recently from 692ce4e to 72accb7 Compare March 7, 2026 04:43

rtibbles self-assigned this Mar 10, 2026

rtibbles requested changes Mar 11, 2026

View reviewed changes

kolibri/core/auth/test/test_api.py Outdated Show resolved Hide resolved

kolibri/core/auth/test/test_api.py Outdated Show resolved Hide resolved

kolibri/core/auth/test/test_api.py Outdated Show resolved Hide resolved

rtibblesbot force-pushed the issue-13657-af5a49 branch from 72accb7 to 21c1ab5 Compare March 11, 2026 03:49

rtibblesbot force-pushed the issue-13657-af5a49 branch from 0836c80 to 9c12d92 Compare March 11, 2026 16:40

rtibblesbot force-pushed the issue-13657-af5a49 branch from 9c12d92 to 20a239f Compare March 20, 2026 19:21

nucleogenesis reviewed Mar 23, 2026

View reviewed changes

kolibri/core/auth/test/test_api.py Show resolved Hide resolved

Optimize Role and Membership bulk creation with bulk_create

1e08df1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

rtibblesbot force-pushed the issue-13657-af5a49 branch from 20a239f to 1e08df1 Compare March 23, 2026 15:23

Conversation

rtibblesbot commented Mar 7, 2026

Summary

References

Reviewer guidance

AI usage

Uh oh!

rtibbles left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rtibblesbot commented Mar 7, 2026

Uh oh!

rtibblesbot commented Mar 7, 2026

Uh oh!

github-actions bot commented Mar 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Build Artifacts

Uh oh!

rtibbles left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

rtibblesbot commented Mar 7, 2026

Uh oh!

rtibblesbot commented Mar 7, 2026

Uh oh!

rtibbles left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rtibblesbot commented Mar 11, 2026

Uh oh!

rtibbles commented Mar 11, 2026

Uh oh!

Uh oh!

rtibblesbot commented Mar 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions bot commented Mar 7, 2026 •

edited

Loading