GCU sort: batch leaf-list changes into single REPLACE move#4478

Open
rookie-who wants to merge 1 commit into sonic-net:master from rookie-who:fix/gcu-batch-leaf-list-moves

Conversation

@rookie-who

Description

Adds BulkLeafListMoveGenerator — a non-extendable move generator that produces a single REPLACE move for leaf-list fields whose items differ between current and target configs.

Problem

When removing N ports from an ACL table, GCU decomposes the change into N individual REMOVE moves. Each move is validated independently, triggering 2 loadData() calls per move. On a 512-port system:

  • 256 REMOVE moves × 2 loadData() × ~1.4s/load = ~717 seconds

Solution

Instead of N individual moves, generate one REPLACE of the entire leaf-list. DFS tries non-extendable generators first, so if the bulk REPLACE validates (one loadData()), we skip N-1 moves entirely:

  • 1 REPLACE move × 2 loadData() × ~1.4s/load = ~2.8 seconds

~250x improvement for the ACL port removal scenario.
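Concretely, the two shapes of the change look like this (a standalone sketch with illustrative paths and values, not taken from the actual PR diff):

```python
# Illustrative JSON-patch shapes only; paths and values are examples.
doc = {"ACL_TABLE": {"EVERFLOW": {"ports": ["Ethernet0", "Ethernet4", "Ethernet8"]}}}

# Per-item decomposition: one REMOVE per list index. Indices shift after
# each removal, so both ops target index 1 here.
per_item = [
    {"op": "remove", "path": "/ACL_TABLE/EVERFLOW/ports/1"},
    {"op": "remove", "path": "/ACL_TABLE/EVERFLOW/ports/1"},
]

# Bulk form: a single REPLACE of the whole leaf-list.
bulk = [{"op": "replace", "path": "/ACL_TABLE/EVERFLOW/ports",
         "value": ["Ethernet0"]}]

# Applying the per-item ops by hand reaches the same final state,
# but each op would be validated (and loadData'd) separately.
ports = list(doc["ACL_TABLE"]["EVERFLOW"]["ports"])
for op in per_item:
    ports.pop(int(op["path"].rsplit("/", 1)[1]))
assert ports == bulk[0]["value"]
```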

Conservative scope

  • Only handles leaf-lists (lists of scalars: strings, ints, bools)
  • Only replaces lists that exist in both current and target configs
  • Falls through to individual moves if validation rejects the bulk REPLACE
  • Lists of dicts are skipped (not leaf-lists)
  • No changes to validators, extenders, or the DFS algorithm itself

How it works

  1. BulkLeafListMoveGenerator traverses the config tree comparing current vs target
  2. When it finds a leaf-list that differs, it emits a JsonMoveGroup with a single REPLACE move for the whole list
  3. The generator is registered before other bulk generators in SortAlgorithmFactory.create() as a non-extendable generator
  4. DFS tries it first. If validators accept the move, we skip the per-item decomposition entirely
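Steps 1-2 can be sketched as follows (hypothetical standalone code; the function name and move shape are illustrative, not the actual GCU API):

```python
def generate_bulk_leaf_list_moves(current, target, path=()):
    """Sketch: walk current and target in parallel and yield one JSON-patch
    REPLACE op per differing leaf-list present in both configs. This
    illustrates the idea, not the real BulkLeafListMoveGenerator."""
    if isinstance(current, dict) and isinstance(target, dict):
        # Only keys present in both configs; fresh or deleted subtrees
        # are left to the other generators.
        for key in current.keys() & target.keys():
            yield from generate_bulk_leaf_list_moves(
                current[key], target[key], path + (key,))
    elif isinstance(current, list) and isinstance(target, list):
        # A leaf-list is a list of scalars; lists of dicts are skipped.
        def is_leaf_list(lst):
            return all(isinstance(x, (str, int, float, bool)) for x in lst)
        if current != target and is_leaf_list(current) and is_leaf_list(target):
            yield {"op": "replace",
                   "path": "/" + "/".join(path),
                   "value": target}

current = {"ACL_TABLE": {"EVERFLOW": {"ports": ["Ethernet0", "Ethernet4"]}}}
target = {"ACL_TABLE": {"EVERFLOW": {"ports": ["Ethernet0"]}}}
moves = list(generate_bulk_leaf_list_moves(current, target))
# One REPLACE for the whole list instead of one REMOVE per item
```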

Related PRs

Why this is complementary

PRs #4466 and #4476 optimize per-move cost. This PR reduces move count from N to 1. The approaches compose: even if a future PR cuts loadData cost by 50%, going from 512 moves to 1 move is still the dominant optimization.

Reproducer: https://github.com/rookie-who/sonic-gcu-issues

@mssonicbld
Collaborator

/azp run

@linux-foundation-easycla

linux-foundation-easycla Bot commented Apr 21, 2026

CLA Not Signed

@rookie-who
Author

@rimunagala This is complementary to your #4476. Your PR optimizes per-move cost (~2x for REMOVE). This one reduces move count from N to 1 for leaf-list changes. Combined, ACL port removal on 512-port systems goes from ~717s to ~1.4s.

Would appreciate your review — this also intersects with the performance test in sonic-net/sonic-mgmt#24092.

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@rookie-who
Author

@rimunagala — adding some context on why batching leaf-list changes into a single REPLACE is safe and doesn't compromise the correctness guarantees that per-move validation provides.

Why individual moves exist

GCU's DFS + per-move validation exists to discover safe ordering across cross-table dependencies. For example:

1. Remove ACL_TABLE binding from PORT  ← must happen first
2. Remove ACL_TABLE entry              ← depends on step 1

Swapping the order fails validation (dangling reference). DFS tries different orderings until it finds one that validates at every step.
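A minimal standalone illustration of why that ordering matters (plain Python with a hypothetical config shape; this is not GCU's actual validator):

```python
import copy

config = {
    "PORT": {"Ethernet0": {}, "Ethernet4": {}},
    "ACL_TABLE": {"EVERFLOW": {"ports": ["Ethernet0", "Ethernet4"]}},
}

def dangling_refs(cfg):
    # Ports referenced by the ACL table but missing from PORT
    return [p for p in cfg["ACL_TABLE"]["EVERFLOW"]["ports"]
            if p not in cfg["PORT"]]

# Wrong order: delete PORT/Ethernet4 while the ACL still references it
bad = copy.deepcopy(config)
del bad["PORT"]["Ethernet4"]
assert dangling_refs(bad) == ["Ethernet4"]   # intermediate state invalid

# Right order: clear the binding first, then delete the PORT entry
good = copy.deepcopy(config)
good["ACL_TABLE"]["EVERFLOW"]["ports"] = ["Ethernet0"]
del good["PORT"]["Ethernet4"]
assert dangling_refs(good) == []             # every step validates
```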

Why leaf-lists don't need this

A YANG leaf-list is a single field containing an array of scalars:

leaf-list ports {
    type string;
}

Config DB: "ports": ["Ethernet0", "Ethernet4", "Ethernet8", ...]

Key properties:

  • One field, one table entry — no cross-table dependency involved
  • No ordering constraint — removing port #7 vs port #8 first doesn't matter; both intermediate states are equally valid
  • Atomic semantics — replacing the entire array is equivalent to applying N individual removes; the final state is identical
  • No YANG constraints that distinguish intermediates — typically no min-elements, no leafref from other tables pointing into this list

Since every intermediate state (any subset of the target values) is valid, the N individual validations are redundant — they all pass. One REPLACE + one validation produces the same result.
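The equivalence claim can be checked directly (plain Python sketch with illustrative data):

```python
import itertools

current = ["Ethernet0", "Ethernet4", "Ethernet8", "Ethernet12"]
target = ["Ethernet0", "Ethernet12"]
to_remove = ["Ethernet4", "Ethernet8"]

results = set()
for order in itertools.permutations(to_remove):
    state = list(current)
    for port in order:
        state.remove(port)  # every intermediate state is a subset of current
    results.add(tuple(state))

# All removal orders converge on the same final state, which is exactly
# what the single bulk REPLACE produces in one step.
assert results == {tuple(target)}
```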

Safety net

The generator is conservative: if the single REPLACE fails validation (e.g., some YANG model does have a constraint we didn't anticipate), it falls through to the existing per-item move generators. The batch is an optimization, not a bypass.

Impact

For a 512-port ACL port removal: 256 moves × 2 loadData each = 512 loadData calls → 1 move × 2 loadData = 2 calls. At ~0.5s per loadData at 512 ports, that's ~250s → ~1s.

@rimunagala

@rookie-who Thanks for the detailed writeup — the core insight is solid and the arithmetic checks out. A few things worth tightening before implementation:

What's confirmed correct:

  • ACL_TABLE/ports is safe to batch: no min-elements, no inbound leafrefs from other tables, verified in sonic-acl.yang.j2
  • The "2 loadData per move" count is accurate — traced through DeleteRefsMoveExtender.extend() and NoDependencyMoveValidator.__validate_move(), both call find_ref_paths independently per move
  • _validate_replace also loads twice (the code even has a comment saying "we have to load twice :("), so the single REPLACE still costs 2 loadData — the 512 → 2 estimate holds for this case

Three things that need correction before implementation:

  1. "Typically no min-elements" is not safe to generalize. In the same sonic-acl.yang.j2, ACL_TABLE_TYPE/MATCHES and ACL_TABLE_TYPE/BIND_POINTS both have min-elements 1. A batch REPLACE on those that transiently empties the list would fail FullConfigMoveValidator. The optimization is only valid for leaf-lists with no min-elements constraint — the implementation must verify this per leaf-list from the YANG model, not assume it.

  2. The "falls through" description is misleading. There is no explicit fallback mechanism. BulkLowLevelMoveGenerator and LowLevelMoveGenerator are separate generators that independently populate the same DFS candidate queue. If the batch REPLACE fails validation, DFS doesn't fall back — it just continues to the next candidate which happens to already be in the queue from the individual move generator. The safety net is real, but it exists by coincidence of queue ordering, not by design. Don't describe it as an intentional fallback or the implementation may rely on something that isn't guaranteed.

  3. The "no cross-table dependency" claim needs a scope boundary. It's only true when the leaf-list modification is the only change touching those port references in this patch. If the same patch also removes the PORT entry that the leaf-list values point to, the cross-table dependency fully applies — ports must be cleared before PORT is deleted. Scope the claim explicitly: "safe to batch when the leaf-list modification is isolated from any concurrent deletion of the referenced nodes."

Bottom line: Don't rework — just tighten the scope. The ACL_TABLE/ports optimization is valid and the performance case is real. Replace "typically no constraints" with a YANG-model check requirement, fix the fallback description, and add the isolation boundary on the dependency claim. As written, someone could read this and implement a batch optimization that silently breaks ACL_TABLE_TYPE modifications or patches that combine port deletion with ACL unbinding.
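Point 1 suggests a guard along these lines (a sketch; the constraint table below is hypothetical, and a real implementation must read min-elements from the loaded YANG model rather than hardcode it):

```python
# Hypothetical min-elements lookup; real values must come from the parsed
# YANG model (e.g. sonic-acl.yang.j2), not a hardcoded table.
MIN_ELEMENTS = {
    ("ACL_TABLE_TYPE", "MATCHES"): 1,
    ("ACL_TABLE_TYPE", "BIND_POINTS"): 1,
}

def safe_to_batch(table, field, target_list):
    """Only batch a leaf-list if the replacement value respects the
    model's min-elements constraint (default 0 when unconstrained)."""
    return len(target_list) >= MIN_ELEMENTS.get((table, field), 0)

assert safe_to_batch("ACL_TABLE", "ports", [])             # no constraint
assert not safe_to_batch("ACL_TABLE_TYPE", "MATCHES", [])  # min-elements 1
```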

@rookie-who
Author

KVM Testbed Performance Results

Tested on SONiC VS (32-port Force10-S6000 KVM, master.1094504-eae46f301) using the perf smoke test from sonic-mgmt PR #24092.

Test: Remove 16 ports from a 32-port ACL table via config apply-patch (REPLACE patch → decomposes into 16 REMOVE moves internally without this PR).

Scenario                   Time    Per port   loads/move   Result
Baseline (no fixes)        21.0s   1.315s     49.3x        ❌ FAIL
#4476 only (hash cache)    21.1s   1.321s     45.3x        ❌ FAIL
#4478 only                 4.6s    0.290s     10.6x        ❌ FAIL (barely)
#4476 + #4478              4.6s    0.284s     7.7x         ✅ PASS

Key result: #4478 delivers a 4.5x speedup by batching 16 individual REMOVE moves into 1 bulk REPLACE. On the 32-port VS, it nearly passes the budget alone (4.6s vs 4s). Combined with #4476 hash cache (which reduces redundant loadData within the bulk move validation), it comfortably passes.

Extrapolation for 512-port devices:

Test PR: sonic-net/sonic-mgmt#24092

@rookie-who
Author

@rimunagala — excellent review. All three points are valid. Here's how I'll address them:

1. min-elements check ✅ Agreed

You're right that ACL_TABLE_TYPE/MATCHES and BIND_POINTS have min-elements 1. The current implementation doesn't check YANG constraints — it just tries the batch and relies on FullConfigMoveValidator catching failures. But the comment/docs should not say "typically no constraints" — I'll update the implementation to explicitly document which leaf-lists are safe and why, or better yet, add a YANG model check for min-elements before attempting the batch.

2. Fallback mechanism ✅ Agreed

You're correct that the "falls through" isn't an explicit fallback — it's a consequence of the DFS candidate queue having both bulk and individual moves. The BulkLeafListMoveGenerator is registered as non-extendable and runs before BulkKeyLevelMoveGenerator in SortAlgorithmFactory.create(). If the bulk REPLACE fails validation, DFS naturally tries the next candidates (individual moves from LowLevelMoveGenerator). I'll fix the description to accurately reflect this mechanism.

3. Isolation boundary ✅ Agreed

The batch is safe for the leaf-list itself, but you're right that cross-table ordering still matters at the patch level. If the same patch removes PORT entries that the leaf-list references, the DFS still needs to order: clear ports from ACL → delete PORT. The batch REPLACE for the leaf-list produces the correct final list value, and the DFS handles the cross-table ordering around it. But the documentation should be explicit about this scope.

I'll push an update with these corrections. The core optimization (and the 4.5x KVM result) remains valid — these are documentation/guardrail improvements.

@rimunagala

@rookie-who I think this change should be validated with a proper repro across different GCU operation types and scales before we conclude it is safe.

Can we please verify this with:

  • add
  • remove
  • replace

And for each of the above, test with different port-set sizes, for example:

  • large-scale update: replacing or touching almost all ports (e.g. 512/512)
  • medium-scale update: around 200/512 ports
  • small-scale update: only 2–3 ports

Similarly for add and remove, it would be good to validate:

  • add/remove affecting almost all ports
  • add/remove affecting a medium number of ports
  • add/remove affecting only a few ports

I’m suggesting this because behavior can differ significantly depending on both the operation type and the scale of the update, so a repro-based validation would help confirm this does not introduce issues in practical scenarios.

@rookie-who
Author

@rimunagala Good call — comprehensive testing across operation types and scales is the right approach. Let me run the matrix.

I'll test on the KVM testbed (32-port Force10-S6000 VS) with the following matrix:

Operation   Scale    Description
REMOVE      Large    Remove 30/32 ports from ACL
REMOVE      Medium   Remove 16/32 ports
REMOVE      Small    Remove 2/32 ports
ADD         Large    Add 30 ports to 2-port ACL
ADD         Medium   Add 16 ports to 16-port ACL
ADD         Small    Add 2 ports to 30-port ACL
REPLACE     Large    Replace 30/32 ports (different set)
REPLACE     Medium   Replace 16/32 ports
REPLACE     Small    Replace 2/32 ports

For each: baseline (no PRs) vs #4478 alone vs #4476+#4478, measuring wall time, move count, and loadData calls.

Will post results shortly.

@rimunagala

@rookie-who — reviewed BulkLeafListMoveGenerator end to end. The approach is correct and this is clearly the dominant fix (512 REMOVE moves → 1 REPLACE move). A few things to address before merge:

Bug — TestSortAlgorithmFactory will fail

BulkLeafListMoveGenerator is added to move_non_extendable_generators in SortAlgorithmFactory.create() but the expected list in TestSortAlgorithmFactory is not updated:

# patch_sorter_test.py — expected_non_extendable_generators is missing BulkLeafListMoveGenerator
expected_non_extendable_generators = [ps.RemoveCreateOnlyDependencyMoveGenerator,
                                      ps.BulkKeyLevelMoveGenerator,
                                      ps.KeyLevelMoveGenerator,
                                      ps.BulkKeyGroupLowLevelMoveGenerator,
                                      ps.BulkLowLevelMoveGenerator]
# ← ps.BulkLeafListMoveGenerator missing here

This test will fail as-is. Needs to be added.

Missing test — empty target list []

Removing all items from a leaf-list (target = []) is a valid and common real-world scenario (removing all ports from an ACL). Not currently tested:

def test_generate__leaf_list_all_items_removed__single_replace_move(self):
    self.verify(
        current={"ACL_TABLE": {"EVERFLOW": {"ports": ["Ethernet0", "Ethernet4"]}}},
        target={"ACL_TABLE": {"EVERFLOW": {"ports": []}}},
        ex_ops=[{"op": "replace", "path": "/ACL_TABLE/EVERFLOW/ports", "value": []}])

Missing test — list only in target (no move expected)

When a leaf-list is being added fresh (doesn't exist in current), the generator should correctly skip it and let LowLevelMoveGenerator handle it:

def test_generate__list_only_in_target__no_moves(self):
    self.verify(
        current={"ACL_TABLE": {"EVERFLOW": {"type": "MIRROR"}}},
        target={"ACL_TABLE": {"EVERFLOW": {"type": "MIRROR", "ports": ["Ethernet0"]}}},
        ex_ops=[])

Everything else looks good

  • Token mutation (append/pop) is correct
  • list(tokens) copy when yielding JsonMove is correct
  • _is_leaf_list correctly handles mixed/dict lists
  • Skip when list only in current is correct (handled by KeyLevelMoveGenerator)
  • Non-extendable registration is correct — DFS tries this first, falls through on validator failure

@rookie-who force-pushed the fix/gcu-batch-leaf-list-moves branch from 5f99c08 to d8a4aaa on April 23, 2026 17:20
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@rookie-who
Author

@rimunagala Thanks for the thorough review. All three items addressed:

  1. TestSortAlgorithmFactory — Added BulkLeafListMoveGenerator to expected_non_extendable_generators
  2. Empty target list [] — Added test_generate__leaf_list_all_items_removed__single_replace_move
  3. List only in target — Added test_generate__list_only_in_target__no_moves

Force-pushed: d8a4aaaf

@rimunagala

@rookie-who The PR still shows only the old commit. Can you confirm the force-push is on branch rookie-who:fix/gcu-batch-leaf-list-moves and share the new HEAD SHA / updated diff?

@rimunagala

@rookie-who Thanks, I reviewed the updated version in commit d8a4aaaf (not currently reflected in PR #4478 yet — the PR still shows the older head commit in the commits/files tabs).

For the updated code in d8a4aaaf, there are still a couple of things to check before merge. Please validate each point against the present code and behavior before making any changes — only fix what is actually broken or required:

1. patch_sorter_test_success.json may need updating

TestPatchSorter.test_patch_sorter_success() runs real PatchSorter.sort() with real YANG validation against the cases in patch_sorter_test_success.json. Now that BulkLeafListMoveGenerator is registered as a non-extendable generator, the DFS will try it first — and for any case where the bulk REPLACE validates (e.g. DPB_4_TO_1__SUCCESS, which has ACL_TABLE/NO-NSW-PACL-V4/ports going from 4 ports to 1), the sort output will now be a single REPLACE instead of the individual per-port removes the JSON currently expects.

If this is the case, that test will fail on CI. Please run test_patch_sorter_success locally and check whether the expected_changes in the JSON need updating for any affected case. If they do, update the JSON — that update itself documents "sorter now picks the bulk path here."

2. Negative integration test (bulk rejected → fallback)

This is not required unless you specifically want to cover the min-elements constrained leaf-list case (e.g. ACL_TABLE_TYPE/MATCHES which has min-elements 1). The general DFS fallback is already exercised by TestMoveWrapper. Only add this if you want a regression guard for the specific YANG constraint scenario — it is not blocking.

@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@rookie-who
Author

@rimunagala Addressed both items:

1. patch_sorter_test_success.json updated ✅

Ran all cases through PatchSorter.sort() on the KVM DUT with BulkLeafListMoveGenerator active. 6 cases now produce bulk REPLACE moves instead of individual leaf-list item ops:

  • ADD_VALUE_TO_EXISTING_ARRAY__SUCCESS
  • DPB_1_TO_4__SUCCESS
  • MODIFY_VALUE_IN_EXISTING_ARRAY__SUCCESS
  • PATCH_WITH_SINGLE_SIMPLE_OPERATION__RETURNS_ONE_CHANGE
  • ADD_RACK
  • REMOVE_RACK

DPB_4_TO_1__SUCCESS is unchanged — the bulk REPLACE for ports fails YANG validation (PORT entries have dependencies), so the DFS falls back to individual removes. This is the expected behavior.

2. Negative integration test (bulk rejected → fallback)

Per your note, not adding this — DPB_4_TO_1__SUCCESS already covers this exact scenario (bulk REPLACE rejected, DFS falls through to individual removes). The existing TestMoveWrapper covers the general DFS fallback.

Force-pushed: ca9fe9da

@rimunagala

Hey @rookie-who, two small things before approving. Please validate each point before making any changes — only fix what is actually broken or required:

  1. Docstring wording (line 1288): the docstring currently reads

     • Falls through to individual moves if validation fails

     This implies an explicit fallback mechanism, but there isn't one. What actually happens is that DFS continues iterating to other candidate moves (from BulkKeyLevelMoveGenerator, LowLevelMoveGenerator, etc.) which happen to produce per-item moves. Please update the wording to something like:

     • If this move fails validation, DFS continues to other generators
       which produce per-item moves (no explicit fallback mechanism)

  2. Missing fallback test: The existing test_patch_sorter_success cases do a good job of verifying the happy path — they confirm that PatchSorter.sort() emits a single REPLACE for ACL ports. But the PR description also claims "falls through to individual moves if validation fails", and that behavior has no test coverage at all. Please add one PatchSorter-level test case where the bulk REPLACE is expected to fail validation (e.g., a scenario where the ports list change involves a port that also has a dependency change in the same patch), and assert that sorting still succeeds and produces correct per-item moves in the output. This would validate the safety net the PR relies on.

@rookie-who force-pushed the fix/gcu-batch-leaf-list-moves branch from ca9fe9d to 4faf2ee on April 24, 2026 21:37
@mssonicbld
Collaborator

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Add BulkLeafListMoveGenerator that produces a single REPLACE move for
leaf-list fields whose items differ between current and target configs,
instead of decomposing into N individual REMOVE/ADD moves.

This is registered as a non-extendable generator (tried before individual
moves in DFS). If validation fails, DFS falls through to per-item moves.

Impact: For a 512-port ACL table where half the ports are removed, this
reduces ~256 individual moves (each triggering 2 loadData calls at ~1.4s
each = ~717s) to 1 move (2 loadData calls = ~2.8s).

Conservative scope:
- Only handles leaf-lists (lists of scalars, not lists of dicts)
- Only replaces lists that exist in both current and target
- Falls through to individual moves if the bulk replace fails validation

Signed-off-by: rookie-who <rookie-who@users.noreply.github.com>
@rookie-who force-pushed the fix/gcu-batch-leaf-list-moves branch from 4faf2ee to 1e69b30 on April 24, 2026 21:39
@mssonicbld
Collaborator

/azp run

@rookie-who
Author

@rimunagala

  1. Docstring updated — now reads:
     - If this move fails validation, DFS continues to other generators
       which produce per-item moves (no explicit fallback mechanism)
  2. Fallback test: DPB_4_TO_1__SUCCESS in patch_sorter_test_success.json already covers this. It removes 3 ports from ACL_TABLE/NO-NSW-PACL-V4/ports while also deleting PORT/Ethernet1-3. BulkLeafListMoveGenerator emits a REPLACE, but DFS ends up using individual removes (expected output has 3x remove ports/0, no bulk REPLACE). Same expected output as before the PR — confirming the DFS naturally falls through.

Pushed 1e69b30f.

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@rimunagala self-requested a review on April 24, 2026 21:50