Skip to content

GCU: silence transient YANG leafref ERR during patch-sort search#4482

Merged
Blueve merged 3 commits intosonic-net:masterfrom
Xichen96:dev/xichenlin/fix-gcu-forward-leafref-check
Apr 23, 2026
Merged

GCU: silence transient YANG leafref ERR during patch-sort search#4482
Blueve merged 3 commits intosonic-net:masterfrom
Xichen96:dev/xichenlin/fix-gcu-forward-leafref-check

Conversation

@Xichen96
Copy link
Copy Markdown
Contributor

@Xichen96 Xichen96 commented Apr 22, 2026

FullConfigMoveValidator is invoked speculatively during the patch-sort search to check whether a candidate move results in a YANG-valid config. Forward leafref references produce expected transient validation failures; the sorter already handles these as a prune signal via the returned (False, error) tuple.

However, sonic_yang.loadData logs a LOG_ERR "Data Loading Failed" line to syslog on every exception (log-and-throw antipattern) before raising, so each speculative probe leaks an ERR to syslog. In practice a single config apply-patch during the GCU test suite can emit 100+ such lines, which trips loganalyzer and makes real errors hard to find.

This change threads a quiet= kwarg through:
ConfigWrapper.validate_config_db_config(..., quiet=False)
-> sonic_yang.loadData(..., quiet=quiet)

and makes FullConfigMoveValidator.validate pass quiet=True. Non- speculative callers (e.g. final target-config validation) keep the existing default (quiet=False) and continue to log on real errors.

Depends on sonic-net/sonic-buildimage: quiet= kwarg added to sonic_yang_ext.SonicYangExtMixin.loadData.

Validated on DUT with sonic-mgmt generic_config_updater subset (dhcp_relay, eth_interface, vlan_interface, portchannel_interface, cacl, lo_interface, bgp_prefix, syslog):
BEFORE: 31P/1F/3S/3E, 231 apply-patch, 115 Data Loading Failed ERR
AFTER: 31P/1F/3S/2E, 231 apply-patch, 34 Data Loading Failed ERR
Remaining ERR lines are from legitimate non-speculative callers.

What I did

Silenced the Data Loading Failed LOG_ERR spam that the GCU patch sorter emits during its speculative search, without altering the sorter's control flow or changing behavior for any non-speculative caller.

Two files in generic_config_updater/:

  • gu_common.pyConfigWrapper.validate_config_db_config gains a quiet=False kwarg, forwarded to sy.loadData(..., quiet=quiet).
  • patch_sorter.pyFullConfigMoveValidator.validate passes quiet=True, since every failure there is an expected search-prune signal.

No other caller is changed. FullConfigMoveValidator's return semantics (is_valid, error) are unchanged, so real failures still surface through the returned tuple as before.

Work item tracking (Microsoft ADO): 36655183 ([SONiC_Baseline][Failed_Case][dhcp_relay.test_dhcpv4_relay][test_dhcp_relay_with_non_default_vrf]) and 36656583 ([SONiC_Baseline][Failed_Case][dhcp_relay.test_dhcpv4_relay][test_dhcp_relay_with_different_non_default_vrf]).

Depends on sonic-net/sonic-buildimage# (adds the quiet= kwarg to SonicYangExtMixin.loadData in sonic-yang-mgmt). Must merge after that PR.

How I did it

# generic_config_updater/gu_common.py
-def validate_config_db_config(self, config_db_as_json):
+def validate_config_db_config(self, config_db_as_json, quiet=False):
     ...
-    sy.loadData(config_db_as_json)
+    sy.loadData(config_db_as_json, quiet=quiet)

# generic_config_updater/patch_sorter.py — FullConfigMoveValidator.validate
-    is_valid, error = self.config_wrapper.validate_config_db_config(simulated_config)
+    # Speculative validation during patch-sort search. Transient leafref
+    # failures are expected; sorter handles them via returned (False, err).
+    is_valid, error = self.config_wrapper.validate_config_db_config(
+        simulated_config, quiet=True)
     return is_valid, error

How to verify it

Run the sonic-mgmt generic_config_updater test subset against a T0 DUT and count "Data Loading Failed" lines in /var/log/syslog during the run.

Validated on a T0 (20251110.19) with 8 test files (dhcp_relay, eth_interface, vlan_interface, portchannel_interface, cacl, lo_interface, bgp_prefix, syslog):

BEFORE (stock):

  • 31 passed, 1 failed, 3 skipped, 3 pytest errors
  • 231 apply-patch events during the run
  • 115 Data Loading Failed lines in syslog
  • Duration 1:30:33

AFTER (fix):

  • 31 passed, 1 failed, 3 skipped, 2 pytest errors
  • 231 apply-patch events during the run
  • 34 Data Loading Failed lines in syslog
  • Duration 1:27:42

The remaining 34 ERR lines are from legitimate non-speculative callers (final target-config validation) — preserved behavior, not a bug.

Also ran dhcp_server/test_dhcp_server_port_based_customize_options with the fix: 2/2 passed, 8 apply-patch events, 0 Data Loading Failed ERR, 0 leaked sonic_yang ERR.

Default behavior (quiet=False) is exercised by every existing GCU unit test; no unit tests needed modification.

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

Copilot AI review requested due to automatic review settings April 22, 2026 17:42
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces syslog noise emitted during Generic Config Updater (GCU) patch-sort’s speculative YANG validation by threading a quiet= flag into the validation path used for search/pruning, while preserving existing logging behavior for non-speculative (final) validation.

Changes:

  • Add quiet: bool = False kwarg to ConfigWrapper.validate_config_db_config() and forward it to sonic_yang.loadData(...).
  • Update FullConfigMoveValidator.validate() to call validate_config_db_config(..., quiet=True) during patch-sort search.
  • Add a unit test asserting FullConfigMoveValidator passes quiet=True to the config wrapper.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
generic_config_updater/gu_common.py Adds quiet kwarg to config validation and forwards it to YANG load/validate call.
generic_config_updater/patch_sorter.py Uses quiet=True for speculative validation in FullConfigMoveValidator.
tests/generic_config_updater/patch_sorter_test.py Adds regression test verifying quiet=True is passed during speculative validation.

Comment thread generic_config_updater/gu_common.py Outdated
Comment thread generic_config_updater/gu_common.py Outdated
@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla Bot commented Apr 22, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command.

@Xichen96 Xichen96 force-pushed the dev/xichenlin/fix-gcu-forward-leafref-check branch from c32a7e4 to a3d80e6 Compare April 22, 2026 18:38
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@Xichen96 Xichen96 force-pushed the dev/xichenlin/fix-gcu-forward-leafref-check branch from a3d80e6 to 7a2c7c3 Compare April 22, 2026 18:50
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@Xichen96 Xichen96 force-pushed the dev/xichenlin/fix-gcu-forward-leafref-check branch from 7a2c7c3 to 2333554 Compare April 22, 2026 18:52
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Xichen96 and others added 3 commits April 22, 2026 19:03
FullConfigMoveValidator is invoked speculatively during the patch-sort
search to check whether a candidate move results in a YANG-valid config.
Forward leafref references produce expected transient validation
failures; the sorter already handles these as a prune signal via the
returned (False, error) tuple.

However, sonic_yang.loadData logs a LOG_ERR "Data Loading Failed" line
to syslog on every exception (log-and-throw antipattern) before
raising, so each speculative probe leaks an ERR to syslog. In practice
a single config apply-patch during the GCU test suite can emit 100+
such lines, which trips loganalyzer and makes real errors hard to
find.

This change threads a quiet= kwarg through:
  ConfigWrapper.validate_config_db_config(..., quiet=False)
      -> sonic_yang.loadData(..., quiet=quiet)

and makes FullConfigMoveValidator.validate pass quiet=True. Non-
speculative callers (e.g. final target-config validation) keep the
existing default (quiet=False) and continue to log on real errors.

Depends on sonic-net/sonic-buildimage: quiet= kwarg added to
sonic_yang_ext.SonicYangExtMixin.loadData.

Validated on DUT with sonic-mgmt generic_config_updater subset
(dhcp_relay, eth_interface, vlan_interface, portchannel_interface,
cacl, lo_interface, bgp_prefix, syslog):
  BEFORE: 31P/1F/3S/3E, 231 apply-patch,  115 Data Loading Failed ERR
  AFTER:  31P/1F/3S/2E, 231 apply-patch,   34 Data Loading Failed ERR
Remaining ERR lines are from legitimate non-speculative callers.

Signed-off-by: Xichen96 <lukelin0907@gmail.com>
- gu_common.validate_config_db_config: wrap sy.loadData call in
  try/except TypeError so callers keep working against older
  sonic-yang-mgmt wheels that do not yet ship the quiet= kwarg.
- tests/generic_config_updater/patch_sorter_test.py: add regression
  guard asserting FullConfigMoveValidator.validate invokes the config
  wrapper with quiet=True so the speculative patch-sort search does
  not spam syslog with transient leafref LOG_ERR lines.

Signed-off-by: Xichen96 <lukelin0907@gmail.com>
Per review feedback, the quiet= kwarg threading through
ConfigWrapper.validate_config_db_config adds API surface for no
real benefit. loadData's LOG_ERR "Data Loading Failed" line is
always redundant -- every caller already gets the full exception
via the returned (False, error) tuple, so the syslog line is
duplicate noise regardless of whether the call site is the
speculative patch-sort search or a non-speculative validator.

This commit:
  - Removes the quiet= kwarg from validate_config_db_config and
    pins quiet=True directly in the loadData call.
  - Reverts patch_sorter.FullConfigMoveValidator.validate to its
    original one-line form (no longer needs to pass quiet=True).
  - Updates the regression unit test to assert the wrapper is
    called without the kwarg.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Xichen96 <lukelin0907@gmail.com>
@Xichen96 Xichen96 force-pushed the dev/xichenlin/fix-gcu-forward-leafref-check branch from 2333554 to 36f11d3 Compare April 22, 2026 19:04
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@Blueve Blueve merged commit e04fdbb into sonic-net:master Apr 23, 2026
9 checks passed
Xichen96 added a commit to Xichen96/sonic-utilities that referenced this pull request Apr 23, 2026
…ic-net#4482)

* GCU: silence transient YANG leafref ERR during patch-sort search

FullConfigMoveValidator is invoked speculatively during the patch-sort
search to check whether a candidate move results in a YANG-valid config.
Forward leafref references produce expected transient validation
failures; the sorter already handles these as a prune signal via the
returned (False, error) tuple.

However, sonic_yang.loadData logs a LOG_ERR "Data Loading Failed" line
to syslog on every exception (log-and-throw antipattern) before
raising, so each speculative probe leaks an ERR to syslog. In practice
a single config apply-patch during the GCU test suite can emit 100+
such lines, which trips loganalyzer and makes real errors hard to
find.

This change threads a quiet= kwarg through:
  ConfigWrapper.validate_config_db_config(..., quiet=False)
      -> sonic_yang.loadData(..., quiet=quiet)

and makes FullConfigMoveValidator.validate pass quiet=True. Non-
speculative callers (e.g. final target-config validation) keep the
existing default (quiet=False) and continue to log on real errors.

Depends on sonic-net/sonic-buildimage: quiet= kwarg added to
sonic_yang_ext.SonicYangExtMixin.loadData.

Validated on DUT with sonic-mgmt generic_config_updater subset
(dhcp_relay, eth_interface, vlan_interface, portchannel_interface,
cacl, lo_interface, bgp_prefix, syslog):
  BEFORE: 31P/1F/3S/3E, 231 apply-patch,  115 Data Loading Failed ERR
  AFTER:  31P/1F/3S/2E, 231 apply-patch,   34 Data Loading Failed ERR
Remaining ERR lines are from legitimate non-speculative callers.

Signed-off-by: Xichen96 <lukelin0907@gmail.com>

* Address review: backward-compat quiet= + unit test

- gu_common.validate_config_db_config: wrap sy.loadData call in
  try/except TypeError so callers keep working against older
  sonic-yang-mgmt wheels that do not yet ship the quiet= kwarg.
- tests/generic_config_updater/patch_sorter_test.py: add regression
  guard asserting FullConfigMoveValidator.validate invokes the config
  wrapper with quiet=True so the speculative patch-sort search does
  not spam syslog with transient leafref LOG_ERR lines.

Signed-off-by: Xichen96 <lukelin0907@gmail.com>

* Simplify: drop quiet= plumbing, hardcode quiet=True in loadData

Per review feedback, the quiet= kwarg threading through
ConfigWrapper.validate_config_db_config adds API surface for no
real benefit. loadData's LOG_ERR "Data Loading Failed" line is
always redundant -- every caller already gets the full exception
via the returned (False, error) tuple, so the syslog line is
duplicate noise regardless of whether the call site is the
speculative patch-sort search or a non-speculative validator.

This commit:
  - Removes the quiet= kwarg from validate_config_db_config and
    pins quiet=True directly in the loadData call.
  - Reverts patch_sorter.FullConfigMoveValidator.validate to its
    original one-line form (no longer needs to pass quiet=True).
  - Updates the regression unit test to assert the wrapper is
    called without the kwarg.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Xichen96 <lukelin0907@gmail.com>

---------

Signed-off-by: Xichen96 <lukelin0907@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
(cherry picked from commit e04fdbb)
vmittal-msft pushed a commit that referenced this pull request Apr 28, 2026
…) (#4484)

* GCU: silence transient YANG leafref ERR during patch-sort search

FullConfigMoveValidator is invoked speculatively during the patch-sort
search to check whether a candidate move results in a YANG-valid config.
Forward leafref references produce expected transient validation
failures; the sorter already handles these as a prune signal via the
returned (False, error) tuple.

However, sonic_yang.loadData logs a LOG_ERR "Data Loading Failed" line
to syslog on every exception (log-and-throw antipattern) before
raising, so each speculative probe leaks an ERR to syslog. In practice
a single config apply-patch during the GCU test suite can emit 100+
such lines, which trips loganalyzer and makes real errors hard to
find.

This change threads a quiet= kwarg through:
  ConfigWrapper.validate_config_db_config(..., quiet=False)
      -> sonic_yang.loadData(..., quiet=quiet)

and makes FullConfigMoveValidator.validate pass quiet=True. Non-
speculative callers (e.g. final target-config validation) keep the
existing default (quiet=False) and continue to log on real errors.

Depends on sonic-net/sonic-buildimage: quiet= kwarg added to
sonic_yang_ext.SonicYangExtMixin.loadData.

Validated on DUT with sonic-mgmt generic_config_updater subset
(dhcp_relay, eth_interface, vlan_interface, portchannel_interface,
cacl, lo_interface, bgp_prefix, syslog):
  BEFORE: 31P/1F/3S/3E, 231 apply-patch,  115 Data Loading Failed ERR
  AFTER:  31P/1F/3S/2E, 231 apply-patch,   34 Data Loading Failed ERR
Remaining ERR lines are from legitimate non-speculative callers.



* Address review: backward-compat quiet= + unit test

- gu_common.validate_config_db_config: wrap sy.loadData call in
  try/except TypeError so callers keep working against older
  sonic-yang-mgmt wheels that do not yet ship the quiet= kwarg.
- tests/generic_config_updater/patch_sorter_test.py: add regression
  guard asserting FullConfigMoveValidator.validate invokes the config
  wrapper with quiet=True so the speculative patch-sort search does
  not spam syslog with transient leafref LOG_ERR lines.



* Simplify: drop quiet= plumbing, hardcode quiet=True in loadData

Per review feedback, the quiet= kwarg threading through
ConfigWrapper.validate_config_db_config adds API surface for no
real benefit. loadData's LOG_ERR "Data Loading Failed" line is
always redundant -- every caller already gets the full exception
via the returned (False, error) tuple, so the syslog line is
duplicate noise regardless of whether the call site is the
speculative patch-sort search or a non-speculative validator.

This commit:
  - Removes the quiet= kwarg from validate_config_db_config and
    pins quiet=True directly in the loadData call.
  - Reverts patch_sorter.FullConfigMoveValidator.validate to its
    original one-line form (no longer needs to pass quiet=True).
  - Updates the regression unit test to assert the wrapper is
    called without the kwarg.




---------



(cherry picked from commit e04fdbb)

Signed-off-by: Xichen96 <lukelin0907@gmail.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

The change is not in 202511 yet. @Xichen96, please manually create the cherry pick PR for branch 202511.
You can ping the release branch owner(github account: ) to approve your cherry pick PR.
If this change is already in 202511, please comment "already in 202511". Thanks!

---Powered by SONiC BuildBot

@Xichen96
Copy link
Copy Markdown
Contributor Author

already in 202511

@Xichen96
Copy link
Copy Markdown
Contributor Author

#4484

@mssonicbld
Copy link
Copy Markdown
Collaborator

The cherry pick conflict has been handled manually. Removing cherry pick conflict label...

---Powered by SONiC BuildBot

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants