GCU: silence transient YANG leafref ERR during patch-sort search#4482
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
This PR reduces syslog noise emitted during Generic Config Updater (GCU) patch-sort’s speculative YANG validation by threading a quiet= flag into the validation path used for search/pruning, while preserving existing logging behavior for non-speculative (final) validation.
Changes:
- Add
quiet: bool = Falsekwarg toConfigWrapper.validate_config_db_config()and forward it tosonic_yang.loadData(...). - Update
FullConfigMoveValidator.validate()to callvalidate_config_db_config(..., quiet=True)during patch-sort search. - Add a unit test asserting
FullConfigMoveValidatorpassesquiet=Trueto the config wrapper.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
generic_config_updater/gu_common.py |
Adds quiet kwarg to config validation and forwards it to YANG load/validate call. |
generic_config_updater/patch_sorter.py |
Uses quiet=True for speculative validation in FullConfigMoveValidator. |
tests/generic_config_updater/patch_sorter_test.py |
Adds regression test verifying quiet=True is passed during speculative validation. |
|
/azp run |
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
c32a7e4 to
a3d80e6
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
a3d80e6 to
7a2c7c3
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
7a2c7c3 to
2333554
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
FullConfigMoveValidator is invoked speculatively during the patch-sort
search to check whether a candidate move results in a YANG-valid config.
Forward leafref references produce expected transient validation
failures; the sorter already handles these as a prune signal via the
returned (False, error) tuple.
However, sonic_yang.loadData logs a LOG_ERR "Data Loading Failed" line
to syslog on every exception (log-and-throw antipattern) before
raising, so each speculative probe leaks an ERR to syslog. In practice
a single config apply-patch during the GCU test suite can emit 100+
such lines, which trips loganalyzer and makes real errors hard to
find.
This change threads a quiet= kwarg through:
ConfigWrapper.validate_config_db_config(..., quiet=False)
-> sonic_yang.loadData(..., quiet=quiet)
and makes FullConfigMoveValidator.validate pass quiet=True. Non-
speculative callers (e.g. final target-config validation) keep the
existing default (quiet=False) and continue to log on real errors.
Depends on sonic-net/sonic-buildimage: quiet= kwarg added to
sonic_yang_ext.SonicYangExtMixin.loadData.
Validated on DUT with sonic-mgmt generic_config_updater subset
(dhcp_relay, eth_interface, vlan_interface, portchannel_interface,
cacl, lo_interface, bgp_prefix, syslog):
BEFORE: 31P/1F/3S/3E, 231 apply-patch, 115 Data Loading Failed ERR
AFTER: 31P/1F/3S/2E, 231 apply-patch, 34 Data Loading Failed ERR
Remaining ERR lines are from legitimate non-speculative callers.
Signed-off-by: Xichen96 <lukelin0907@gmail.com>
- gu_common.validate_config_db_config: wrap sy.loadData call in try/except TypeError so callers keep working against older sonic-yang-mgmt wheels that do not yet ship the quiet= kwarg. - tests/generic_config_updater/patch_sorter_test.py: add regression guard asserting FullConfigMoveValidator.validate invokes the config wrapper with quiet=True so the speculative patch-sort search does not spam syslog with transient leafref LOG_ERR lines. Signed-off-by: Xichen96 <lukelin0907@gmail.com>
Per review feedback, the quiet= kwarg threading through
ConfigWrapper.validate_config_db_config adds API surface for no
real benefit. loadData's LOG_ERR "Data Loading Failed" line is
always redundant -- every caller already gets the full exception
via the returned (False, error) tuple, so the syslog line is
duplicate noise regardless of whether the call site is the
speculative patch-sort search or a non-speculative validator.
This commit:
- Removes the quiet= kwarg from validate_config_db_config and
pins quiet=True directly in the loadData call.
- Reverts patch_sorter.FullConfigMoveValidator.validate to its
original one-line form (no longer needs to pass quiet=True).
- Updates the regression unit test to assert the wrapper is
called without the kwarg.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Xichen96 <lukelin0907@gmail.com>
2333554 to
36f11d3
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
…ic-net#4482) * GCU: silence transient YANG leafref ERR during patch-sort search FullConfigMoveValidator is invoked speculatively during the patch-sort search to check whether a candidate move results in a YANG-valid config. Forward leafref references produce expected transient validation failures; the sorter already handles these as a prune signal via the returned (False, error) tuple. However, sonic_yang.loadData logs a LOG_ERR "Data Loading Failed" line to syslog on every exception (log-and-throw antipattern) before raising, so each speculative probe leaks an ERR to syslog. In practice a single config apply-patch during the GCU test suite can emit 100+ such lines, which trips loganalyzer and makes real errors hard to find. This change threads a quiet= kwarg through: ConfigWrapper.validate_config_db_config(..., quiet=False) -> sonic_yang.loadData(..., quiet=quiet) and makes FullConfigMoveValidator.validate pass quiet=True. Non- speculative callers (e.g. final target-config validation) keep the existing default (quiet=False) and continue to log on real errors. Depends on sonic-net/sonic-buildimage: quiet= kwarg added to sonic_yang_ext.SonicYangExtMixin.loadData. Validated on DUT with sonic-mgmt generic_config_updater subset (dhcp_relay, eth_interface, vlan_interface, portchannel_interface, cacl, lo_interface, bgp_prefix, syslog): BEFORE: 31P/1F/3S/3E, 231 apply-patch, 115 Data Loading Failed ERR AFTER: 31P/1F/3S/2E, 231 apply-patch, 34 Data Loading Failed ERR Remaining ERR lines are from legitimate non-speculative callers. Signed-off-by: Xichen96 <lukelin0907@gmail.com> * Address review: backward-compat quiet= + unit test - gu_common.validate_config_db_config: wrap sy.loadData call in try/except TypeError so callers keep working against older sonic-yang-mgmt wheels that do not yet ship the quiet= kwarg. - tests/generic_config_updater/patch_sorter_test.py: add regression guard asserting FullConfigMoveValidator.validate invokes the config wrapper with quiet=True so the speculative patch-sort search does not spam syslog with transient leafref LOG_ERR lines. Signed-off-by: Xichen96 <lukelin0907@gmail.com> * Simplify: drop quiet= plumbing, hardcode quiet=True in loadData Per review feedback, the quiet= kwarg threading through ConfigWrapper.validate_config_db_config adds API surface for no real benefit. loadData's LOG_ERR "Data Loading Failed" line is always redundant -- every caller already gets the full exception via the returned (False, error) tuple, so the syslog line is duplicate noise regardless of whether the call site is the speculative patch-sort search or a non-speculative validator. This commit: - Removes the quiet= kwarg from validate_config_db_config and pins quiet=True directly in the loadData call. - Reverts patch_sorter.FullConfigMoveValidator.validate to its original one-line form (no longer needs to pass quiet=True). - Updates the regression unit test to assert the wrapper is called without the kwarg. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Xichen96 <lukelin0907@gmail.com> --------- Signed-off-by: Xichen96 <lukelin0907@gmail.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> (cherry picked from commit e04fdbb)
…) (#4484) * GCU: silence transient YANG leafref ERR during patch-sort search FullConfigMoveValidator is invoked speculatively during the patch-sort search to check whether a candidate move results in a YANG-valid config. Forward leafref references produce expected transient validation failures; the sorter already handles these as a prune signal via the returned (False, error) tuple. However, sonic_yang.loadData logs a LOG_ERR "Data Loading Failed" line to syslog on every exception (log-and-throw antipattern) before raising, so each speculative probe leaks an ERR to syslog. In practice a single config apply-patch during the GCU test suite can emit 100+ such lines, which trips loganalyzer and makes real errors hard to find. This change threads a quiet= kwarg through: ConfigWrapper.validate_config_db_config(..., quiet=False) -> sonic_yang.loadData(..., quiet=quiet) and makes FullConfigMoveValidator.validate pass quiet=True. Non- speculative callers (e.g. final target-config validation) keep the existing default (quiet=False) and continue to log on real errors. Depends on sonic-net/sonic-buildimage: quiet= kwarg added to sonic_yang_ext.SonicYangExtMixin.loadData. Validated on DUT with sonic-mgmt generic_config_updater subset (dhcp_relay, eth_interface, vlan_interface, portchannel_interface, cacl, lo_interface, bgp_prefix, syslog): BEFORE: 31P/1F/3S/3E, 231 apply-patch, 115 Data Loading Failed ERR AFTER: 31P/1F/3S/2E, 231 apply-patch, 34 Data Loading Failed ERR Remaining ERR lines are from legitimate non-speculative callers. * Address review: backward-compat quiet= + unit test - gu_common.validate_config_db_config: wrap sy.loadData call in try/except TypeError so callers keep working against older sonic-yang-mgmt wheels that do not yet ship the quiet= kwarg. - tests/generic_config_updater/patch_sorter_test.py: add regression guard asserting FullConfigMoveValidator.validate invokes the config wrapper with quiet=True so the speculative patch-sort search does not spam syslog with transient leafref LOG_ERR lines. * Simplify: drop quiet= plumbing, hardcode quiet=True in loadData Per review feedback, the quiet= kwarg threading through ConfigWrapper.validate_config_db_config adds API surface for no real benefit. loadData's LOG_ERR "Data Loading Failed" line is always redundant -- every caller already gets the full exception via the returned (False, error) tuple, so the syslog line is duplicate noise regardless of whether the call site is the speculative patch-sort search or a non-speculative validator. This commit: - Removes the quiet= kwarg from validate_config_db_config and pins quiet=True directly in the loadData call. - Reverts patch_sorter.FullConfigMoveValidator.validate to its original one-line form (no longer needs to pass quiet=True). - Updates the regression unit test to assert the wrapper is called without the kwarg. --------- (cherry picked from commit e04fdbb) Signed-off-by: Xichen96 <lukelin0907@gmail.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
The change is not in 202511 yet. @Xichen96, please manually create the cherry pick PR for branch 202511. ---Powered by SONiC BuildBot
|
|
already in 202511 |
|
The cherry pick conflict has been handled manually. Removing cherry pick conflict label... ---Powered by SONiC BuildBot
|
FullConfigMoveValidator is invoked speculatively during the patch-sort search to check whether a candidate move results in a YANG-valid config. Forward leafref references produce expected transient validation failures; the sorter already handles these as a prune signal via the returned (False, error) tuple.
However, sonic_yang.loadData logs a LOG_ERR "Data Loading Failed" line to syslog on every exception (log-and-throw antipattern) before raising, so each speculative probe leaks an ERR to syslog. In practice a single config apply-patch during the GCU test suite can emit 100+ such lines, which trips loganalyzer and makes real errors hard to find.
This change threads a quiet= kwarg through:
ConfigWrapper.validate_config_db_config(..., quiet=False)
-> sonic_yang.loadData(..., quiet=quiet)
and makes FullConfigMoveValidator.validate pass quiet=True. Non- speculative callers (e.g. final target-config validation) keep the existing default (quiet=False) and continue to log on real errors.
Depends on sonic-net/sonic-buildimage: quiet= kwarg added to sonic_yang_ext.SonicYangExtMixin.loadData.
Validated on DUT with sonic-mgmt generic_config_updater subset (dhcp_relay, eth_interface, vlan_interface, portchannel_interface, cacl, lo_interface, bgp_prefix, syslog):
BEFORE: 31P/1F/3S/3E, 231 apply-patch, 115 Data Loading Failed ERR
AFTER: 31P/1F/3S/2E, 231 apply-patch, 34 Data Loading Failed ERR
Remaining ERR lines are from legitimate non-speculative callers.
What I did
Silenced the
Data Loading FailedLOG_ERR spam that the GCU patch sorter emits during its speculative search, without altering the sorter's control flow or changing behavior for any non-speculative caller.Two files in
generic_config_updater/:gu_common.py—ConfigWrapper.validate_config_db_configgains aquiet=Falsekwarg, forwarded tosy.loadData(..., quiet=quiet).patch_sorter.py—FullConfigMoveValidator.validatepassesquiet=True, since every failure there is an expected search-prune signal.No other caller is changed.
FullConfigMoveValidator's return semantics (is_valid, error) are unchanged, so real failures still surface through the returned tuple as before.Work item tracking (Microsoft ADO): 36655183 (
[SONiC_Baseline][Failed_Case][dhcp_relay.test_dhcpv4_relay][test_dhcp_relay_with_non_default_vrf]) and 36656583 ([SONiC_Baseline][Failed_Case][dhcp_relay.test_dhcpv4_relay][test_dhcp_relay_with_different_non_default_vrf]).Depends on sonic-net/sonic-buildimage# (adds the
quiet=kwarg toSonicYangExtMixin.loadDatainsonic-yang-mgmt). Must merge after that PR.How I did it
How to verify it
Run the sonic-mgmt
generic_config_updatertest subset against a T0 DUT and count"Data Loading Failed"lines in/var/log/syslogduring the run.Validated on a T0 (20251110.19) with 8 test files (
dhcp_relay,eth_interface,vlan_interface,portchannel_interface,cacl,lo_interface,bgp_prefix,syslog):BEFORE (stock):
apply-patchevents during the runData Loading Failedlines in syslogAFTER (fix):
apply-patchevents during the runData Loading Failedlines in syslogThe remaining 34 ERR lines are from legitimate non-speculative callers (final target-config validation) — preserved behavior, not a bug.
Also ran
dhcp_server/test_dhcp_server_port_based_customize_optionswith the fix: 2/2 passed, 8apply-patchevents, 0Data Loading FailedERR, 0 leakedsonic_yangERR.Default behavior (
quiet=False) is exercised by every existing GCU unit test; no unit tests needed modification.Previous command output (if the output of a command-line utility has changed)
New command output (if the output of a command-line utility has changed)