Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.
Codecov Report
✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##             main   #12525      +/-   ##
==========================================
- Coverage   91.45%   91.44%   -0.02%
==========================================
  Files         203      203
  Lines       25471    25497      +26
==========================================
+ Hits        23294    23315      +21
- Misses       2177     2182       +5
==========================================
```

Flags with carried forward coverage won't be shown.
MichelleArk left a comment
Would also be great to include some functional tests for this change.
Example microbatch functional tests testing retry behavior: https://github.com/dbt-labs/dbt-mantle/blob/main/tests/functional/microbatch/test_microbatch.py#L651-L681
Would be great to get a simple retry test for the build command + a microbatch model given that was introduced in this PR as well!
Force-pushed 3f9f4d8 to 392e6c5
* fix dbt retry for microbatch models
* added changelog
* added functional tests

(cherry picked from commit e6da05f)
Resolves #11423
Problem
When a microbatch model fails during `dbt run` and `dbt retry` is executed later (e.g., days afterward), the retry uses the current date instead of the original failure date to compute which batches to run. This results in processing completely different batches than those that originally failed.

Root cause: In `retry.py`, the `batch_map` is only populated when a microbatch model had both successful and failed batches (`len(result.batch_results.successful) != 0`). When all batches fail (or the model was skipped entirely due to an upstream failure), the model is excluded from `batch_map`, so `previous_batch_results` is never set on the node. This causes `get_batches()` in `run.py` to fall through to normal batch computation, which uses `get_invocation_started_at()`: the current retry time, not the original run time.

Example from the issue: A model failed on 2025-03-21 with batches 03-18 through 03-21. When retried on 2025-03-25, it processed batches 03-22 through 03-25 instead, completely missing the original failed batches.
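The failure mode from the issue can be reproduced with a small sketch. This is not dbt's actual batch logic (`compute_batches` here is an illustrative stand-in), just a demonstration of why anchoring batch computation to the current clock loses the originally failed batches:

```python
from datetime import datetime, timedelta

def compute_batches(end_time: datetime, lookback_days: int = 4):
    """Illustrative only: derive daily batches ending at `end_time`.
    Stand-in for dbt's batch computation, not the real implementation."""
    start = end_time - timedelta(days=lookback_days - 1)
    return [(start + timedelta(days=i)).date() for i in range(lookback_days)]

# Original run failed on 2025-03-21: batches 03-18 through 03-21.
failed_batches = compute_batches(datetime(2025, 3, 21))

# Before the fix, a retry on 2025-03-25 recomputed batches from the
# *current* time, yielding 03-22 through 03-25.
retry_batches = compute_batches(datetime(2025, 3, 25))

print(failed_batches[0])                         # 2025-03-18
print(retry_batches[0])                          # 2025-03-22
print(set(failed_batches) & set(retry_batches))  # set() -- no overlap
```

The empty intersection is exactly the bug: none of the batches that failed are re-run.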
Solution
Track the original invocation time from the `run_results.json` artifact metadata and use it during batch computation on retry.

Changes:

* `core/dbt/task/retry.py`: Extract `invocation_started_at` from the previous run's metadata and pass it to the `RunTask` as `original_invocation_started_at`. Also changed `self.task_class == RunTask` to `issubclass(self.task_class, RunTask)` so that `BuildTask` (which extends `RunTask`) also gets microbatch retry behavior.
* `core/dbt/task/run.py`: Added an `original_invocation_started_at` attribute to `RunTask`. Updated `MicrobatchModelRunner.get_microbatch_builder()` to use `self.parent_task.original_invocation_started_at` as `default_end_time` when available, falling back to `get_invocation_started_at()` for normal (non-retry) runs.
* `tests/unit/task/test_run.py`: Added two tests verifying that the microbatch builder uses the original invocation time during retry and falls back to the current time during normal runs.

Checklist
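As a rough illustration of the solution described above, the retry path reads the previous run's start time from `run_results.json` and prefers it over the current clock when computing the batch window. The function names and the `invocation_started_at` metadata key below follow the PR description but are assumptions, not dbt-core's exact API:

```python
import json
from datetime import datetime, timezone
from typing import Optional

def load_original_invocation_started_at(run_results_path: str) -> Optional[datetime]:
    """Sketch: read the original run's start time from run_results.json
    metadata. The exact artifact schema may differ from this assumption."""
    with open(run_results_path) as f:
        metadata = json.load(f).get("metadata", {})
    raw = metadata.get("invocation_started_at")
    return datetime.fromisoformat(raw) if raw else None

def microbatch_default_end_time(
    original_invocation_started_at: Optional[datetime],
) -> datetime:
    """Mirrors the described fallback: on retry, use the original run's
    time; otherwise use the current invocation time (approximated here
    with now())."""
    if original_invocation_started_at is not None:
        return original_invocation_started_at
    return datetime.now(timezone.utc)
```

With this in place, a retry days after the failure still computes batches against the original run's clock, so the originally failed batches are the ones re-executed.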