[BUG] remove erroneous .detach() in TiDE multivariate prediction path #2208
haoyu-haoyu wants to merge 1 commit into sktime:main from
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff            @@
##             main    #2208   +/-   ##
=======================================
  Coverage        ?   86.62%
=======================================
  Files           ?      165
  Lines           ?     9736
  Branches        ?        0
=======================================
  Hits            ?     8434
  Misses          ?     1302
  Partials        ?        0
=======================================
```
phoeenniixx left a comment:
Thanks a lot for catching this!
Although, should there be a `.clone()` at all? It takes up more memory, no?
What do you think? @haoyu-haoyu
Also FYI @PranavBhatP @fkiraly @agobbifbk
Thanks for the approval @phoeenniixx! Good question about `.clone()`. Looking at the code flow, whether `.clone()` is strictly needed is less clear-cut than the `.detach()` bug. My recommendation: keep `.clone()`. That said, if you'd prefer to remove it for memory efficiency, the minimal change would be:

`prediction = [i.requires_grad_(True) for i in prediction]`

This keeps the views (no copy) and just enables gradient tracking. Happy to update either way!
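A small sketch of the memory trade-off being discussed (the tensors here are illustrative stand-ins, not the real prediction path): `.clone().requires_grad_(True)` allocates new storage for each tensor, while `requires_grad_(True)` alone flips the flag in place on the existing storage.

```python
import torch

# Stand-in for `prediction`: a tuple of tensors produced without
# grad history (as in an inference path under no_grad).
with torch.no_grad():
    prediction = tuple(torch.randn(8, 1) for _ in range(3))

# Option 1: clone + requires_grad_ — allocates a copy of each tensor.
copied = [p.clone().requires_grad_(True) for p in prediction]

# Option 2: requires_grad_ only — flips the flag in place, no extra memory.
in_place = [p.requires_grad_(True) for p in prediction]

assert copied[0].data_ptr() != prediction[0].data_ptr()    # new storage
assert in_place[0].data_ptr() == prediction[0].data_ptr()  # shared storage
assert all(t.requires_grad for t in copied + in_place)     # both track grads
```

Both options enable gradient tracking; they differ only in whether the underlying storage is duplicated.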
After tracing the full call chain more carefully, I think you're right.

The simplest correct code is just:

`prediction = list(prediction)`

Want me to update the PR to remove both `.clone()` and `.detach()`?
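Why `list(prediction)` alone can be enough: during training, module outputs already carry a `grad_fn`, so no copying or flag-flipping is needed. A hedged sketch with a stand-in `Linear` model (not the actual TiDE code):

```python
import torch

# Toy model whose output plays the role of the TiDE prediction.
lin = torch.nn.Linear(4, 2)
x = torch.randn(5, 4)
out = lin(x)

# Toy per-target tuple, as in a multivariate prediction path.
prediction = (out[:, :1], out[:, 1:])
prediction = list(prediction)  # no clone/detach/requires_grad_ needed

# Outputs computed under autograd already require grad...
assert all(p.requires_grad for p in prediction)

# ...and gradients reach the model's parameters.
prediction[0].sum().backward()
assert lin.weight.grad is not None
```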
In the multivariate target code path, prediction tensors are cloned with `.detach().requires_grad_(True)`. The `.detach()` call disconnects the tensor from the computational graph, creating an isolated leaf variable. The subsequent `.requires_grad_(True)` enables gradient tracking on this leaf but does NOT reconnect it to the encoder/decoder graph — so gradients cannot flow back through the prediction during backpropagation, effectively preventing the model from learning for multivariate targets.

Fix: remove `.detach()` so `.clone().requires_grad_(True)` preserves the gradient connection while still creating a copy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
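The commit message's argument can be reproduced with a toy sketch (a single parameter standing in for the encoder; the names are illustrative, not TiDE's actual code):

```python
import torch

# Toy stand-in for the encoder: one parameter producing the prediction.
w = torch.nn.Parameter(torch.tensor(2.0))
pred = w * 3.0  # connected to w in the autograd graph

# Buggy pattern: detach() severs the graph; requires_grad_(True) then
# makes a fresh isolated leaf, so no gradient can reach w.
broken = pred.clone().detach().requires_grad_(True)
broken.sum().backward()
assert w.grad is None  # no gradient reached the "encoder"

# Fixed pattern: clone() copies the data but keeps the graph edge to w.
fixed = pred.clone()
fixed.sum().backward()
assert w.grad is not None  # gradient flowed back (d(3w)/dw = 3)
```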
Force-pushed from 03ed98d to 137cb5a
Updated: simplified to `prediction = list(prediction)`.

One line, zero overhead.
Summary

In the TiDE model's multivariate target code path (line 333), prediction tensors are processed with `.clone().detach().requires_grad_(True)`.

This is incorrect because:

- `.detach()` disconnects the tensor from the computational graph
- `.requires_grad_(True)` on a detached tensor creates a new leaf — it does NOT reconnect to the encoder/decoder graph

Fix

`.clone()` creates a copy (preserving the gradient connection), and `.requires_grad_(True)` ensures gradient tracking, without the graph-breaking `.detach()`.

Test plan

- `pytest tests/ -k tide` passes
- `loss.backward()` propagates gradients to encoder parameters