
[BUG] remove erroneous .detach() in TiDE multivariate prediction path#2208

Open
haoyu-haoyu wants to merge 1 commit into sktime:main from haoyu-haoyu:fix/tide-gradient-detach

Conversation

@haoyu-haoyu

Summary

In the TiDE model's multivariate target code path (line 333), prediction tensors are processed with:

prediction = [i.clone().detach().requires_grad_(True) for i in prediction]

This is incorrect because:

  1. .detach() disconnects the tensor from the computational graph
  2. .requires_grad_(True) on a detached tensor creates a new leaf — it does NOT reconnect to the encoder/decoder graph
  3. Gradients cannot flow back through the prediction during backpropagation
  4. The model effectively cannot learn for multivariate targets
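The failure mode above can be reproduced with a minimal sketch (hypothetical shapes, standalone tensors rather than the actual TiDE modules):

```python
import torch

# `w` stands in for an encoder parameter; `prediction` is produced by a
# differentiable op, so it is attached to the autograd graph.
w = torch.randn(4, 2, 3, requires_grad=True)
prediction = (w * 2).unbind(0)

# Buggy path: .detach() severs the graph; .requires_grad_(True) only turns
# each copy into a fresh leaf -- it does NOT reconnect it to `w`.
broken = [p.clone().detach().requires_grad_(True) for p in prediction]
loss = sum(b.sum() for b in broken)
loss.backward()

print(w.grad)                # None -- nothing flowed back to the "encoder"
print(broken[0].grad.shape)  # gradients stop at the detached leaves
```

After `backward()`, the detached leaves receive gradients but `w.grad` stays `None`, which is exactly why the model cannot learn on this path.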

Fix

- prediction = [i.clone().detach().requires_grad_(True) for i in prediction]
+ prediction = [i.clone().requires_grad_(True) for i in prediction]

.clone() creates a copy (preserving gradient connection), and .requires_grad_(True) ensures gradient tracking — without the graph-breaking .detach().
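A sketch of the fixed path, under the same hypothetical setup (note the clone already inherits `requires_grad=True` from the graph, so the explicit `.requires_grad_(True)` is redundant here, as the later discussion concludes):

```python
import torch

w = torch.randn(4, 2, 3, requires_grad=True)  # stand-in encoder parameter
prediction = (w * 2).unbind(0)

# Fixed path: .clone() is a differentiable copy, so the copies stay
# attached to the graph and gradients reach `w`.
fixed = [p.clone() for p in prediction]
loss = sum(f.sum() for f in fixed)
loss.backward()

print(w.grad is not None)  # True -- the "encoder" parameter gets gradients
print(torch.allclose(w.grad, torch.full_like(w, 2.0)))  # d(sum(2w))/dw = 2
```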

Test plan

  • TiDE model training converges for multivariate targets
  • pytest tests/ -k tide passes
  • Gradient flow verified: loss.backward() propagates to encoder parameters

@codecov

codecov bot commented Mar 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@edbdeb4). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2208   +/-   ##
=======================================
  Coverage        ?   86.62%           
=======================================
  Files           ?      165           
  Lines           ?     9736           
  Branches        ?        0           
=======================================
  Hits            ?     8434           
  Misses          ?     1302           
  Partials        ?        0           
Flag Coverage Δ
cpu 86.62% <100.00%> (?)
pytest 86.62% <100.00%> (?)


@phoeenniixx phoeenniixx added the bug Something isn't working label Mar 19, 2026
@phoeenniixx phoeenniixx changed the title fix: remove erroneous .detach() in TiDE multivariate prediction path [BUG] remove erroneous .detach() in TiDE multivariate prediction path Mar 19, 2026
phoeenniixx
phoeenniixx previously approved these changes Mar 19, 2026
Member

@phoeenniixx phoeenniixx left a comment


Thanks a lot for catching this!
Although, should there be a .clone() at all? It takes up more memory, no?
What do you think? @haoyu-haoyu

Also FYI @PranavBhatP @fkiraly @agobbifbk

@haoyu-haoyu
Author

Thanks for the approval @phoeenniixx! Good question about .clone().

Looking at the code flow:

  • prediction.permute(2, 0, 1) creates a view (not a copy)
  • for i in prediction yields slices that are also views into the same memory
  • Without .clone(), these views share the underlying storage

Whether .clone() is needed depends on whether transform_output() does any in-place operations on the prediction tensors. If it does, modifying one slice could corrupt another since they share memory.

My recommendation: keep .clone() for safety since we can't easily guarantee no in-place ops downstream. The memory overhead is minimal (one extra copy of the prediction tensor), and it prevents subtle bugs if the code is later modified.

That said, if you'd prefer to remove it for memory efficiency, the minimal change would be:

prediction = [i.requires_grad_(True) for i in prediction]

This keeps the views (no copy) and just enables gradient tracking. Happy to update either way!
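The aliasing concern can be demonstrated in isolation (hypothetical shapes, plain tensors):

```python
import torch

# permute() returns a view, and iterating over it yields slices that all
# share the original storage.
x = torch.zeros(2, 3, 4)
views = list(x.permute(2, 0, 1))        # 4 views of shape (2, 3)
views[0].add_(1.0)                      # in-place write through one view...
print(bool(x[:, :, 0].eq(1.0).all()))   # True -- ...shows up in `x`

# .clone() gives each slice its own storage, so in-place ops stay local.
y = torch.zeros(2, 3, 4)
copies = [v.clone() for v in y.permute(2, 0, 1)]
copies[0].add_(1.0)
print(bool(y.eq(0.0).all()))            # True -- `y` is untouched
```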

@haoyu-haoyu
Author

After tracing the full call chain more carefully, I think you're right — .clone() is NOT needed here:

  1. transform_output() → loss.rescale_parameters() → encoder() all create new tensors (de-normalization is not in-place), so views are safe
  2. .requires_grad_(True) is also redundant — the views already inherit requires_grad=True from the autograd graph through permute()

The simplest correct code is just:

prediction = list(prediction)

Want me to update the PR to remove both .clone() and .requires_grad_(True)?
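Both claims can be checked with a small standalone sketch (hypothetical shapes, not the actual TiDE modules):

```python
import torch

# Views produced by permute() stay on the autograd graph, so converting
# the prediction to a list is enough by itself.
w = torch.randn(2, 3, 4, requires_grad=True)
out = w * 2                        # stands in for the raw model output
prediction = out.permute(2, 0, 1)  # a view, still attached to the graph

pred_list = list(prediction)
print(all(p.requires_grad for p in pred_list))  # True -- inherited

sum(p.sum() for p in pred_list).backward()
print(w.grad is not None)          # True -- gradients reach the parameter
```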

In the multivariate target code path, prediction tensors are cloned
with .detach().requires_grad_(True).  The .detach() call disconnects
the tensor from the computational graph, creating an isolated leaf
variable.  The subsequent .requires_grad_(True) enables gradient
tracking on this leaf but does NOT reconnect it to the encoder/decoder
graph — so gradients cannot flow back through the prediction during
backpropagation, effectively preventing the model from learning for
multivariate targets.

Fix: remove .detach() so .clone().requires_grad_(True) preserves the
gradient connection while still creating a copy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@haoyu-haoyu
Author

Updated — simplified to prediction = list(prediction) per your suggestion.

The .clone() and .requires_grad_(True) were both unnecessary:

  • Views from permute() already share the autograd graph
  • transform_output() → rescale_parameters() creates new tensors (no in-place ops)
  • Gradient tracking is inherited from the original model output

One line, zero overhead.
