[ENH] Add optimized D1 layer categorical encoder for v2 by Siddhazntx · Pull Request #2211 · sktime/pytorch-forecasting

Siddhazntx · 2026-03-18T11:01:21Z

Reference Issues/PRs

Addresses the label_encoders task listed in the v2 roadmap tracking issue: #1974

What does this implement/fix? Explain your changes.

This PR introduces an optimized D1CategoricalEncoder to the v2 data pipeline to handle categorical and text variables, preventing PyTorch tensor conversion crashes.

Key Changes:

New Encoder Class: Created _encoders_v2.py featuring a D1CategoricalEncoder that strictly follows the scikit-learn API (fit, transform, inverse_transform).
C-Level Optimization: Uses pd.factorize() instead of native Python dictionaries, so the encoding runs in pandas' compiled hashing code and stays efficient on large datasets.
Robust Edge-Case Handling:
- Safely manages original NaN values without silently dropping them.
- Handles unseen variables during the transform phase (defaulting to 0).
- Implemented a _warned_cols set to ensure warnings for unseen variables only trigger once per column, preventing terminal flooding during dataloader loops.
D1 Layer Integration: Integrated the encoder into the __init__ of TimeSeries inside _timeseries_v2.py. Columns specified in the cat argument are now automatically encoded.
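
For readers who have not opened the diff, here is a minimal sketch of the approach described above. It is not the code from _encoders_v2.py. The class name SketchCategoricalEncoder, the explicit columns argument, the +1 shift that reserves code 0 for unseen values, and the -1 sentinel for NaN are all assumptions made for illustration. Only pd.factorize(), the fit/transform/inverse_transform shape, and the _warned_cols idea come from the description.

```python
import warnings

import numpy as np
import pandas as pd


class SketchCategoricalEncoder:
    """Illustrative stand-in only; NOT the PR's D1CategoricalEncoder."""

    def __init__(self, columns):
        self.columns = list(columns)  # assumption: columns are passed explicitly
        self.categories_ = {}         # column name -> pd.Index of known values
        self._warned_cols = set()     # columns that already triggered a warning

    def fit(self, X):
        for col in self.columns:
            # pd.factorize returns (codes, uniques); NaN is left out of uniques.
            _, uniques = pd.factorize(X[col])
            self.categories_[col] = pd.Index(uniques)
        return self

    def transform(self, X):
        X = X.copy()
        for col, cats in self.categories_.items():
            values = X[col]
            codes = cats.get_indexer(pd.Index(values))  # -1 where not found
            is_nan = values.isna().to_numpy()
            unseen = (codes == -1) & ~is_nan
            if unseen.any() and col not in self._warned_cols:
                warnings.warn(f"Unseen values in column '{col}' mapped to 0")
                self._warned_cols.add(col)  # warn only once per column
            out = codes + 1      # assumption: known values get 1..n
            out[unseen] = 0      # assumption: 0 is reserved for unseen values
            out[is_nan] = -1     # assumption: -1 marks an original NaN
            X[col] = out
        return X

    def inverse_transform(self, X):
        X = X.copy()
        for col, cats in self.categories_.items():
            codes = X[col].to_numpy()
            restored = np.full(len(codes), np.nan, dtype=object)
            known = codes > 0
            restored[known] = cats.to_numpy()[codes[known] - 1]
            X[col] = restored  # codes 0 and -1 both come back as NaN
        return X


train = pd.DataFrame({"store": ["a", "b", np.nan, "a"], "sales": [1.0, 2.0, 3.0, 4.0]})
enc = SketchCategoricalEncoder(columns=["store"]).fit(train)
encoded = enc.transform(train)
print(encoded["store"].tolist())  # [1, 2, -1, 1]
```

Reserving 0 for unseen values is one way to keep the fallback code from colliding with a real category. The PR may resolve this differently, so treat the code layout here as a discussion aid rather than a description of the merged behavior.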

What should a reviewer concentrate their feedback on?

Integration Point: Please review the placement of the encoding logic within _timeseries_v2.py's __init__ method to ensure it aligns with the intended v2 data ingestion flow.
Unseen Variable Strategy: I defaulted to handle_unknown="assign_new" (mapping to 0). Let me know if the core team prefers a different default behavior for the v2 release!

Did you add any tests for the change?

Yes. I added a comprehensive pytest suite in a new test_encoders_v2.py file. Tests include:

test_encoder_fit_transform: Validates integer conversion and preservation of numeric columns using pd.api.types.is_integer_dtype.
test_encoder_inverse_transform: Ensures perfect reverse translation, including restoring true NaN values.
test_unseen_variables_warning: Confirms correct fallback assignment and verifies the custom warning triggers.
test_only_categorical_columns_selected: Ensures the auto-detect feature properly ignores numeric columns when columns=None.
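
The checks listed above rest on a few pandas behaviors that can be verified on their own. The sketch below is not taken from test_encoders_v2.py, and the test names are hypothetical. It only demonstrates that pd.factorize() yields integer codes, marks NaN with -1, and allows a round trip that restores true NaN values.

```python
import numpy as np
import pandas as pd


def test_factorize_yields_integer_codes():
    codes, uniques = pd.factorize(pd.Series(["x", "y", np.nan, "x"]))
    # Same dtype check the PR description mentions for its own tests.
    assert pd.api.types.is_integer_dtype(codes)
    # NaN receives the sentinel -1 and does not appear in uniques.
    assert codes.tolist() == [0, 1, -1, 0]
    assert list(uniques) == ["x", "y"]


def test_round_trip_restores_nan():
    original = pd.Series(["x", "y", np.nan, "x"])
    codes, uniques = pd.factorize(original)
    lookup = uniques.to_numpy(dtype=object)
    # Indexing with -1 would pick the last label, so mask those rows back to NaN.
    restored = pd.Series(lookup[codes]).where(codes != -1)
    assert restored.isna().tolist() == [False, False, True, False]
    assert restored.dropna().tolist() == ["x", "y", "x"]
```

Both functions follow the pytest naming convention and can be collected by pytest from any file whose name matches its default test patterns.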

Any other comments?

PR checklist

The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
Added/modified tests
Used pre-commit hooks when committing to ensure that code is compliant with hooks. Install hooks with pre-commit install.
To run hooks independent of commit, execute pre-commit run --all-files


codecov · 2026-03-18T13:01:45Z

Codecov Report

❌ Patch coverage is 94.91525% with 3 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@edbdeb4). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
pytorch_forecasting/data/_encoders_v2.py	96.07%	2 Missing ⚠️
...orch_forecasting/data/timeseries/_timeseries_v2.py	87.50%	1 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2211   +/-   ##
=======================================
  Coverage        ?   86.67%           
=======================================
  Files           ?      166           
  Lines           ?     9795           
  Branches        ?        0           
=======================================
  Hits            ?     8490           
  Misses          ?     1305           
  Partials        ?        0

Flag	Coverage Δ
cpu	`86.67% <94.91%> (?)`
pytest	`86.67% <94.91%> (?)`


Siddhazntx added 2 commits March 18, 2026 15:59

feat: add optimized D1 layer categorical encoder for v2

a799e54

test: refine pytest suite with robust type checking and auto-detect coverage

cfaa032

Siddhazntx requested review from PranavBhatP, benHeid, fkiraly, fnhirwa, jdb78, phoeenniixx and yarnabrina as code owners March 18, 2026 11:01

test: add edge-case and error-handling coverage for D1CategoricalEncoder

dd53e9a

Siddhazntx wants to merge 3 commits into sktime:main from Siddhazntx:feature/v2-label-encoder
