fix: StaticCovariatesTransformer IndexError with OneHotEncoder drop param #3065

Open
JKDasondee wants to merge 1 commit into unit8co:master from JKDasondee:fix/2880-onehot-encoder-drop

@JKDasondee

Summary

Fixes #2880.

StaticCovariatesTransformer raises IndexError when using OneHotEncoder with drop='first' (or drop='if_binary', etc.). The transformed output has fewer columns than categories, but the column mapping was built from transformer_cat.categories_ which includes all categories regardless of the drop setting.

Root cause

In _create_category_mappings(), the column name mapping iterates over transformer_cat.categories_:

for col, categories in zip(cols_cat, transformer_cat.categories_):
    for cat in categories:
        col_map_cat_i.append(str(col) + "_" + str(cat))

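With drop='first' and 3 categories per column, categories_ has 3 entries for each column but the encoder outputs only 2 columns for each. The mapping therefore lists more column names than the transformed array has columns. Later, in _add_back_static_covs(), the mapping is used to index position 4 of an array that has only 4 columns (valid indices 0 to 3), which raises the IndexError.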

Fix

Use get_feature_names_out() instead of iterating over categories_. This method returns only the actual output column names, correctly reflecting any dropped categories.

feature_names = transformer_cat.get_feature_names_out(cols_cat)
feat_idx = 0
for col in cols_cat:
    col_map_cat_i = []
    prefix = str(col) + "_"
    while feat_idx < len(feature_names) and feature_names[feat_idx].startswith(prefix):
        name = feature_names[feat_idx]
        col_map_cat_i.append(name)
        inv_col_map_cat[name] = [col]
        feat_idx += 1
    col_map_cat[col] = col_map_cat_i
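To make the grouping step easier to follow in isolation, here is a standalone sketch of the same loop. The function name `group_feature_names` and the example inputs are hypothetical and not part of Darts; the feature names are written out by hand in the `<column>_<category>` form that get_feature_names_out() produces.

```python
def group_feature_names(cols, feature_names):
    """Group encoder output names by source column, preserving order.

    Assumes `feature_names` is ordered by column, as returned by
    OneHotEncoder.get_feature_names_out(cols).
    """
    col_map = {}      # source column -> list of encoded column names
    inv_col_map = {}  # encoded column name -> [source column]
    feat_idx = 0
    for col in cols:
        prefix = f"{col}_"
        names = []
        # Consume consecutive names that belong to the current column.
        while feat_idx < len(feature_names) and feature_names[feat_idx].startswith(prefix):
            name = feature_names[feat_idx]
            names.append(name)
            inv_col_map[name] = [col]
            feat_idx += 1
        col_map[col] = names
    return col_map, inv_col_map


# Hypothetical output names for drop='first' on two 3-category columns.
col_map, inv_col_map = group_feature_names(
    ["cat1", "cat2"], ["cat1_b", "cat1_c", "cat2_y", "cat2_z"]
)
print(col_map)      # {'cat1': ['cat1_b', 'cat1_c'], 'cat2': ['cat2_y', 'cat2_z']}
print(inv_col_map)  # {'cat1_b': ['cat1'], 'cat1_c': ['cat1'], 'cat2_y': ['cat2'], 'cat2_z': ['cat2']}
```

One design point worth noting: prefix matching relies on column names being distinguishable by their `<column>_` prefix. If one column were named `a` and the next `a_b`, names belonging to `a_b` would also start with `a_` and would be consumed by the first column.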

Tests

Adds test_one_hot_encoder_with_drop parametrized on ['first', 'if_binary']. Each variant tests:

  • Single-series fit_transform + inverse_transform round-trip
  • Multi-series fit_transform + inverse_transform round-trip
  • Asserts static_covariates DataFrames match exactly after round-trip
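The round-trip property these tests rely on can be sketched at the encoder level. This is not the test added in this PR and does not use StaticCovariatesTransformer; it only illustrates, with hypothetical data, that OneHotEncoder can invert its own output for both drop settings even though the number of encoded columns differs.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical static covariates: one 3-category column and one binary column.
values = np.array(
    [["a", "yes"], ["b", "no"], ["c", "yes"]], dtype=object
)

n_output_cols = {}
for drop in ("first", "if_binary"):
    encoder = OneHotEncoder(drop=drop)
    encoded = encoder.fit_transform(values).toarray()
    restored = encoder.inverse_transform(encoded)

    n_output_cols[drop] = encoded.shape[1]
    # An all-zero block decodes back to the dropped category.
    assert (restored == values).all(), f"round-trip failed for drop={drop!r}"

print(n_output_cols)  # {'first': 3, 'if_binary': 4}
```

The differing column counts (3 versus 4 here) are why the mapping has to follow the encoder's actual output rather than categories_.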

TDD verification

Without the fix, the new test fails with:

IndexError: index 4 is out of bounds for axis 1 with size 4

at static_covariates_transformer.py:456 in _add_back_static_covs().

With the fix, all 10 tests pass (8 existing + 2 new):

$ pytest darts/tests/dataprocessing/transformers/test_static_covariates_transformer.py -v
10 passed in 0.20s

…aram

_create_category_mappings() iterated over transformer_cat.categories_
to build the column name mapping, but categories_ includes all
categories regardless of the encoder's drop setting. When drop='first'
or drop='if_binary', the transformed output has fewer columns than
categories, causing an IndexError in _add_back_static_covs() when
indexing into the narrower transformed array.

Use get_feature_names_out() instead, which returns only the actual
output column names and correctly reflects any dropped categories.

Adds parametrized test for drop='first' and drop='if_binary' covering
both single-series and multi-series transform + inverse_transform
round-trips.

Fixes unit8co#2880
@JKDasondee JKDasondee requested a review from dennisbader as a code owner April 10, 2026 23:06

codecov bot commented Apr 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.73%. Comparing base (447c7d3) to head (19571c3).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3065      +/-   ##
==========================================
- Coverage   95.80%   95.73%   -0.07%     
==========================================
  Files         158      158              
  Lines       17315    17320       +5     
==========================================
- Hits        16588    16582       -6     
- Misses        727      738      +11     

Development

Successfully merging this pull request may close these issues.

[BUG] IndexError when transforming static covariates with OneHotEncoder and parameter drop set
