fix: StaticCovariatesTransformer IndexError with OneHotEncoder drop param #3065
Open
JKDasondee wants to merge 1 commit into unit8co:master
…aram

_create_category_mappings() iterated over transformer_cat.categories_ to build the column name mapping, but categories_ includes all categories regardless of the encoder's drop setting. When drop='first' or drop='if_binary', the transformed output has fewer columns than categories, causing an IndexError in _add_back_static_covs() when indexing into the narrower transformed array.

Use get_feature_names_out() instead, which returns only the actual output column names and correctly reflects any dropped categories.

Adds parametrized test for drop='first' and drop='if_binary' covering both single-series and multi-series transform + inverse_transform round-trips.

Fixes unit8co#2880
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #3065 +/- ##
==========================================
- Coverage 95.80% 95.73% -0.07%
==========================================
Files 158 158
Lines 17315 17320 +5
==========================================
- Hits 16588 16582 -6
- Misses 727 738 +11
Summary
Fixes #2880.
`StaticCovariatesTransformer` raises `IndexError` when using `OneHotEncoder` with `drop='first'` (or `drop='if_binary'`, etc.). The transformed output has fewer columns than categories, but the column mapping was built from `transformer_cat.categories_`, which includes all categories regardless of the `drop` setting.
Root cause
In `_create_category_mappings()`, the column name mapping iterates over `transformer_cat.categories_`. With `drop='first'` and 3 categories per column, `categories_` lists 3 entries per column, but the encoder outputs only 2 columns for each. Later, in `_add_back_static_covs()`, the mapping tries to index position 4 in an array of size 4 (0-indexed), causing the `IndexError`.
Fix
Use `get_feature_names_out()` instead of iterating over `categories_`. This method returns only the actual output column names, correctly reflecting any dropped categories.
Tests
Adds `test_one_hot_encoder_with_drop`, parametrized on `['first', 'if_binary']`. Each variant tests:
- Single series: `fit_transform` + `inverse_transform` round-trip
- Multiple series: `fit_transform` + `inverse_transform` round-trip
- `static_covariates` DataFrames match exactly after the round-trip
TDD verification
Without the fix, the new test fails with an `IndexError` at `static_covariates_transformer.py:456` in `_add_back_static_covs()`.
With the fix, all 10 tests pass (8 existing + 2 new).