Skip to content

[Claude + Human] Add remove_nan_targets option to drop NaN-target recordings#382

Merged
bruAristimunha merged 5 commits into
eegdash:developfrom
duanyiqun:claude/issue-22-remove-nan-targets
Jun 23, 2026
Merged

[Claude + Human] Add remove_nan_targets option to drop NaN-target recordings#382
bruAristimunha merged 5 commits into
eegdash:developfrom
duanyiqun:claude/issue-22-remove-nan-targets

Conversation

@duanyiqun

Copy link
Copy Markdown
Contributor

Summary

  • Add a remove_nan_targets parameter to EEGDashDataset. When target_name is set and remove_nan_targets=True, recordings whose target value is missing (None/NaN) are dropped at construction, with a warning reporting the count.

Design note: default is False

  • Issue Remove NaN at import time for the target variable #22 / @arnodelorme suggested defaulting this on. We deliberately default to False instead, because dropping recordings by default would silently change len(dataset) for existing users and is a backward-incompatible behaviour change. Opt in with remove_nan_targets=True.
  • Regardless of this flag, the all-targets-missing case still raises a ValueError (handled by the target_name validation in Check var when creating dataset #21), which matches the "error if all the datasets are NaN" request.

Dependency

Impact

  • Opt-in convenience for target-based training so users don't have to post-filter NaN-target recordings manually. Default behaviour is unchanged.

Validation

  • ruff check + ruff format --check clean on changed files
  • pytest tests/unit_tests/dataset/test_remove_nan_targets.py tests/unit_tests/dataset/test_target_name_validation.py → 7 passed

Closes #22.

🤖 Generated with Claude Code

duanyiqun and others added 3 commits June 18, 2026 12:55
target_name is forwarded to braindecode, but if the column is absent the
failure only surfaces later (or silently). EEGDashDataset now auto-adds
target_name to description_fields and raises a clear ValueError, listing the
available fields, when the target is missing for every recording — e.g.
target_name="p-factor" instead of "p_factor".

Closes eegdash#21.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ordings

When target_name is set, recordings whose target value is missing (None/NaN)
can now be dropped at construction via remove_nan_targets=True, emitting a
warning with the count. The default is False so existing behaviour is
unchanged (such recordings are kept); the all-targets-missing case is still a
ValueError regardless of the flag (see eegdash#21).

Stacked on the eegdash#21 branch (target_name validation). Closes eegdash#22.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@bruAristimunha

Copy link
Copy Markdown
Collaborator

I maybe have the impression that this issue was fix in the pass, can you create an minimal code that reproduce the current code source?

@duanyiqun

Copy link
Copy Markdown
Contributor Author

@bruAristimunha sure on my dev branch seems not let me double check with code on that.

@duanyiqun

Copy link
Copy Markdown
Contributor Author

@bruAristimunha I double-checked on current develop — it's not handled yet. Minimal repro (offline, no network), three recordings where the target age is present / None / NaN:

import tempfile
from eegdash import EEGDashDataset

tmp = tempfile.mkdtemp()
def rec(sub, age):
    r = {
        "dataset": "dsTest",
        "bids_relpath": f"sub-{sub}/eeg/sub-{sub}_task-rest_eeg.set",
        "bidspath": f"dsTest/sub-{sub}/eeg/sub-{sub}_task-rest_eeg.set",
        "storage": {"backend": "local", "base": tmp},
        "entities_mne": {"subject": sub, "task": "rest"},
        "ntimes": 100, "sampling_frequency": 100,
    }
    if age is not None:
        r["age"] = age
    return r

records = [rec("01", 25), rec("02", None), rec("03", float("nan"))]
ds = EEGDashDataset(cache_dir=tmp, dataset="dsTest", records=records,
                    download=False, target_name="age")

print(len(ds.datasets))                                   # -> 3
print([d.description.get("age") for d in ds.datasets])    # -> [25, nan, nan]

Output on develop:

3
[25, nan, nan]

So the NaN/None-target recordings are silently kept, and there is no switch to drop them — EEGDashDataset(..., remove_nan_targets=True) raises TypeError: RawDataset.__init__() got an unexpected keyword argument 'remove_nan_targets' because the kwarg is forwarded to braindecode.

This PR adds that opt-in switch (default False to stay backward-compatible). The "all targets NaN" case is the hard error, handled by the target_name validation in #381 that this branch is stacked on.

@bruAristimunha bruAristimunha merged commit 9fb3fcb into eegdash:develop Jun 23, 2026
@bruAristimunha

Copy link
Copy Markdown
Collaborator

thanks for the PRs @duanyiqun, if you want talk and structure the efforts more, please ping at linkedin :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove NaN at import time for the target variable

2 participants