[Claude + Human] Add remove_nan_targets option to drop NaN-target recordings#382
Conversation
target_name is forwarded to braindecode, but if the column is absent the failure only surfaces later (or silently). EEGDashDataset now auto-adds target_name to description_fields and raises a clear ValueError, listing the available fields, when the target is missing for every recording — e.g. target_name="p-factor" instead of "p_factor". Closes eegdash#21. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ordings When target_name is set, recordings whose target value is missing (None/NaN) can now be dropped at construction via remove_nan_targets=True, emitting a warning with the count. The default is False so existing behaviour is unchanged (such recordings are kept); the all-targets-missing case is still a ValueError regardless of the flag (see eegdash#21). Stacked on the eegdash#21 branch (target_name validation). Closes eegdash#22. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
I maybe have the impression that this issue was fix in the pass, can you create an minimal code that reproduce the current code source? |
|
@bruAristimunha sure on my dev branch seems not let me double check with code on that. |
|
@bruAristimunha I double-checked on current import tempfile
from eegdash import EEGDashDataset
tmp = tempfile.mkdtemp()
def rec(sub, age):
r = {
"dataset": "dsTest",
"bids_relpath": f"sub-{sub}/eeg/sub-{sub}_task-rest_eeg.set",
"bidspath": f"dsTest/sub-{sub}/eeg/sub-{sub}_task-rest_eeg.set",
"storage": {"backend": "local", "base": tmp},
"entities_mne": {"subject": sub, "task": "rest"},
"ntimes": 100, "sampling_frequency": 100,
}
if age is not None:
r["age"] = age
return r
records = [rec("01", 25), rec("02", None), rec("03", float("nan"))]
ds = EEGDashDataset(cache_dir=tmp, dataset="dsTest", records=records,
download=False, target_name="age")
print(len(ds.datasets)) # -> 3
print([d.description.get("age") for d in ds.datasets]) # -> [25, nan, nan]Output on So the NaN/None-target recordings are silently kept, and there is no switch to drop them — This PR adds that opt-in switch (default |
|
thanks for the PRs @duanyiqun, if you want talk and structure the efforts more, please ping at linkedin :) |
Summary
remove_nan_targetsparameter toEEGDashDataset. Whentarget_nameis set andremove_nan_targets=True, recordings whose target value is missing (None/NaN) are dropped at construction, with a warning reporting the count.Design note: default is
FalseFalseinstead, because dropping recordings by default would silently changelen(dataset)for existing users and is a backward-incompatible behaviour change. Opt in withremove_nan_targets=True.ValueError(handled by thetarget_namevalidation in Check var when creating dataset #21), which matches the "error if all the datasets are NaN" request.Dependency
target_namevalidation, PR [Claude + Human] Validate target_name and raise on a misspelled name #381). This branch contains the Check var when creating dataset #21 commit as well; please merge [Claude + Human] Validate target_name and raise on a misspelled name #381 first, after which this branch rebased ontodevelopshows only its own diff.Impact
Validation
ruff check+ruff format --checkclean on changed filespytest tests/unit_tests/dataset/test_remove_nan_targets.py tests/unit_tests/dataset/test_target_name_validation.py→ 7 passedCloses #22.
🤖 Generated with Claude Code