Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings.
📝 Walkthrough

Replaces region-based imputation with a strategy-driven interface.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI (harpy impute)
    participant Parser as Strategy Parser (ImputeStrategy)
    participant VCF as VCF Validator
    participant RegionGen as Window/Region Generator
    participant Snake as Snakemake Workflow
    participant Stitch as STITCH
    participant Merger as VCF Merger
    CLI->>Parser: parse --strategy (window:|contig:start-end|all) and --buffer
    Parser->>VCF: request contigs (get_contigs) if needed
    VCF-->>Parser: contigs dict (name → length)
    Parser->>RegionGen: if window -> makewindows(contigs, window, buffer)
    RegionGen-->>Snake: list of per-contig regions / region strings
    Snake->>Stitch: launch per-(contig,region) jobs (--regionStart/--regionEnd/--buffer)
    Stitch-->>Snake: region-scoped VCF.gz + indices + plots
    Snake->>Merger: merge_regions() → contig-level BCFs
    Merger->>Merger: merge_contigs() → final paramset BCF/CSI
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 7
🧹 Nitpick comments (3)
harpy/common/environments.py (1)
59-60: LGTM! The tabix dependency addition is necessary.

The addition of `bioconda::tabix` is correct and required, as the impute workflow explicitly invokes tabix to index VCF files (as shown in harpy/snakefiles/impute.smk:166).

Optional: Consider pinning the tabix version for reproducibility

For better reproducibility and to prevent potential issues from future tabix updates, consider specifying a version constraint:

```diff
 "impute" : [
     "bioconda::r-stitch>=1.8.4",
-    "bioconda::tabix"
+    "bioconda::tabix=1.11"
 ],
```

Note: While this file shows mixed practices (some dependencies are pinned, others aren't), version pinning helps ensure consistent behavior across environments.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@harpy/common/environments.py` around lines 59 - 60, The dependency list now includes the unpinned package identifier "bioconda::tabix"; update that entry to a version-pinned spec (e.g., change the "bioconda::tabix" list item to a concrete version constraint matching the version used in CI or your environment) so the environment is reproducible—locate and replace the "bioconda::tabix" string in the dependency list in environments.py (the same entry referenced by the impute workflow) with a pinned version constraint.

harpy/report/components.py (2)
232-235: Rename `input` here.

This shadows the Python builtin and is what Ruff is complaining about. A small rename keeps notebook debugging less confusing.

Small cleanup

```diff
 def extract_metric(x: list[str], param: str):
     '''Convenience function to find the relevnant sections of the bcftools.stats file and return a table'''
-    input = "".join(s for s in x if s.startswith(param))
-    return pd.read_table(StringIO(input), sep = "\t", header = None)
+    matched_text = "".join(s for s in x if s.startswith(param))
+    return pd.read_table(StringIO(matched_text), sep = "\t", header = None)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@harpy/report/components.py` around lines 232 - 235, The function extract_metric contains a local variable named input that shadows the Python builtin; rename that variable (e.g., matched_text or input_str) in the extract_metric function so the call StringIO(input) and the comprehension "".join(...) use the new name, preserving behavior and return value while avoiding builtin shadowing; update both the assignment and the StringIO(...) reference inside extract_metric.
165-230: `stitch_image_viewer()` now duplicates `image_viewer()` almost verbatim.

The notebook callers already pass `sortkey`, `recursive=True`, and `thing_to_select="region"` directly into `image_viewer()`, so keeping a second encoder/template means future fixes have to land twice. Consider making this a thin wrapper instead.

Refactor sketch

```diff
-def stitch_image_viewer(label: str, dir: str, pattern: str, scale: float = 1.0):
-    '''
-    Create a javascript image viewer with a file picker for the different chromosomal regions.
-    Images are embedded (thus can safely have their original files deleted) and can be down-scaled.
-    '''
-    imgfmt = "image/png" if "png" in pattern else "image/jpg"
-    options_parts = []
-    paths = sorted(
-        Path(dir).rglob(pattern),
-        key=lambda p: int(p.parents[1].name.split('-')[0])
-    )
-    uid = uuid.uuid4().hex[:8]
-    images_dict = {}
-    for p in paths:
-        pname = p.parents[1].name
-        options_parts.append(f'<option value="{pname}">{pname}</option>')
-        image = PImage.open(p)
-        if scale < 1:
-            image = image.resize((int(image.size[0] * scale), int(image.size[1] * scale)), PImage.LANCZOS)
-        buf = BytesIO()
-        image.save(buf, format="PNG" if "png" in pattern else "jpeg")
-        buf.seek(0)
-        encoded = base64.b64encode(buf.read()).decode()
-        images_dict[pname] = f"data:{imgfmt};base64,{encoded}"
-
-    images_js = json.dumps(images_dict)
-    options_html = "\n".join(options_parts)
-
-    html = f"""..."""
-    display(HTML(html))
+def stitch_image_viewer(label: str, image_dir: str, pattern: str, scale: float = 1.0):
+    return image_viewer(
+        label,
+        image_dir,
+        pattern,
+        sortkey=lambda p: int(p.parents[1].name.split('-')[0]),
+        thing_to_select="region",
+        recursive=True,
+        scale=scale,
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@harpy/report/components.py` around lines 165 - 230, stitch_image_viewer duplicates image_viewer; replace its full implementation with a thin wrapper that delegates to image_viewer to avoid duplicated encoder/template logic. Specifically, change stitch_image_viewer(label: str, dir: str, pattern: str, scale: float = 1.0) so it simply calls image_viewer(...) passing through label, dir, pattern and scale and forwarding any additional needed args (e.g., sortkey, recursive=True, thing_to_select="region" where callers expect them) so all encoding/HTML generation remains centralized in image_viewer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@harpy/commands/impute.py`:
- Around line 65-84: The code treats region as a bool, causing
workflow.param(region, "region") to write True instead of the validated region
string; change the logic to keep a boolean flag (e.g., is_region) and capture
the validated region string from vcffile.validate_region(strategy) (e.g.,
validated_region = vcffile.validate_region(strategy)), then pass
validated_region to workflow.param(validated_region, "region") when is_region is
True; also ensure window is still parsed from strategy when it
startswith("window:") and that workflow.param(window, "window-size") remains
used when appropriate.
In `@harpy/common/cli_filetypes.py`:
- Around line 281-289: The validator currently allows zero/negative window sizes
and invalid region bounds; update the window handling and region validation in
the block that checks pieces[0] == "window" and the else branch that parses
parts = pieces[1].split("-"): for the "window" branch ensure pieces[1] parses to
an integer > 0 (use int(...) or isInt and then int) and call self.fail with the
existing message if <= 0; for the region branch ensure both bounds parse as
integers, start > 0, end > start (use self.isInt on both and compare ints) and
call self.fail with the region error message when those checks fail so
zero/negative and reversed coordinates are rejected before downstream use.
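The bounds checks described above can be sketched as a standalone parser. This is a hypothetical `parse_strategy` helper for illustration, not Harpy's actual `ImputeStrategy` class (which raises via `self.fail` rather than `ValueError`):

```python
def parse_strategy(value: str):
    """Parse 'window:SIZE', 'CONTIG:START-END', or 'all'; reject bad bounds."""
    if value.lower() == "all":
        return ("all", None)
    kind, _, rest = value.partition(":")
    if kind.lower() == "window":
        size = int(rest)  # raises ValueError on non-numeric input
        if size <= 0:
            raise ValueError(f"window size must be > 0, got {size}")
        return ("window", size)
    # region form: CONTIG:START-END with 0 < start < end
    start_s, _, end_s = rest.partition("-")
    start, end = int(start_s), int(end_s)
    if start <= 0 or end <= start:
        raise ValueError(f"invalid region bounds {start}-{end}: need 0 < start < end")
    return ("region", (kind, start, end))
```

The key point is that both zero/negative window sizes and reversed coordinates are rejected before any downstream use.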
In `@harpy/report/components.py`:
- Around line 95-99: embed_image currently opens the file with PImage.open(x),
resizes and saves back to the same path (overwriting the source); change it to
resize the image in memory and embed that image data so the source file isn't
modified. In the embed_image function, after PImage.open(x) perform the resize
into an in-memory buffer (e.g., BytesIO) and return an IPython.display.Image
constructed from that in-memory bytes with embed=True instead of using
Image(filename=x), leaving the original file untouched and preventing cumulative
degradation.
- Around line 108-123: The dropdown keys are collapsing because pname is set to
p.parents[1].name which is identical for flat directories, causing
images_dict[pname] to be overwritten; change the selector key to a unique file
identifier (e.g., use p.name or a relative path like str(p.relative_to(dir)))
and use that same unique key when appending to options_parts and populating
images_dict so each image entry is unique (update references to pname in the
block that builds paths, options_parts, images_dict and images_js).
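The in-memory resize suggested for `embed_image` above can be sketched as follows. The name `embed_image_bytes` and its signature are illustrative, not the actual function in `components.py`:

```python
from io import BytesIO

from PIL import Image as PImage


def embed_image_bytes(path: str, scale: float = 1.0) -> bytes:
    """Resize in memory and return PNG bytes; the source file is never rewritten."""
    image = PImage.open(path)
    if scale < 1:
        new_size = (int(image.size[0] * scale), int(image.size[1] * scale))
        image = image.resize(new_size, PImage.LANCZOS)
    buf = BytesIO()
    image.save(buf, format="PNG")  # write into the buffer, not back to `path`
    return buf.getvalue()
```

The returned bytes can then be handed to `IPython.display.Image(data=...)`, so the notebook embeds the scaled copy while the original file on disk stays untouched across re-runs.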
In `@harpy/snakefiles/impute.smk`:
- Around line 178-200: The rule merge_regions currently drops the {contig}
wildcard in its region-mode outputs while still referencing wc.contig in the
input lambda and log, breaking wildcard inference and downstream contig_report;
change merge_regions to always emit the contig-level outputs
"{paramset}/contigs/{contig}/{contig}.bcf" and
"{paramset}/contigs/{contig}/{contig}.bcf.csi" (so wc.contig stays inferred and
the input lambda and log can keep using wc.contig), and then add a tiny
finalization rule (e.g., finalize_paramset_bcf) that runs only for the
single-region case to create the archive-level "{paramset}/{paramset}.bcf" and
"{paramset}/{paramset}.bcf.csi" (by copying/concatenating or symlinking the
contig BCFs and creating indexes) so contig_report and other consumers continue
to get the contig-level path and the paramset-level artifacts still exist when
needed.
- Around line 54-57: The region parsing incorrectly splits twice on different
separators causing start to include the contig (e.g., "3L:3000"); update the
logic that builds contigs[...]/Contig by first splitting region on ":" into cntg
and coords (e.g., cntg, coords = region.split(":", 1)), then split coords on "-"
into start and end (e.g., start, end = coords.split("-", 1)), and convert start
and end to int before constructing Contig(cntg, int(end), start=int(start)) so
int() receives only numeric strings; update the code that sets cntg and
contigs[...] accordingly.
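The corrected split order amounts to the following minimal standalone sketch (not the snakefile's exact code):

```python
def parse_region(region: str) -> tuple[str, int, int]:
    """Split 'CONTIG:START-END' safely: first on ':', then on '-'."""
    cntg, coords = region.split(":", 1)  # '3L:3000-9000' -> ('3L', '3000-9000')
    start, end = coords.split("-", 1)    # '3000-9000'    -> ('3000', '9000')
    return cntg, int(start), int(end)
```

Splitting on `:` first guarantees `int()` only ever sees the numeric coordinate strings, and `split(":", 1)` also tolerates contig names that themselves contain separators.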
In `@harpy/validation/vcf.py`:
- Around line 154-172: The region parsing currently accepts nonsensical ranges
(e.g., start 0 or start > end); update the logic around contig, startpos, endpos
in the method that splits region so you validate ordering and bounds: after
computing startpos/endpos but before using/returning them, ensure startpos is >=
1 and startpos <= endpos (and keep the existing endpos > contig adjustment), and
on invalid ranges call self.print.validation(False) and self.print.error with a
clear message and guidance (similar style to the existing contig-missing error
referencing self.biallelic_file), otherwise call self.print.validation(True) and
return the validated/adjusted contig, startpos, endpos.
---
Nitpick comments:
In `@harpy/common/environments.py`:
- Around line 59-60: The dependency list now includes the unpinned package
identifier "bioconda::tabix"; update that entry to a version-pinned spec (e.g.,
change the "bioconda::tabix" list item to a concrete version constraint matching
the version used in CI or your environment) so the environment is
reproducible—locate and replace the "bioconda::tabix" string in the dependency
list in environments.py (the same entry referenced by the impute workflow) with
a pinned version constraint.
In `@harpy/report/components.py`:
- Around line 232-235: The function extract_metric contains a local variable
named input that shadows the Python builtin; rename that variable (e.g.,
matched_text or input_str) in the extract_metric function so the call
StringIO(input) and the comprehension "".join(...) use the new name, preserving
behavior and return value while avoiding builtin shadowing; update both the
assignment and the StringIO(...) reference inside extract_metric.
- Around line 165-230: stitch_image_viewer duplicates image_viewer; replace its
full implementation with a thin wrapper that delegates to image_viewer to avoid
duplicated encoder/template logic. Specifically, change
stitch_image_viewer(label: str, dir: str, pattern: str, scale: float = 1.0) so
it simply calls image_viewer(...) passing through label, dir, pattern and scale
and forwarding any additional needed args (e.g., sortkey, recursive=True,
thing_to_select="region" where callers expect them) so all encoding/HTML
generation remains centralized in image_viewer.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: fef0a5b0-0bd4-403d-8063-0824facdbe66
📒 Files selected for processing (13)
- .github/workflows/tests.yml
- harpy/commands/impute.py
- harpy/common/cli_filetypes.py
- harpy/common/environments.py
- harpy/common/file_ops.py
- harpy/common/printing.py
- harpy/report/components.py
- harpy/report/notebooks/preproc_stats.ipynb
- harpy/report/notebooks/stitch_collate.ipynb
- harpy/snakefiles/impute.smk
- harpy/validation/impute_parameters.py
- harpy/validation/vcf.py
- pixi.toml
Actionable comments posted: 5
♻️ Duplicate comments (2)
harpy/report/components.py (1)
115-127: ⚠️ Potential issue | 🟠 Major

`images_dict` keys can still be overwritten when labels collide.

At Line 117/126, `pname` is both display label and dictionary key. If two files resolve to the same label, one silently replaces the other.

Proposed fix

```diff
-    for p in paths:
-        pname = option_key(p)
-        options_parts.append(f'<option value="{pname}">{pname}</option>')
+    for p in paths:
+        display_name = option_key(p)
+        key = str(p.relative_to(image_dir))
+        options_parts.append(f'<option value="{key}">{display_name}</option>')
@@
-        images_dict[pname] = f"data:{imgfmt};base64,{encoded}"
+        images_dict[key] = f"data:{imgfmt};base64,{encoded}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@harpy/report/components.py` around lines 115 - 127, The code currently uses pname (from option_key) as both the option label and the images_dict key, allowing collisions to silently overwrite entries; modify the loop that builds options_parts and images_dict (referencing option_key, paths, pname, options_parts, images_dict) to ensure unique dictionary keys by disambiguating duplicate pname values (for example by appending a numeric suffix or unique index when a pname already exists in images_dict) and use that unique key for both the <option value="..."> entry and the images_dict key so the displayed label remains unchanged while stored keys are distinct.

harpy/commands/impute.py (1)
65-70: ⚠️ Potential issue | 🟠 Major

Preserve the validated region string instead of a boolean.

This still drops the clamped `(contig, start, end)` returned by `validate_region()`, so Snakemake sees the original `strategy` value and the workflow summary reports `Target Region: True` instead of the actual region.

🔧 Minimal fix

```diff
-    region = strategy.lower() != "all" and not strategy.lower().startswith("window:")
-    window = None
-    if region:
-        vcffile.validate_region(strategy)
+    region = None
+    window = None
+    if strategy.lower() != "all" and not strategy.lower().startswith("window:"):
+        contig, startpos, endpos = vcffile.validate_region(strategy)
+        region = f"{contig}:{startpos}-{endpos}"
     elif strategy.lower().startswith("window:"):
-        window = int(strategy.split(":")[1])
+        window = int(strategy.split(":", 1)[1])
@@
     workflow.param(grid_size, "grid-size")
     if region:
-        workflow.param(strategy, "region")
+        workflow.param(region, "region")
     elif window:
         workflow.param(window, "window-size")
```

Also applies to: 79-80, 89-95
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@harpy/commands/impute.py` around lines 65 - 70, The code currently sets region as a boolean and calls vcffile.validate_region(strategy) without preserving its return value; change this so region is assigned the validated region string/tuple returned by vcffile.validate_region(strategy) (e.g., region = vcffile.validate_region(strategy)) and only treat it as None when not provided, and keep the window handling (window = int(strategy.split(":")[1])) unchanged; apply the same fix to the other occurrences where validate_region is called (the similar blocks referenced around the validate_region calls at the later branches) so Snakemake receives the actual validated region instead of True/False.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@harpy/report/components.py`:
- Around line 118-155: The HTML injection risk comes from unescaped filenames
and selector text being interpolated into the template (see options_parts.append
usage building option elements from pname and the interpolations of label and
thing_to_select into the HTML). Fix by HTML-escaping all user/file-derived
strings before inserting into attributes/text: escape pname when building
options (both value and inner text), and escape label and thing_to_select when
interpolating into the template (use html.escape(..., quote=True) or
equivalent). Ensure the escaped strings are used for options_html and in the
f-string template so no raw filenames or labels are rendered unescaped.
- Around line 101-102: The image saving and the returned Image metadata disagree
on format: image.save(..., format = "PNG" if "png" in x.lower() else "jpeg") may
save as JPEG but the return always uses format="png"; update the logic in the
block around image.save and the return Image(...) to compute a single format
variable (e.g., format_str = "PNG" if "png" in x.lower() else "JPEG") and use
that same variable for both image.save(buf, format=format_str) and return
Image(data=buf.getvalue(), format=format_str.lower() or format_str as required
by Image), ensuring consistent case/values expected by the Image constructor.
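The escaping fix for the option builder above can be sketched with the stdlib `html` module (`make_option` is a hypothetical helper name):

```python
import html


def make_option(name: str) -> str:
    """Escape file-derived text before interpolating it into an <option> element."""
    safe = html.escape(name, quote=True)  # quote=True also escapes " and '
    return f'<option value="{safe}">{safe}</option>'
```

Applying the same `html.escape(..., quote=True)` call to `label` and `thing_to_select` before they enter the f-string template keeps any filename or label from being rendered as raw markup.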
In `@harpy/report/utilities.py`:
- Around line 7-10: The extract_metric helper should avoid calling pd.read_table
on empty input and should match section tokens more strictly: in function
extract_metric, change the selection filter to only include lines that start
with the exact section token plus a tab (e.g. s.startswith(f"{param}\t")) and
after building selectiontext check if it's empty/blank (selectiontext.strip() ==
"") and if so return an empty pandas.DataFrame instead of calling pd.read_table;
otherwise proceed to parse with pd.read_table(StringIO(selectiontext), sep="\t",
header=None).
In `@harpy/snakefiles/impute.smk`:
- Around line 181-199: The rule currently uses find on the filesystem to build
{output.filelist} which can pick up stale files and ignores the deterministic
input order; instead, use the declared Snakemake inputs (the lambda/collect
expression producing the per-region VCFs via contigs[wc.contig].regions and ext)
by adding a named input list (e.g., vcf_files) and write {output.filelist} from
that list inside the shell command (or via a run block) so you concatenate
exactly the scheduled files in the intended order; apply the same change to the
duplicate block mentioned for lines 203-219, and reference the same wildcards
({wildcards.paramset}, {wildcards.contig}) and outputs (output.filelist,
output.bcf) when writing the filelist.
In `@harpy/validation/vcf.py`:
- Around line 167-172: Update the user-facing error text in the region
validation to reflect the current biallelic threshold used by
find_biallelic_contigs(): change the phrase "at least 2 biallelic SNPs" to "at
least 5 biallelic SNPs" (or, if a threshold constant/property exists,
interpolate that value instead) in the error constructed in the validation block
that prints "missing contig" so the message in the contig check matches
find_biallelic_contigs()'s criterion.
---
Duplicate comments:
In `@harpy/commands/impute.py`:
- Around line 65-70: The code currently sets region as a boolean and calls
vcffile.validate_region(strategy) without preserving its return value; change
this so region is assigned the validated region string/tuple returned by
vcffile.validate_region(strategy) (e.g., region =
vcffile.validate_region(strategy)) and only treat it as None when not provided,
and keep the window handling (window = int(strategy.split(":")[1])) unchanged;
apply the same fix to the other occurrences where validate_region is called (the
similar blocks referenced around the validate_region calls at the later
branches) so Snakemake receives the actual validated region instead of
True/False.
In `@harpy/report/components.py`:
- Around line 115-127: The code currently uses pname (from option_key) as both
the option label and the images_dict key, allowing collisions to silently
overwrite entries; modify the loop that builds options_parts and images_dict
(referencing option_key, paths, pname, options_parts, images_dict) to ensure
unique dictionary keys by disambiguating duplicate pname values (for example by
appending a numeric suffix or unique index when a pname already exists in
images_dict) and use that unique key for both the <option value="..."> entry and
the images_dict key so the displayed label remains unchanged while stored keys
are distinct.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 9cbd1602-6b39-456b-bf3a-733ff5f8739a
📒 Files selected for processing (8)
- harpy/commands/impute.py
- harpy/common/cli_filetypes.py
- harpy/report/components.py
- harpy/report/notebooks/bcftools_stats.ipynb
- harpy/report/notebooks/stitch_collate.ipynb
- harpy/report/utilities.py
- harpy/snakefiles/impute.smk
- harpy/validation/vcf.py
🚧 Files skipped from review as they are similar to previous changes (1)
- harpy/report/notebooks/stitch_collate.ipynb
Actionable comments posted: 1
🧹 Nitpick comments (1)
harpy/commands/impute.py (1)
84-85: Inconsistent case handling for "all" strategy.

Lines 65 and 70 use `strategy.lower()` for comparisons, but line 84 uses `strategy != "all"` without `.lower()`. While this currently works (since `ImputeStrategy` enforces lowercase "all"), the inconsistency could become a bug if the validator ever allows case-insensitive input.

Suggested fix for consistency

```diff
-    if strategy != "all":
+    if strategy.lower() != "all":
         workflow.param(buffer, "buffer")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@harpy/commands/impute.py` around lines 84 - 85, The comparison against the "all" strategy is inconsistent: change the check that currently reads strategy != "all" to use a normalized form (e.g., strategy.lower() != "all") so it matches the other comparisons that call strategy.lower(); update the conditional around workflow.param(buffer, "buffer") to use strategy.lower() for consistency with earlier checks (referencing the strategy variable and the workflow.param(buffer, "buffer") call in impute.py and the ImputeStrategy usage).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@harpy/common/cli_filetypes.py`:
- Around line 286-291: The two self.fail(...) calls that only pass a message
should be updated to include the Click parameter and context objects (i.e., call
self.fail(message, param, ctx)) to match the other failures; locate the failing
calls that validate window strategy and region format (the self.fail invocations
near the pieces/parts checks in the method that uses self.isInt) and change them
to pass the same param and ctx variables used by the surrounding self.fail
usages.
---
Nitpick comments:
In `@harpy/commands/impute.py`:
- Around line 84-85: The comparison against the "all" strategy is inconsistent:
change the check that currently reads strategy != "all" to use a normalized form
(e.g., strategy.lower() != "all") so it matches the other comparisons that call
strategy.lower(); update the conditional around workflow.param(buffer, "buffer")
to use strategy.lower() for consistency with earlier checks (referencing the
strategy variable and the workflow.param(buffer, "buffer") call in impute.py and
the ImputeStrategy usage).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 887e1436-1833-4ebc-91da-9705e2780065
📒 Files selected for processing (2)
- harpy/commands/impute.py
- harpy/common/cli_filetypes.py
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
harpy/report/notebooks/validate_fastq.ipynb (1)
104-144: ⚠️ Potential issue | 🟠 Major

Incomplete migration will cause runtime errors.

Two issues in the table rendering:

- Line 110: The `ITable` instance is created but `.render()` is never called, so it won't display anything.
- Lines 123-143: The cell still calls `standard_itable(...)` and references `itables.JavascriptFunction`, but neither `standard_itable` nor `itables` are imported. This will raise `NameError` at runtime.

You should either:

- Remove the old `standard_itable` cell entirely and add `.render()` to the new `ITable` call, or
- Replace the old cell's logic with the new `ITable` approach

Proposed fix (option 1: use ITable with render)

Remove lines 113-145 (the old `standard_itable` cell) and update the new cell:

```diff
-    "tbl = ITable(data, \"validate.fastq.csv\")"
+    "ITable(data, \"validate.fastq.csv\", fixedcols=1).render()"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@harpy/report/notebooks/validate_fastq.ipynb` around lines 104 - 144, The ITable instance is created but never rendered and the old standard_itable cell references missing names (standard_itable, itables.JavascriptFunction) causing runtime NameError; fix by removing the legacy standard_itable cell and call .render() on the ITable object (i.e., replace the current "tbl = ITable(data, \"validate.fastq.csv\")" cell with a cell that invokes tbl.render()), or alternatively replace the legacy cell body with the new ITable.render() approach; if you choose to keep standard_itable, import the missing symbols (standard_itable and itables) before using them.

harpy/report/notebooks/validate_bam.ipynb (1)
121-161: ⚠️ Potential issue | 🟠 Major

Incomplete migration will cause runtime errors.

Same issue as `validate_fastq.ipynb`:

- Line 127: `ITable` instance created but `.render()` not called - nothing will display.
- Lines 140-160: The cell calls `standard_itable(...)` and `itables.JavascriptFunction`, but neither is imported. This will raise `NameError`.

Proposed fix

Remove lines 130-162 (the old `standard_itable` cell) and update the new cell:

```diff
-    "tbl = ITable(data, \"validate.bam.csv\", fixedcols = 1)"
+    "ITable(data, \"validate.bam.csv\", fixedcols=1).render()"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@harpy/report/notebooks/validate_bam.ipynb` around lines 121 - 161, The ITable created via ITable(data, "validate.bam.csv", fixedcols = 1) is never rendered and the later cell references standard_itable and itables.JavascriptFunction which are not imported; remove the old standard_itable cell, call tbl.render() immediately after creating the ITable (i.e., invoke ITable(...).render() or add tbl.render()), and update the remaining cell to either import standard_itable and itables (so standard_itable(..., coldefs=[{"createdCell": itables.JavascriptFunction(...)}]) resolves) or replace itables.JavascriptFunction with the correct, already-imported helper; ensure all referenced symbols (ITable, tbl.render, standard_itable, itables.JavascriptFunction) are present via import or replaced so no NameError occurs.
♻️ Duplicate comments (1)
harpy/snakefiles/impute.smk (1)
217-224: ⚠️ Potential issue | 🔴 Critical

Same issue: file list needs one file per line.

Apply the same fix as `merge_regions` to ensure `bcftools concat -f` receives properly formatted input.

🐛 Proposed fix

```diff
     shell:
         """
-        echo {input} > {output.filelist}
+        printf '%s\n' {input} > {output.filelist}
         {{ bcftools concat --threads {threads} -f {output.filelist} |
         bcftools sort -Ob -o {output.bcf} --write-index
         }} 2> {log}
         """
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@harpy/snakefiles/impute.smk` around lines 217 - 224, The shell block in the rule using bcftools concat (the echo {input} > {output.filelist} and bcftools concat -f {output.filelist} invocation) writes all input paths on one line; change it to write one filename per line (as done in merge_regions) so bcftools concat -f gets a properly formatted filelist—use a printf/join-with-newline approach to emit each {input} item on its own line before calling bcftools concat, keeping the existing {output.bcf}, {output.filelist}, {log} and {threads} symbols.
🧹 Nitpick comments (3)
harpy/report/notebooks/align_stats.ipynb (2)
718-718: Minor style inconsistency: positional vs keyword argument.

This call passes `fixedcols` as a positional argument (`1`), while other `ITable` calls in this PR use the keyword form (`fixedcols=1`). Consider using `fixedcols=1` for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@harpy/report/notebooks/align_stats.ipynb` at line 718, The ITable(...) call uses a positional argument for fixedcols causing style inconsistency; change the call ITable(contig_avg, f"{samplename}.align.depth.csv", 1).render() to pass fixedcols as a keyword (fixedcols=1) so it matches other ITable usages and keeps argument order explicit.
732-732: Same style note as above.

Uses positional `1` for `fixedcols` instead of keyword `fixedcols=1`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@harpy/report/notebooks/align_stats.ipynb` at line 732, The call to ITable uses a positional argument for fixedcols (passing 1) which is inconsistent with the rest of the notebook; change the call ITable(outliers.sort_values('Read Depth', ascending=False).head(100), f"{samplename}.align.outliers.csv", 1).render() to use the keyword argument fixedcols=1 so it becomes ITable(..., fixedcols=1).render(), ensuring consistency and clarity.

harpy/report/notebooks/stitch_collate.ipynb (1)
288-288: Consider adding defensive handling for the sortkey lambda.

The sortkey lambda `lambda p: int(p.parents[1].name.split('-')[0])` assumes the grandparent directory name contains a `-` delimiter and the first part is numeric. If the directory structure doesn't match (e.g., unexpected naming or missing `-`), this will raise `ValueError` or `IndexError`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@harpy/report/notebooks/stitch_collate.ipynb` at line 288, The sort key lambda used in the image_viewer call (image_viewer(..., sortkey = lambda p: int(p.parents[1].name.split('-')[0]), ...)) can raise ValueError/IndexError for unexpected directory names; replace the inline lambda with a small defensive helper (e.g. safe_sortkey) that attempts to parse int(p.parents[1].name.split('-')[0]) inside a try/except and returns a sensible fallback (like 0 or the original name) on failure, then pass sortkey=safe_sortkey to image_viewer to avoid crashes on malformed directory names.
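The `safe_sortkey` helper proposed above could look like this (a sketch; the fallback value of 0 is an assumption about how malformed names should sort):

```python
from pathlib import Path


def safe_sortkey(p: Path) -> int:
    """Leading number of the grandparent directory name, or 0 if it can't be parsed."""
    try:
        return int(p.parents[1].name.split("-")[0])
    except (ValueError, IndexError):
        return 0
```

Passing `sortkey=safe_sortkey` to `image_viewer` keeps the numeric region ordering while tolerating directories that don't follow the `START-END` naming scheme.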
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@harpy/common/summaries.py`:
- Around line 146-157: The code reads PARAMETERS.get("window", None) but the
imputation workflow passes the value under "window-size", so the window variable
is always None and the elif window: branch never runs; update the lookup in
summaries.py to use PARAMETERS.get("window-size", None) (affecting the window
variable used in the region/regiontext logic that builds regiontext for region,
buffer and window branches) so the buffer is correctly appended when windowed
imputation is used.
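The key-mismatch bug described above reduces to a dictionary lookup against a key that is never written. The parameter dict below is a hypothetical shape (the real one is built by the impute workflow); the behavior of `dict.get` with a wrong key is the point.

```python
# Hypothetical shape of the workflow's parameter dict -- the review states
# the imputation workflow writes the value under "window-size".
PARAMETERS = {"window-size": 5_000_000, "buffer": 10_000}

# The buggy lookup: the key "window" never exists, so this is always None
# and the windowed branch of the region/regiontext logic is silently skipped.
window_buggy = PARAMETERS.get("window", None)

# The fix: look up the key the workflow actually writes.
window = PARAMETERS.get("window-size", None)

assert window_buggy is None
assert window == 5_000_000
```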
In `@harpy/snakefiles/impute.smk`:
- Around line 178-192: In the rule merge_regions, the two input entries (the
first lambda collecting .vcf.gz.tbi paths and the second vcf = lambda collecting
.vcf.gz paths) are missing a separating comma; edit the rule merge_regions input
block to insert a comma after the first collect(...) lambda so the second named
input vcf = lambda wc: collect(...) is a separate list item, ensuring correct
Python/Snakemake syntax for the input declarations that reference
contigs[wc.contig].regions.
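A sketch of the corrected input block, with names (`merge_regions`, `contigs`, the path patterns) taken from the review comment rather than the actual Snakefile. The trailing comma after the first lambda is the fix: without it, Snakemake parses the two lambdas as one malformed expression.

```snakemake
rule merge_regions:
    input:
        # index files -- first (unnamed) list item; note the trailing comma,
        # which separates it from the named `vcf` input below
        lambda wc: collect("{region}.vcf.gz.tbi", region = contigs[wc.contig].regions),
        vcf = lambda wc: collect("{region}.vcf.gz", region = contigs[wc.contig].regions)
```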
- Around line 197-204: The current shell block writes {input.vcf} into
{output.filelist} with echo which produces a single space-separated line but
bcftools concat -f expects one file path per line; update the shell command in
the rule (the shell: block that creates {output.filelist} and then runs
`bcftools concat -f {output.filelist}`) to emit one filename per line (e.g., use
printf '%s\n' or replace spaces with newlines) so each input path is on its own
line before calling bcftools concat.
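The one-filename-per-line fix can be demonstrated without invoking bcftools at all. The file names below are placeholders standing in for Snakemake's `{input.vcf}` expansion; the behavioral difference between `echo` and `printf '%s\n'` is what matters, since `printf` repeats its format string once per argument.

```shell
set -eu
inputs="a.vcf.gz b.vcf.gz c.vcf.gz"   # stands in for {input.vcf}

# echo writes everything on a single line -- the buggy filelist:
echo $inputs > filelist.one_line.txt

# printf '%s\n' emits one filename per line -- what bcftools concat -f expects:
printf '%s\n' $inputs > filelist.txt

cat filelist.txt
```

In the rule itself this would become `printf '%s\n' {input.vcf} > {output.filelist}` followed by `bcftools concat -f {output.filelist}`.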
- Around line 18-19: In the impute() method, change the PARAMETERS lookup key
from "window" to "window-size" (replace self.PARAMETERS.get("window", None) with
self.PARAMETERS.get("window-size", None)) so the parameter name matches what
impute() is passed by the Snakefile/command; update any local variable usage
(e.g., window) as needed to use the retrieved value.
In `@pixi.toml`:
- Line 20: The pixi.toml has a corrupted dependency entry for the pandas key
(duplicate/concatenated text). Edit the pixi.toml entry for "pandas" to remove
the duplicated fragment so the line reads a single valid TOML assignment (e.g.,
pandas = ">=2.3.3,<3"), ensuring proper quoting and no duplicated key; after
fixing, run a TOML validator or the project's dependency parser to confirm the
manifest parses cleanly.
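The repaired entry would be a single well-formed TOML assignment; the version bounds below are the ones quoted in the review comment, not independently verified.

```toml
# pixi.toml -- one valid assignment for the pandas dependency
pandas = ">=2.3.3,<3"
```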
---
Outside diff comments:
In `@harpy/report/notebooks/validate_bam.ipynb`:
- Around line 121-161: The ITable created via ITable(data, "validate.bam.csv",
fixedcols = 1) is never rendered and the later cell references standard_itable
and itables.JavascriptFunction which are not imported; remove the old
standard_itable cell, call tbl.render() immediately after creating the ITable
(i.e., invoke ITable(...).render() or add tbl.render()), and update the
remaining cell to either import standard_itable and itables (so
standard_itable(..., coldefs=[{"createdCell": itables.JavascriptFunction(...)}])
resolves) or replace itables.JavascriptFunction with the correct,
already-imported helper; ensure all referenced symbols (ITable, tbl.render,
standard_itable, itables.JavascriptFunction) are present via import or replaced
so no NameError occurs.
In `@harpy/report/notebooks/validate_fastq.ipynb`:
- Around line 104-144: The ITable instance is created but never rendered and the
old standard_itable cell references missing names (standard_itable,
itables.JavascriptFunction) causing runtime NameError; fix by removing the
legacy standard_itable cell and call .render() on the ITable object (i.e.,
replace the current "tbl = ITable(data, \"validate.fastq.csv\")" cell with a
cell that invokes tbl.render()), or alternatively replace the legacy cell body
with the new ITable.render() approach; if you choose to keep standard_itable,
import the missing symbols (standard_itable and itables) before using them.
---
Duplicate comments:
In `@harpy/snakefiles/impute.smk`:
- Around line 217-224: The shell block in the rule using bcftools concat (the
echo {input} > {output.filelist} and bcftools concat -f {output.filelist}
invocation) writes all input paths on one line; change it to write one filename
per line (as done in merge_regions) so bcftools concat -f gets a properly
formatted filelist—use a printf/join-with-newline approach to emit each {input}
item on its own line before calling bcftools concat, keeping the existing
{output.bcf}, {output.filelist}, {log} and {threads} symbols.
---
Nitpick comments:
In `@harpy/report/notebooks/align_stats.ipynb`:
- Line 718: The ITable(...) call uses a positional argument for fixedcols
causing style inconsistency; change the call ITable(contig_avg,
f"{samplename}.align.depth.csv", 1).render() to pass fixedcols as a keyword
(fixedcols=1) so it matches other ITable usages and keeps argument order
explicit.
- Line 732: The call to ITable uses a positional argument for fixedcols (passing
1) which is inconsistent with the rest of the notebook; change the call
ITable(outliers.sort_values('Read Depth', ascending=False).head(100),
f"{samplename}.align.outliers.csv", 1).render() to use the keyword argument
fixedcols=1 so it becomes ITable(..., fixedcols=1).render(), ensuring
consistency and clarity.
In `@harpy/report/notebooks/stitch_collate.ipynb`:
- Line 288: The sort key lambda used in the image_viewer call (image_viewer(...,
sortkey = lambda p: int(p.parents[1].name.split('-')[0]), ...)) can raise
ValueError/IndexError for unexpected directory names; replace the inline lambda
with a small defensive helper (e.g. safe_sortkey) that attempts to parse
int(p.parents[1].name.split('-')[0]) inside a try/except and returns a sensible
fallback (like 0 or the original name) on failure, then pass
sortkey=safe_sortkey to image_viewer to avoid crashes on malformed directory
names.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 1bb0b66a-486e-417f-8c25-49b18cb3e294
📒 Files selected for processing (18)
- harpy/common/cli_filetypes.py
- harpy/common/summaries.py
- harpy/report/components.py
- harpy/report/notebooks/align_bxstats.ipynb
- harpy/report/notebooks/align_stats.ipynb
- harpy/report/notebooks/bcftools_stats.ipynb
- harpy/report/notebooks/hapcut.ipynb
- harpy/report/notebooks/impute.ipynb
- harpy/report/notebooks/preproc_stats.ipynb
- harpy/report/notebooks/qc_bx_stats.ipynb
- harpy/report/notebooks/stitch_collate.ipynb
- harpy/report/notebooks/sv.ipynb
- harpy/report/notebooks/validate_bam.ipynb
- harpy/report/notebooks/validate_fastq.ipynb
- harpy/report/utilities.py
- harpy/snakefiles/impute.smk
- harpy/validation/vcf.py
- pixi.toml
🚧 Files skipped from review as they are similar to previous changes (3)
- harpy/report/notebooks/preproc_stats.ipynb
- harpy/common/cli_filetypes.py
- harpy/report/components.py
This significantly modifies the impute workflow to use a buffered window strategy as the default. The region strategy is still available, as is the legacy do-all-of-it strategy. This is now all contained within the
`--strategy`/`-s` option. Checks and validations have been updated to match, as have the impute and stitch_collate reports. Fixes #275

Summary by CodeRabbit
New Features
Bug Fixes
Chores