This repo is under development!

HAPHiR performs high-quality bacterial genome assembly from PacBio HiFi long reads, optionally combined with Illumina short reads, with an emphasis on accuracy, robustness, and efficient cloud execution.

The workflow runs multiple long-read assemblers in parallel (Flye, Hifiasm, Raven, Wtdbg2) and combines their output into a single high-confidence consensus assembly with Autocycler. When Illumina reads are available, small circular plasmids are recovered in a dedicated hybrid assembly step with Plassembler, so that both chromosomal and plasmid sequences are reconstructed.

HAPHiR is designed for cloud-native execution on Terra, but it can also be run locally with a WDL executor such as miniwdl or Cromwell.

Key features:

- Multi-assembler consensus: Runs four independent long-read assemblers (Flye, Hifiasm, Wtdbg2, Raven) and combines their assemblies with Autocycler for improved accuracy
- HiFi-only or hybrid input: Works with PacBio HiFi reads alone or with hybrid HiFi + Illumina data
- Plasmid recovery: Dedicated plasmid assembly and recovery with Plassembler when paired-end Illumina reads are provided
- Flexible inputs: Accepts PacBio HiFi reads as BAM or FASTQ; BAM input is converted to FASTQ automatically
- Quality control: Includes read trimming, genome size estimation, and coverage normalization
- Polishing: Short-read polishing with Polypolish
- Annotation: Optional standardized annotation with Bakta
- Antimicrobial resistance detection: Optional AMR analysis with AMRFinderPlus
- Cloud-ready: Designed for scalable execution on Terra
- Containerized: All tools run in Docker containers for reproducibility

Requirements:

- Docker or another supported container runtime
- `miniwdl` for local workflow execution
- Java 8+ for Cromwell/WOMtool validation
- Optional: Terra account for cloud execution

Use the single-sample workflow `workflows/wf_haphir.wdl`:

```bash
miniwdl run workflows/wf_haphir.wdl \
  id=sample1 \
  long_fq=sample1.hifi.fastq.gz \
  short_fq1=sample1.R1.fastq.gz \
  short_fq2=sample1.R2.fastq.gz
```

For HiFi-only data, omit `short_fq1` and `short_fq2`. If `long_fq` is a BAM file, HAPHiR automatically converts it to FASTQ before assembly.
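
For multiple samples, use `workflows/wf_haphir_batch.wdl` with a tab-separated sample sheet. Rows for HiFi-only samples have two columns (sample ID and HiFi reads); rows for hybrid samples have four (sample ID, HiFi reads, Illumina R1, Illumina R2). Example `samples.tsv`:

```text
sample1	/path/to/sample1.hifi.fastq.gz
sample2	/path/to/sample2.hifi.fastq.gz
sample3	/path/to/sample3.hifi.fastq.gz	/path/to/sample3.R1.fastq.gz	/path/to/sample3.R2.fastq.gz
```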
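
If your reads follow the naming used in the example sample sheet (`<id>.hifi.fastq.gz`, plus `<id>.R1.fastq.gz` and `<id>.R2.fastq.gz` for hybrid samples), the sheet can be generated instead of written by hand. The `make_samplesheet` function below is a hypothetical helper, not part of HAPHiR, and the naming convention is an assumption: it writes four columns when both Illumina files exist, two otherwise, and resolves the directory to an absolute path.

```bash
# Hypothetical helper (not part of HAPHiR): print a batch sample sheet to stdout.
# Assumes files named <id>.hifi.fastq.gz, with optional <id>.R1.fastq.gz and
# <id>.R2.fastq.gz for hybrid samples, all in one directory.
make_samplesheet() {
  local dir hifi id r1 r2
  dir="$(cd "$1" && pwd)" || return 1   # absolute path, so the TSV works from any cwd
  for hifi in "$dir"/*.hifi.fastq.gz; do
    [ -e "$hifi" ] || continue          # glob matched nothing
    id="$(basename "$hifi" .hifi.fastq.gz)"
    r1="$dir/$id.R1.fastq.gz"
    r2="$dir/$id.R2.fastq.gz"
    if [ -e "$r1" ] && [ -e "$r2" ]; then
      printf '%s\t%s\t%s\t%s\n' "$id" "$hifi" "$r1" "$r2"   # hybrid: 4 columns
    else
      printf '%s\t%s\n' "$id" "$hifi"                       # HiFi-only: 2 columns
    fi
  done
}

# Usage: make_samplesheet /path/to/reads > samples.tsv
```

A sample with only one of the two Illumina files is written as HiFi-only, which keeps every row at the 2- or 4-column layout.
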

Run the batch workflow:

```bash
miniwdl run workflows/wf_haphir_batch.wdl samplesheet=samples.tsv
```

Workflow inputs:

| Input | Type | Description |
|---|---|---|
| `id` | `String` | Sample identifier |
| `long_fq` | `File` | PacBio HiFi reads (FASTQ or BAM) |
| `short_fq1` | `File?` | Illumina forward reads (optional) |
| `short_fq2` | `File?` | Illumina reverse reads (optional) |
| `organism` | `String?` | Taxonomic name used for annotation (optional) |
| `bakta_annotation` | `Boolean` | Run Bakta annotation (default: `false`) |
| `amrfinder` | `Boolean` | Run AMRFinderPlus AMR detection (default: `true`) |

Batch workflow input:

- `samplesheet`: a TSV file parsed with `read_tsv(samplesheet)`
- Each row contains either 2 columns (`id`, `long_fq`) or 4 columns (`id`, `long_fq`, `short_fq1`, `short_fq2`)
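
Because `read_tsv` splits rows on tab characters, a sheet written with spaces instead of tabs, or containing stray blank lines, does not have the expected 2 or 4 columns per row. A quick pre-flight check before launching a batch can catch this. The `check_samplesheet` function below is a hypothetical helper, not part of the repository: it reports each row whose column count is not 2 or 4 or that has an empty field, and returns non-zero if any row fails. It does not check that the files exist, since paths may point to cloud storage on Terra.

```bash
# Hypothetical pre-flight check (not part of HAPHiR) for a batch sample sheet.
# Prints one message per problem and returns non-zero if any row is invalid.
check_samplesheet() {
  awk -F'\t' '
    {
      if (NF != 2 && NF != 4) {   # rows have 2 (HiFi-only) or 4 (hybrid) columns
        printf "line %d: expected 2 or 4 columns, found %d\n", NR, NF
        bad = 1
        next
      }
      for (i = 1; i <= NF; i++) {
        if ($i == "") {             # e.g. two tabs in a row
          printf "line %d: column %d is empty\n", NR, i
          bad = 1
        }
      }
    }
    END { exit bad }
  ' "$1"
}

# Usage: check_samplesheet samples.tsv && miniwdl run workflows/wf_haphir_batch.wdl samplesheet=samples.tsv
```
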

Workflow steps:

- Convert input BAM to FASTQ if needed using pbtk
- Estimate genome size with lrge
- Downsample reads with rasusa for consistent long-read coverage
- Run Flye, Hifiasm, Wtdbg2, and Raven in parallel
- Generate a consensus assembly using Autocycler
- Recover plasmids with Plassembler when paired-end Illumina reads are provided
- Map plasmids to the long-read consensus with Minimap2, then filter and merge them
- Polish with Polypolish when short reads are available
- Reorient the final assembly with dnaapler
- Create assembly visualizations with Bandage
- Optionally run Bakta annotation and AMRFinderPlus AMR detection
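
The coverage normalization step rests on simple arithmetic: sequencing depth is total read bases divided by genome size, so reaching a target depth means keeping roughly target/depth of the bases. The sketch below illustrates that calculation with hypothetical helpers, `fastq_bases` and `downsample_fraction`, that are not part of HAPHiR. It does not call lrge or rasusa. The target depth HAPHiR actually uses is not stated in this README, so the values in the usage line are placeholders.

```bash
# Hypothetical helpers (not part of HAPHiR) illustrating coverage normalization.

# Total bases in a FASTQ file (gzipped or plain), assuming 4-line records:
# sum the lengths of every sequence line (line 2 of each record).
fastq_bases() {
  gzip -dcf "$1" | awk 'NR % 4 == 2 { n += length($0) } END { printf "%.0f\n", n }'
}

# depth = total_bases / genome_size; keep_fraction = target / depth, capped at 1
# (downsampling can only remove reads, never add coverage).
downsample_fraction() {
  awk -v b="$1" -v g="$2" -v t="$3" 'BEGIN {
    depth = b / g
    frac = (depth > t) ? t / depth : 1
    printf "depth=%.1f keep_fraction=%.3f\n", depth, frac
  }'
}

# Usage (5 Mb genome size and 100x target are placeholders):
#   downsample_fraction "$(fastq_bases sample1.hifi.fastq.gz)" 5000000 100
```
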

Primary outputs exposed by the workflow:

| Output | Description |
|---|---|
| `final_assembly` | Final reoriented consensus FASTA |
| `dnaapler_summary` | dnaapler orientation report |
| `autocycler_assembly` | Consensus assembly FASTA from Autocycler |
| `autocycler_graph` | Autocycler assembly graph |
| `asm_viz` | Assembly comparison and visualization |
| `fastp_report` | fastp trimming report (when paired reads are provided) |
| `plassembler_plasmids` | Recovered plasmid FASTA |
| `plassembler_graph` | Plassembler assembly graph |
| `plassembler_summary` | Plassembler summary report |
| `minimap2_report` | Minimap2 overlap report |
| `merge_summary` | Assembly merge decisions summary |
| `bakta_outputs` | Bakta annotation outputs |
| `amrfinder_report` | AMRFinderPlus report |
| `program_versions` | Captured tool version strings |

Some outputs are only generated when paired Illumina reads are provided or when annotation/AMR detection is enabled.

Contributions are welcome. Please:
- Fork the repository
- Create a feature branch
- Add or update code, workflows, or documentation
- Validate changes locally (for example with `miniwdl check` or WOMtool)
- Submit a pull request

This project is licensed under the MIT License. See `LICENSE` for details.

For questions or issues, please open an issue on GitHub.