HAPHiR: Hybrid Assembly of PacBio HiFi and Illumina Reads

This repo is under development!

HAPHiR performs high-quality bacterial genome assembly from PacBio HiFi long reads, optionally combined with Illumina short reads, with an emphasis on accuracy, robustness, and efficient cloud execution.

The workflow runs four long-read assemblers in parallel (Flye, Hifiasm, Raven, wtdbg2) and combines their outputs into a single high-confidence consensus assembly with Autocycler. When Illumina reads are provided, small circular plasmids are recovered in a dedicated hybrid assembly step with Plassembler, so that both chromosomal and plasmid sequences are reconstructed.

HAPHiR is designed for cloud-native execution on Terra, but can also be run locally with a WDL executor such as miniwdl or Cromwell.

Features

Multi-assembler consensus: Runs 4 independent long-read assemblers (Flye, Hifiasm, Wtdbg2, Raven) and combines them using Autocycler for enhanced accuracy
HiFi-only support: Works with PacBio HiFi-only data or hybrid HiFi + Illumina data
Plasmid recovery: Dedicated plasmid assembly and recovery using Plassembler
Flexible inputs: Accepts PacBio BAM or FASTQ files and converts BAM to FASTQ automatically
Quality control: Includes read trimming, genome size estimation, and coverage normalization
Polishing: Short-read polishing with Polypolish
Annotation: Optional standardized annotation with Bakta
Antimicrobial resistance detection: Optional AMR analysis with AMRFinderPlus
Cloud-ready: Designed for scalable execution on Terra
Containerized: All tools run in Docker containers for reproducibility

Quick Start

Prerequisites

Docker or another supported container runtime
miniwdl for local workflow execution
Java 8+ for Cromwell/WOMtool validation
Optional: Terra account for cloud execution

Single Sample Assembly

Use the single-sample workflow workflows/wf_haphir.wdl:

miniwdl run workflows/wf_haphir.wdl \
  id=sample1 \
  long_fq=sample1.hifi.fastq.gz \
  short_fq1=sample1.R1.fastq.gz \
  short_fq2=sample1.R2.fastq.gz

If long_fq is a BAM file, HAPHiR will automatically convert it to FASTQ before assembly.

Batch Processing

Use workflows/wf_haphir_batch.wdl with a TSV sample sheet.

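Example samples.tsv rows (tab-separated, one sample per row). The `#` lines below are labels for this README only; leave them out of the real file, because read_tsv does not skip comment lines.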

# HiFi-only samples
sample1	/path/to/sample1.hifi.fastq.gz
sample2	/path/to/sample2.hifi.fastq.gz

# Hybrid samples
sample3	/path/to/sample3.hifi.fastq.gz	/path/to/sample3.R1.fastq.gz	/path/to/sample3.R2.fastq.gz

Run the batch workflow:

miniwdl run workflows/wf_haphir_batch.wdl samplesheet=samples.tsv
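If your reads follow the naming used in the examples above (`<id>.hifi.fastq.gz`, optionally with `<id>.R1.fastq.gz` and `<id>.R2.fastq.gz` in the same directory), a sample sheet can be generated with a small shell function. This is a sketch under that naming assumption; `make_samplesheet` is a hypothetical helper and is not part of HAPHiR.

```shell
# make_samplesheet DIR
# Hypothetical helper (not part of HAPHiR): prints one tab-separated row per
# sample to stdout. Assumes long reads are named <id>.hifi.fastq.gz and that
# paired short reads, if present, are named <id>.R1.fastq.gz / <id>.R2.fastq.gz.
make_samplesheet() {
  local dir="$1" long id r1 r2
  for long in "$dir"/*.hifi.fastq.gz; do
    [ -e "$long" ] || continue   # no matching files: the glob stays literal, skip it
    id=$(basename "$long" .hifi.fastq.gz)
    r1="$dir/$id.R1.fastq.gz"
    r2="$dir/$id.R2.fastq.gz"
    if [ -e "$r1" ] && [ -e "$r2" ]; then
      # Both mates found: hybrid sample, 4 columns
      printf '%s\t%s\t%s\t%s\n' "$id" "$long" "$r1" "$r2"
    else
      # Otherwise treat as HiFi-only: 2 columns
      printf '%s\t%s\n' "$id" "$long"
    fi
  done
}
```

For example, `make_samplesheet /path/to/reads > samples.tsv`. Passing an absolute directory keeps the paths in the sheet absolute, which avoids surprises about the working directory the executor runs from.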

Input Files

Single Sample Workflow (`workflows/wf_haphir.wdl`)

Input	Type	Description
`id`	String	Sample identifier
`long_fq`	File	PacBio HiFi reads (FASTQ or BAM)
`short_fq1`	File?	Illumina forward reads (optional)
`short_fq2`	File?	Illumina reverse reads (optional)
`organism`	String?	Taxonomic name used for annotation (optional)
`bakta_annotation`	Boolean	Run Bakta annotation (default: false)
`amrfinder`	Boolean	Run AMRFinderPlus AMR detection (default: true)

Batch Workflow (`workflows/wf_haphir_batch.wdl`)

samplesheet: a tab-separated file with one sample per row and no header line, parsed by read_tsv(samplesheet)
Each row contains either 2 columns (id, long_fq) or 4 columns (id, long_fq, short_fq1, short_fq2)
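Because every row must have exactly 2 or 4 columns, a quick pre-flight check can catch malformed sheets (stray comment lines, blank lines, a missing mate) before a run is launched. The sketch below is a hypothetical helper, not something HAPHiR ships; it only checks column counts and empty fields, not whether the files exist (on Terra the paths may be cloud URIs).

```shell
# check_samplesheet FILE
# Hypothetical pre-flight check (not part of HAPHiR): every row must have
# exactly 2 or 4 non-empty tab-separated fields. Prints the first problem
# found and returns non-zero; returns 0 if the sheet looks well-formed.
check_samplesheet() {
  awk -F'\t' '
    # Wrong column count (this also catches blank lines and "#" comment lines)
    NF != 2 && NF != 4 {
      printf "line %d: expected 2 or 4 columns, found %d\n", NR, NF
      bad = 1; exit
    }
    # Right count, but a field is empty (e.g. two tabs in a row)
    {
      for (i = 1; i <= NF; i++)
        if ($i == "") { printf "line %d: column %d is empty\n", NR, i; bad = 1; exit }
    }
    END { exit bad }
  ' "$1"
}
```

For example, `check_samplesheet samples.tsv && miniwdl run workflows/wf_haphir_batch.wdl samplesheet=samples.tsv` only starts the batch when the sheet passes.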

Pipeline Overview

Convert input BAM to FASTQ if needed using pbtk
Estimate genome size with lrge
Downsample reads with rasusa for consistent long-read coverage
Run Flye, Hifiasm, Wtdbg2, and Raven in parallel
Generate a consensus assembly using Autocycler
Recover plasmids with Plassembler when paired-end Illumina reads are provided
Map the recovered plasmids to the long-read consensus with Minimap2, then filter them and merge them into the assembly
Polish with Polypolish when short reads are available
Reorient the final assembly with dnaapler
Create assembly visualizations with Bandage
Optionally run Bakta annotation and AMRFinderPlus AMR detection
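Which of the steps above apply depends on the inputs: paired Illumina reads switch on the plasmid and polishing steps, and the two Boolean inputs control annotation and AMR detection. The hypothetical function below simply restates that branching from the list so you can see what a given set of inputs should produce. It is a sketch, not the workflow's actual logic; in particular, detecting BAM input by its `.bam` extension is an assumption.

```shell
# planned_stages LONG_FQ SHORT_FQ1 SHORT_FQ2 BAKTA AMRFINDER
# Hypothetical helper: prints, in order, the Pipeline Overview steps that
# would apply to one sample. Pass "" for missing short reads; BAKTA and
# AMRFINDER are "true" or "false". Not the real WDL conditions.
planned_stages() {
  local long="$1" r1="$2" r2="$3" bakta="$4" amr="$5"
  case "$long" in
    *.bam) echo "pbtk: BAM to FASTQ" ;;   # assumption: BAM detected by extension
  esac
  echo "lrge: estimate genome size"
  echo "rasusa: downsample long reads"
  echo "flye, hifiasm, wtdbg2, raven: assemble in parallel"
  echo "autocycler: consensus"
  # Plasmid recovery and polishing need both Illumina mates
  if [ -n "$r1" ] && [ -n "$r2" ]; then
    echo "plassembler: recover plasmids"
    echo "minimap2: map, filter and merge plasmids"
    echo "polypolish: short-read polishing"
  fi
  echo "dnaapler: reorient"
  echo "bandage: visualize"
  [ "$bakta" = "true" ] && echo "bakta: annotate"
  [ "$amr" = "true" ] && echo "amrfinderplus: AMR detection"
  return 0   # keep a false final test from becoming the return status
}
```

Matching the input defaults (`bakta_annotation` false, `amrfinder` true), a HiFi-only BAM run corresponds to `planned_stages sample1.hifi.bam "" "" false true`.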

Output Files

Primary outputs exposed by the workflow:

Output	Description
`final_assembly`	Final reoriented consensus FASTA
`dnaapler_summary`	Dnaapler orientation report
`autocycler_assembly`	Consensus assembly FASTA from Autocycler
`autocycler_graph`	Autocycler assembly graph
`asm_viz`	Assembly comparison and visualization
`fastp_report`	Fastp trimming report (when paired reads are provided)
`plassembler_plasmids`	Recovered plasmid FASTA
`plassembler_graph`	Plassembler assembly graph
`plassembler_summary`	Plassembler summary report
`minimap2_report`	Minimap2 overlap report
`merge_summary`	Assembly merge decisions summary
`bakta_outputs`	Bakta annotation outputs
`amrfinder_report`	AMRFinderPlus report
`program_versions`	Captured tool version strings

Some outputs are only generated when paired Illumina reads are provided or when annotation/AMR detection is enabled.

Contributing

Contributions are welcome. Please:

Fork the repository
Create a feature branch
Add or update code, workflows, or documentation
Validate changes locally
Submit a pull request
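For the "Validate changes locally" step, one option is to run `miniwdl check` on each workflow and, if you have Java installed, WOMtool's `validate` command as listed under Prerequisites. The function below is a sketch; its name and the `WOMTOOL_JAR` variable (the path to a downloaded WOMtool jar) are hypothetical conventions, not something the repository defines.

```shell
# validate_workflows [DIR]
# Hypothetical helper: runs `miniwdl check` on every .wdl file in DIR
# (default: workflows/). If WOMTOOL_JAR (hypothetical variable) points to a
# WOMtool jar, each file is also checked with `java -jar ... validate`.
# Returns non-zero if any check failed, but keeps checking the rest.
validate_workflows() {
  local dir="${1:-workflows}" wdl status=0
  for wdl in "$dir"/*.wdl; do
    [ -e "$wdl" ] || continue   # no .wdl files found
    echo "checking $wdl"
    miniwdl check "$wdl" || status=1
    if [ -n "${WOMTOOL_JAR:-}" ]; then
      java -jar "$WOMTOOL_JAR" validate "$wdl" || status=1
    fi
  done
  return "$status"
}
```

Run it from the repository root, e.g. `WOMTOOL_JAR=/path/to/womtool.jar validate_workflows`. `miniwdl check` also follows each workflow's imports, so task files are checked through the workflows that use them.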

License

This project is licensed under the MIT License. See LICENSE for details.

Support

For questions or issues, please open an issue on GitHub.
