Supported Python versions: 3.10, 3.11, 3.12. Published in JOSS (Journal of Open Source Software); see the citation below.

bioconvert — format conversion pipeline

Overview: Parallelise bioconvert conversions across a set of files
Input: Any file format supported by bioconvert (FastQ, BAM, FASTA, VCF, …)
Output: Converted files in the target format, MD5 checksums, and an HTML summary report
Status: Production
Citation: Cokelaer et al. (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, doi:10.21105/joss.00352

Pipeline DAG

Installation

pip install sequana-bioconvert

To upgrade an existing installation:

pip install sequana-bioconvert --upgrade

Install all dependencies via conda/mamba:

mamba env create -f environment.yml

Quick Start

Step 1 — prepare the working directory

Convert all fastq.gz files in a directory to fasta.gz:

sequana_bioconvert \
    --input-directory /path/to/data \
    --input-ext fastq.gz \
    --output-ext fasta.gz \
    --command fastq2fasta

This creates a bioconvert/ working directory with config.yaml and a bioconvert.sh launch script.

Step 2 — run the pipeline:

cd bioconvert
sh bioconvert.sh

Results are written to the output/ subdirectory. An HTML summary report is generated on completion.
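
The pipeline reports MD5 checksums for the converted files. For an independent spot check, the sketch below recomputes MD5 digests in Python. It is a minimal illustration and not part of the pipeline: md5_of_file and md5_of_directory are hypothetical helper names, and the layout of the pipeline's own checksum files is not assumed here, so compare the digests manually or adapt the code once you have inspected those files.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large sequencing files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_directory(directory, suffix):
    """Map each file name in `directory` ending with `suffix` to its MD5 digest."""
    return {
        path.name: md5_of_file(path)
        for path in sorted(Path(directory).iterdir())
        if path.is_file() and path.name.endswith(suffix)
    }


if __name__ == "__main__":
    # Assumption: run from inside the working directory, after the pipeline finished.
    if Path("output").is_dir():
        for name, digest in md5_of_directory("output", ".fasta.gz").items():
            print(digest, name)
```

Reading in chunks keeps memory use flat regardless of file size, which matters for multi-gigabyte FastQ or BAM files.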

Usage

sequana_bioconvert --help

Key options:

  • --input-directory — directory containing the input files (required)
  • --input-ext — extension of input files, e.g. fastq.gz (required)
  • --output-ext — extension of output files, e.g. fasta.gz (required)
  • --command — bioconvert conversion command, e.g. fastq2fasta (required);
    run bioconvert --help for the full list
  • --input-pattern — prefix glob to restrict which files are picked up (default: *);
    e.g. sample_* to process only files starting with sample_
  • --method — override the default conversion method;
    run bioconvert COMMAND --show-methods to list valid methods
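
As a rough illustration of how --input-pattern and --input-ext together narrow the file selection, the sketch below joins the two with a dot and filters a list of file names. Joining with a dot is an assumption about the pipeline's behaviour, and build_glob and select_inputs are hypothetical helpers, not functions provided by the pipeline.

```python
from fnmatch import fnmatchcase


def build_glob(input_ext, input_pattern="*"):
    """Combine the prefix pattern and the extension into one glob.

    Assumption: the two parts are joined with a dot.
    """
    return f"{input_pattern}.{input_ext}"


def select_inputs(filenames, input_ext, input_pattern="*"):
    """Return the sorted file names matching the combined glob (case-sensitive)."""
    glob_pattern = build_glob(input_ext, input_pattern)
    return sorted(name for name in filenames if fnmatchcase(name, glob_pattern))


names = [
    "sample_A.fastq.gz",
    "sample_B.fastq.gz",
    "control.fastq.gz",
    "sample_C.fasta.gz",
]
print(select_inputs(names, "fastq.gz", "sample_*"))
# ['sample_A.fastq.gz', 'sample_B.fastq.gz']
```

Previewing the selection this way before initialising the pipeline can catch a pattern that matches nothing.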

Usage with apptainer

All external tools are available through a pre-built apptainer image. To use it, add --use-apptainer when initialising the pipeline:

sequana_bioconvert \
    --input-directory /path/to/data \
    --input-ext fastq.gz \
    --output-ext fasta.gz \
    --command fastq2fasta \
    --use-apptainer \
    --apptainer-prefix ~/.sequana/apptainers

Then run as usual:

cd bioconvert
sh bioconvert.sh
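
When many conversions need to be launched, the two steps can be scripted. The following is a minimal sketch that only assembles the options shown in this README. The helpers init_command and run_pipeline are hypothetical, and the working directory name bioconvert is taken from the Quick Start.

```python
import os
import subprocess


def init_command(input_directory, input_ext, output_ext, command,
                 use_apptainer=False, apptainer_prefix=None):
    """Build the argument list for the initialisation step."""
    argv = [
        "sequana_bioconvert",
        "--input-directory", str(input_directory),
        "--input-ext", input_ext,
        "--output-ext", output_ext,
        "--command", command,
    ]
    if use_apptainer:
        argv.append("--use-apptainer")
        if apptainer_prefix is not None:
            # No shell is involved, so expand "~" explicitly.
            argv += ["--apptainer-prefix", os.path.expanduser(str(apptainer_prefix))]
    return argv


def run_pipeline(argv, workdir="bioconvert"):
    """Initialise the working directory, then run the generated launch script."""
    subprocess.run(argv, check=True)
    subprocess.run(["sh", "bioconvert.sh"], cwd=workdir, check=True)


if __name__ == "__main__":
    argv = init_command("/path/to/data", "fastq.gz", "fasta.gz", "fastq2fasta",
                        use_apptainer=True,
                        apptainer_prefix="~/.sequana/apptainers")
    print(" ".join(argv))  # inspect the command; call run_pipeline(argv) to execute
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces, and check=True stops the script as soon as either step fails.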

Requirements

  • bioconvert ≥ 1.1.0 — the underlying conversion tool
  • graphviz — for pipeline DAG rendering (available via apptainer)

Install dependencies via conda/mamba:

mamba env create -f environment.yml

Rules and configuration details

The latest configuration file is available at: config.yaml

Each rule used in the pipeline has a corresponding section in config.yaml.

Changelog

Version Description
1.2.0
  • Update apptainer image to bioconvert 1.1.0
  • Switch to manager.get_shell() — no longer uses sequana_wrappers
  • Remove sequana_wrappers field from config and schema
  • Use importlib.metadata for version (fixes >=x.y.z display in HTML reports)
  • --input-pattern now optional (default *); combined with --input-ext to form the actual glob pattern
  • Add md5_output.txt alongside md5_input.txt
  • Improved HTML report: method display, bioconvert doc link, cleaner table labels
  • Early exit with clear error if no input files are found
  • Fix fragile sample name extraction for multi-dot filenames
1.1.0
  • Update apptainer image to bioconvert 1.1.0
  • CI: update to Python 3.10/3.11/3.12 and actions/checkout@v4
1.0.0
  • Uses bioconvert 1.0.0
0.10.0
  • Add container
0.9.0
  • Version using new sequana/sequana_pipetools framework
0.8.1
  • Working version
0.8.0
  • First release

Contribute & Code of Conduct

To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.