sequana/bioconvert

Supported Python versions: 3.10, 3.11, 3.12

bioconvert — format conversion pipeline

Overview: Parallelise bioconvert conversions across a set of files
Input: Any file format supported by bioconvert (FastQ, BAM, FASTA, VCF, …)
Output: Converted files in the target format, MD5 checksums, and an HTML summary report
Status: Production
Citation: Cokelaer et al, (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, doi:10.21105/joss.00352

Pipeline DAG

Installation

pip install sequana-bioconvert

To upgrade an existing installation:

pip install sequana-bioconvert --upgrade

Install all dependencies via conda/mamba:

mamba env create -f environment.yml

Quick Start

Step 1 — prepare the working directory

Convert all fastq.gz files in a directory to fasta.gz:

sequana_bioconvert \
    --input-directory /path/to/data \
    --input-ext fastq.gz \
    --output-ext fasta.gz \
    --command fastq2fasta

This creates a bioconvert/ working directory with config.yaml and a bioconvert.sh launch script.
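When the initialisation step has to be scripted for several datasets, it can help to assemble the command in Python. The sketch below only builds the argument list from the options documented in this README; `build_init_command` is a hypothetical helper name, not part of sequana-bioconvert.

```python
import shlex


def build_init_command(input_directory, input_ext, output_ext, command,
                       input_pattern=None, method=None):
    """Return the sequana_bioconvert argument list for the initialisation step.

    Hypothetical helper: it only uses flags listed in this README.
    """
    argv = [
        "sequana_bioconvert",
        "--input-directory", input_directory,
        "--input-ext", input_ext,
        "--output-ext", output_ext,
        "--command", command,
    ]
    # Optional flags are appended only when explicitly requested.
    if input_pattern is not None:
        argv += ["--input-pattern", input_pattern]
    if method is not None:
        argv += ["--method", method]
    return argv


argv = build_init_command("/path/to/data", "fastq.gz", "fasta.gz", "fastq2fasta")
print(shlex.join(argv))

# To actually run it (requires sequana-bioconvert to be installed):
# import subprocess
# subprocess.run(argv, check=True)
```

Passing a list to `subprocess.run` rather than a single string avoids shell quoting problems when the data directory contains spaces.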

Step 2 — run the pipeline:

cd bioconvert
sh bioconvert.sh

Results are written to the output/ subdirectory. An HTML summary report is generated on completion.

Usage

sequana_bioconvert --help

Key options:

  • --input-directory — directory containing the input files (required)
  • --input-ext — extension of input files, e.g. fastq.gz (required)
  • --output-ext — extension of output files, e.g. fasta.gz (required)
  • --command — bioconvert conversion command, e.g. fastq2fasta (required);
    run bioconvert --help for the full list
  • --input-pattern — prefix glob to restrict which files are picked up (default: *);
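    the pattern is combined with --input-ext to form the actual glob, e.g. sample_* processes only files starting with sample_ that also carry the input extension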
  • --method — override the default conversion method;
    run bioconvert COMMAND --show-methods to list valid methods
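To preview which files a given --input-pattern and --input-ext pair would select, the matching can be approximated in Python. This is a sketch under a stated assumption: the pattern and the extension are joined with a dot to form the glob. The pipeline's exact rule is not documented here, and all function names below are hypothetical.

```python
from fnmatch import fnmatchcase


def build_glob(input_pattern, input_ext):
    """Join pattern and extension into one glob.

    ASSUMPTION: the two parts are joined with a dot; the pipeline's
    actual rule may differ.
    """
    return f"{input_pattern}.{input_ext}"


def select_files(filenames, input_pattern="*", input_ext="fastq.gz"):
    """Return the file names matching the combined glob, in input order."""
    glob = build_glob(input_pattern, input_ext)
    return [name for name in filenames if fnmatchcase(name, glob)]


def sample_name(filename, input_ext):
    """Strip the known extension instead of splitting on the first dot,
    so that multi-dot names such as sample.v1.2.fastq.gz stay intact."""
    suffix = "." + input_ext
    if not filename.endswith(suffix):
        raise ValueError(f"{filename!r} does not end with {suffix!r}")
    return filename[: -len(suffix)]


names = ["sample_A.fastq.gz", "ctrl.fastq.gz", "sample_B.fasta"]
print(select_files(names, "sample_*", "fastq.gz"))
print(sample_name("sample.v1.2.fastq.gz", "fastq.gz"))
```

Removing the full extension as a suffix is one simple way to keep sample names stable when file names contain extra dots.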

Usage with apptainer

All external tools are available through a pre-built apptainer image. To use it, add --use-apptainer when initialising the pipeline:

sequana_bioconvert \
    --input-directory /path/to/data \
    --input-ext fastq.gz \
    --output-ext fasta.gz \
    --command fastq2fasta \
    --use-apptainer \
    --apptainer-prefix ~/.sequana/apptainers

Then run as usual:

cd bioconvert
sh bioconvert.sh

Requirements

  • bioconvert ≥ 1.1.0 — the underlying conversion tool
  • graphviz — for pipeline DAG rendering (available via apptainer)

Install dependencies via conda/mamba:

mamba env create -f environment.yml

Rules and configuration details

The latest configuration file is available at: config.yaml

Each rule used in the pipeline has a corresponding section in config.yaml.

Changelog

Version Description
1.2.0
  • Update apptainer image to bioconvert 1.1.0
  • Switch to manager.get_shell() — no longer uses sequana_wrappers
  • Remove sequana_wrappers field from config and schema
  • Use importlib.metadata for version (fixes >=x.y.z display in HTML reports)
  • --input-pattern now optional (default *); combined with --input-ext to form the actual glob pattern
  • Add md5_output.txt alongside md5_input.txt
  • Improved HTML report: method display, bioconvert doc link, cleaner table labels
  • Early exit with clear error if no input files are found
  • Fix fragile sample name extraction for multi-dot filenames
1.1.0
  • Update apptainer image to bioconvert 1.1.0
  • CI: update to Python 3.10/3.11/3.12 and actions/checkout@v4
1.0.0
  • Uses bioconvert 1.0.0
0.10.0
  • Add container
0.9.0
  • Version using new sequana/sequana_pipetools framework
0.8.1
  • Working version
0.8.0
  • First release
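The 1.2.0 entry mentions reading the version through importlib.metadata. A minimal sketch of that kind of lookup follows; it assumes the distribution name is sequana-bioconvert (the name used with pip above), and `get_version` is an illustrative helper rather than the pipeline's own code.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(distribution="sequana-bioconvert"):
    """Return the installed version string, or "unknown" if not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The package is not installed in the current environment.
        return "unknown"


print(get_version())
```

Querying the installed metadata returns the concrete version of the package, not a requirement specifier such as >=x.y.z.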

Contribute & Code of Conduct

To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
