| Overview: | Parallelise bioconvert conversions across a set of files |
|---|---|
| Input: | Any file format supported by bioconvert (FastQ, BAM, FASTA, VCF, …) |
| Output: | Converted files in the target format, MD5 checksums, and an HTML summary report |
| Status: | Production |
| Citation: | Cokelaer et al. (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, doi:10.21105/joss.00352 |
pip install sequana-bioconvert
To upgrade an existing installation:
pip install sequana-bioconvert --upgrade
Install all dependencies via conda/mamba:
mamba env create -f environment.yml
Step 1 — prepare the working directory
Convert all fastq.gz files in a directory to fasta.gz:
sequana_bioconvert \
--input-directory /path/to/data \
--input-ext fastq.gz \
--output-ext fasta.gz \
--command fastq2fasta
This creates a bioconvert/ working directory with config.yaml and a
bioconvert.sh launch script.
Step 2 — run the pipeline:
cd bioconvert
sh bioconvert.sh
Results are written to the output/ subdirectory. An HTML summary report is
generated on completion.
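After a run, it can be useful to confirm that every input file produced a converted counterpart. The sketch below assumes that each output keeps the input's base name with only the extension swapped (for example, sample_A.fastq.gz becomes sample_A.fasta.gz) and that it sits directly in output/. Neither detail is confirmed here, and `expected_name` is a hypothetical helper, so adjust both to the layout you actually observe. The demo builds a throwaway directory so that it runs without the pipeline installed.

```shell
#!/bin/sh
# Hypothetical helper: swap the input extension for the output extension.
# $1: file name, $2: input extension, $3: output extension
expected_name() {
    base=${1%".$2"}
    printf '%s.%s\n' "$base" "$3"
}

# Throwaway demo layout standing in for /path/to/data and bioconvert/output.
demo=$(mktemp -d)
mkdir -p "$demo/data" "$demo/output"
touch "$demo/data/sample_A.fastq.gz" "$demo/data/sample_B.fastq.gz"
touch "$demo/output/sample_A.fasta.gz"   # pretend only one conversion finished

# Assumption: one output per input, same base name, new extension.
missing=0
for f in "$demo"/data/*.fastq.gz; do
    out=$(expected_name "$(basename "$f")" fastq.gz fasta.gz)
    if [ ! -e "$demo/output/$out" ]; then
        echo "missing: $out"
        missing=$((missing + 1))
    fi
done
echo "$missing file(s) missing"
```

To use it on real data, point the loop at your input directory and the existence check at the output/ subdirectory of the working directory.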
sequana_bioconvert --help
Key options:
- `--input-directory`: directory containing the input files (required)
- `--input-ext`: extension of input files, e.g. `fastq.gz` (required)
- `--output-ext`: extension of output files, e.g. `fasta.gz` (required)
- `--command`: bioconvert conversion command, e.g. `fastq2fasta` (required); run `bioconvert --help` for the full list
- `--input-pattern`: prefix glob to restrict which files are picked up (default: `*`); e.g. `sample_*` to process only files starting with `sample_`
- `--method`: override the default conversion method; run `bioconvert COMMAND --show-methods` to list valid methods
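Before initialising the pipeline, you may want to preview which files a given prefix pattern and extension would select. The sketch below assumes that files are matched as `<pattern>.<ext>` inside the input directory; that is a guess at how the two options combine, not documented behaviour, and the demo directory and file names are invented for illustration.

```shell
#!/bin/sh
# Hypothetical demo directory; replace with your own --input-directory.
input_dir=$(mktemp -d)
touch "$input_dir/sample_A.fastq.gz" "$input_dir/sample_B.fastq.gz" \
      "$input_dir/control_1.fastq.gz" "$input_dir/sample_notes.txt"

pattern='sample_*'   # value you would pass to --input-pattern
ext='fastq.gz'       # value you would pass to --input-ext

# Assumption: selection behaves like the glob <pattern>.<ext>.
# $pattern is left unquoted on purpose so the shell expands the glob.
matched=$(cd "$input_dir" && ls -1 $pattern.$ext 2>/dev/null)
count=$(printf '%s\n' "$matched" | grep -c .)

printf '%s\n' "$matched"
echo "$count file(s) would be converted"
```

In this demo only the two `sample_*.fastq.gz` files are listed; the control file and the text file are left out.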
All external tools are available through a pre-built apptainer image. To use
it, add --use-apptainer when initialising the pipeline:
sequana_bioconvert \
--input-directory /path/to/data \
--input-ext fastq.gz \
--output-ext fasta.gz \
--command fastq2fasta \
--use-apptainer \
--apptainer-prefix ~/.sequana/apptainers
Then run as usual:
cd bioconvert
sh bioconvert.sh
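If the same setup script has to work on machines with and without Apptainer, one option is to add the container flags only when the `apptainer` executable is found. This is a minimal sketch: the `container_opts` variable is a name chosen for illustration, and the prefix path simply repeats the one used in the example above.

```shell
#!/bin/sh
# Add the container options only if apptainer is available on PATH.
if command -v apptainer >/dev/null 2>&1; then
    container_opts="--use-apptainer --apptainer-prefix $HOME/.sequana/apptainers"
else
    container_opts=""
    echo "apptainer not found on PATH; tools must be installed locally" >&2
fi
echo "container options: ${container_opts:-<none>}"
```

The resulting `$container_opts` can then be appended, unquoted, to the `sequana_bioconvert` command line.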
- bioconvert ≥ 1.1.0 — the underlying conversion tool
- graphviz — for pipeline DAG rendering (available via apptainer)
The latest configuration file is available at: config.yaml
Each rule used in the pipeline has a corresponding section in config.yaml.
| Version | Description |
|---|---|
| 1.2.0 | |
| 1.1.0 | |
| 1.0.0 | Uses bioconvert 1.0.0 |
| 0.10.0 | Add container |
| 0.9.0 | Version using new sequana/sequana_pipetools framework |
| 0.8.1 | Working version |
| 0.8.0 | First release |
To contribute to this project, please take a look at the Contributing Guidelines first. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
