
improvements rnaseq pipeline #2

@cokelaer

Future

Oct 2021

  • use new sequana-wrappers

The following requested features concern the rnadiff analysis, not sequana_rnaseq:

  • if possible, provide results with and without independent filtering
  • when using --force (rnadiff), previous DGE results should be deleted; otherwise they are added to the HTML reports
  • design: add an 'alias name' column

April/May/June 2021

  • if pvalue == 0, replace it with a finite value so that the point can be seen in the volcano plot (-log10(0) is infinite)
  • add the fastp tool to complement the existing cutadapt trimming tool
  • add an HTML entry point for the enrichment results when there are several comparisons or several enrichments
  • refactor sequana enrichment, possibly with a syntax such as "sequana enrichment panther"
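A p-value of exactly 0 usually means the true value underflowed double precision, so -log10(p) becomes infinite and the point disappears from the volcano plot. A minimal sketch of one possible fix, assuming the replacement floor is one tenth of the smallest non-zero p-value (an arbitrary choice; the function name is hypothetical and not part of sequana or rnadiff):

```python
import math


def volcano_neglog10(pvalues, floor=None):
    """Return -log10(p) for each p-value, replacing p == 0 by a finite floor.

    Assumption (hypothetical helper): by default the floor is the smallest
    non-zero p-value divided by 10, so zero p-values are drawn just above
    every other point instead of being dropped as infinite.
    """
    nonzero = [p for p in pvalues if p > 0]
    if floor is None:
        if nonzero:
            # Guard against the division underflowing to 0.0.
            floor = max(min(nonzero) / 10, 5e-324)
        else:
            # Every p-value is zero: use the smallest positive double.
            floor = 5e-324
    return [-math.log10(p if p > 0 else floor) for p in pvalues]


if __name__ == "__main__":
    print(volcano_neglog10([0.05, 1e-10, 0.0]))
```

Keeping the floor relative to the data (rather than a fixed constant) keeps the zero p-values at the top of the plot without stretching the y-axis more than necessary.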

March 2021

  • better filtering for multiqc
  • main summary.html should have more features/summary/plots
  • check the rnaseqc GTF input (catch a missing GTF in main.py and rnaseq.rules); a GFF-to-GTF converter was added in sequana
    • GTF input (from GFF) for the prokaryote case
    • GTF input (from GFF) for the eukaryote case
  • salmon for eukaryotes tested on mm10
  • check the rnaseqc multiqc module; the biomics fork is no longer needed.
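The main difference between GFF3 and GTF lies in column 9: GFF3 uses key=value pairs while GTF uses key "value"; pairs and requires gene_id and transcript_id. The sketch below is hypothetical and is not sequana's converter. It assumes both gene_id and transcript_id can be taken from a single attribute (ID by default), a common simplification for prokaryotic annotations without transcripts, and it leaves URL-escaped characters untouched.

```python
def gff3_to_gtf_line(line, id_attribute="ID"):
    """Convert one GFF3 feature line into a GTF line.

    Hypothetical helper: gene_id and transcript_id are both copied from
    `id_attribute`. Returns None for comments, blank or malformed lines,
    and for features lacking `id_attribute`.
    """
    if line.startswith("#") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return None
    # Parse the GFF3 attribute column: key=value pairs separated by ';'.
    attributes = {}
    for item in fields[8].strip().strip(";").split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key.strip()] = value.strip()
    gene_id = attributes.get(id_attribute)
    if gene_id is None:
        return None
    # GTF expects gene_id and transcript_id first, then any other attributes.
    gtf_attrs = [f'gene_id "{gene_id}"', f'transcript_id "{gene_id}"']
    gtf_attrs += [f'{k} "{v}"' for k, v in attributes.items() if k != id_attribute]
    return "\t".join(fields[:8] + ["; ".join(gtf_attrs) + ";"])


if __name__ == "__main__":
    print(gff3_to_gtf_line("chr1\tRefSeq\tgene\t10\t200\t.\t+\t.\tID=gene0;Name=dnaA"))
```

For eukaryotic annotations, transcript_id would normally come from the mRNA/transcript parent rather than the gene, which is why a dedicated converter is needed for that case.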

Jan 2021

  • BUG fix: the mark duplicates switch now works correctly for the QC and other steps
  • Better GFF handling: custom GFF files with several feature types are supported, and the user's choice of attribute and feature is sanity-checked
  • Checked the rna_sqc functionality and provided a gff2gtf parser in sequana.
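The sanity check on the user's feature and attribute choice can be sketched by collecting, for each feature type (column 3), the attribute keys seen in column 9, then failing early with an explicit message. This is a hypothetical helper, not sequana's implementation; the comma-separated syntax for several feature types is also an assumption.

```python
from collections import defaultdict


def check_gff_choice(gff_lines, feature, attribute):
    """Raise ValueError if a feature type or the attribute is absent from the GFF.

    Hypothetical helper: `feature` may list several types separated by
    commas (assumed syntax), e.g. "gene,rRNA".
    """
    attributes_by_type = defaultdict(set)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        keys = {item.split("=", 1)[0].strip() for item in fields[8].split(";") if "=" in item}
        attributes_by_type[fields[2]].update(keys)

    for ftype in feature.split(","):
        if ftype not in attributes_by_type:
            raise ValueError(
                f"feature type '{ftype}' not found in GFF; "
                f"available types: {sorted(attributes_by_type)}"
            )
        if attribute not in attributes_by_type[ftype]:
            raise ValueError(
                f"attribute '{attribute}' not found for feature '{ftype}'; "
                f"available attributes: {sorted(attributes_by_type[ftype])}"
            )
```

Listing the available types and attributes in the error message is what turns a failure deep inside featureCounts into a message the user can act on immediately.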

Dec 2020

  • Fix a segfault with the star aligner on bacterial genomes
  • fastq_screen should work now. The only contaminant searched for is PhiX; other genomes must be handled by the users, who need to build the corresponding indices themselves. Searching for PhiX with fastq_screen is now the default behaviour so that the code works out of the box
  • fix missing workflow image in the report.
  • add the strandness plot to the ./outputs directory and include the image in the summary page
  • bowtie1/star/bowtie2 indices are now stored in their own sub-directories
  • provide a way to disable the rRNA search
  • fix an issue caused by a bug in the star index rule in sequana
  • rnadiff option is now set automatically to one_factor
  • add option --run to execute the pipeline without manual checking (batch mode)

Oct-Nov 2020

  • star index: we may get the following warning:
    --genomeSAindexNbases 14 is too large for the genome size=4456448,
    which may cause seg-fault at the mapping step. Re-run genome generation with
    recommended --genomeSAindexNbases 10
  • use a more generic title in multiqc_config
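The STAR warning quoted above follows the scaling rule given in the STAR manual for small genomes: --genomeSAindexNbases should be min(14, log2(GenomeLength)/2 - 1). A minimal Python sketch computing it from a FASTA file's total length; the function names are hypothetical and not part of sequana:

```python
import math


def genome_length(fasta_lines):
    """Total number of bases in a FASTA given as an iterable of lines."""
    return sum(len(line.strip()) for line in fasta_lines if not line.startswith(">"))


def star_sa_index_nbases(length):
    """min(14, floor(log2(length) / 2 - 1)), as recommended in the STAR manual."""
    if length <= 0:
        raise ValueError("genome length must be positive")
    return min(14, math.floor(math.log2(length) / 2 - 1))


if __name__ == "__main__":
    # Genome size taken from the warning above.
    print(f"--genomeSAindexNbases {star_sa_index_nbases(4456448)}")
```

For the 4,456,448 bp genome in the warning, log2 gives about 22.09, hence 22.09 / 2 - 1 = 10.04, which truncates to the recommended value of 10; large genomes such as human stay capped at the default of 14.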

Sept 2020

  • Add tolerance for feature_counts in the pipeline and config file after fixing the sequana featurecounts functions (v0.9.17)

Aug 2020

  • do_indexing option is now pre-filled when instantiating the pipeline.
  • the salmon option validateMappings is deprecated and should be removed
  • salmon indexing included
  • refactor the way feature counts are handled: instead of running in the onsuccess block, simpler code from @khourhin is now included in sequana and in this pipeline as of version 0.9.16.

June/July 2020

  • Fix R1/R2 issue for rRNA
  • add mark duplicates to the cluster config, set to False by default
  • add paired option for feature counts when paired data is provided.
  • add an option to skip fastqc on the raw data; skipping is the default. fastqc on the filtered data is kept by default.
  • clean up the multiqc option to exclude fastqc_samples (so that it does not clash with fastqc_filtered)
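The paired option for feature counts boils down to adding featureCounts' -p flag when the input is paired-end. A hedged sketch of the command construction: the flags (-T, -s, -t, -g, -a, -o, -p) are standard Subread featureCounts options, while the Python function and its parameters are hypothetical.

```python
def featurecounts_command(bam_files, gtf, output, feature="exon",
                          attribute="gene_id", strandness=0,
                          threads=4, paired=False):
    """Build a featureCounts command line as a list of arguments.

    Hypothetical helper. strandness follows featureCounts -s:
    0 = unstranded, 1 = stranded, 2 = reversely stranded.
    """
    cmd = [
        "featureCounts",
        "-T", str(threads),
        "-s", str(strandness),
        "-t", feature,
        "-g", attribute,
        "-a", gtf,
        "-o", output,
    ]
    if paired:
        # -p declares paired-end input. In recent Subread releases (2.0.2+,
        # to our knowledge) --countReadPairs is also needed to count
        # fragments instead of reads; check the installed version.
        cmd.append("-p")
    return cmd + list(bam_files)


if __name__ == "__main__":
    print(" ".join(featurecounts_command(["A.bam", "B.bam"], "genes.gtf",
                                         "counts.txt", paired=True)))
```

Returning a list rather than a single string makes the command safe to pass to subprocess.run without shell quoting issues.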

April-May 2020

  • if the input genome size is > 4 billion bases (~4 Gb), the bowtie2 index files have the .bt2l extension (not .bt2); therefore, the sequana rule bowtie2_mapping and this pipeline should be updated.
  • add input to the rnadiff analysis in ./rnadiff
  • a faster --help option
  • a --from-project option to import an existing pipeline
  • an HTML custom front page
  • add feature counts as a single file
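Rather than hard-coding the genome-size threshold at which bowtie2-build switches to a large index (it can also be forced with --large-index), a rule can detect which extension actually exists on disk. A minimal sketch, assuming the standard bowtie2 naming where the first index file is <prefix>.1.bt2 or <prefix>.1.bt2l; the function name is hypothetical:

```python
from pathlib import Path


def bowtie2_index_extension(prefix):
    """Return ".bt2" or ".bt2l" depending on which index files exist for prefix.

    Hypothetical helper: only the first index file (<prefix>.1.<ext>) is
    checked; a complete index also has .2, .3, .4, .rev.1 and .rev.2 files.
    """
    for ext in (".bt2", ".bt2l"):
        if Path(f"{prefix}.1{ext}").exists():
            return ext
    raise FileNotFoundError(f"no bowtie2 index found for prefix {prefix!r}")
```

Detecting the extension from existing files keeps the mapping rule correct whether the large index was triggered by the genome size or forced explicitly.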

Jan 2020 - April 2020

  • integrate the biomics scripts to make the link with the differential analysis
  • add feature counts in separate directory ready to use by rnadiff
  • integrate salmon

Dec 2019 - Jan 2020

  • fix the RNAseQC rule, which is broken at the moment
  • check for rRNA feature name presence in the GFF
  • check the feature count type provided by the user
  • check config with schema
  • fix read tag
  • possibility to switch off cutadapt
  • fix the bowtie2 config/pipeline naming conflict (see the explanation of the naming convention in the config and pipeline when using bowtie2_mapping rule #3)
  • Fix indexing issue: indexing is performed even when not requested, or vice versa; when indexing is set to False, the pipeline fails with a cryptic message. We will provide better checks of whether or not indexing has been done.
  • include the schema file
  • parameter output-directory should be renamed output_directory in the multiqc section
  • handle the stdout correctly in the fastqc, bowtie2 and bowtie1 rules
  • allow an rRNA feature and/or rRNA files, with a meaningful error message if the two options conflict
  • better multiconfig report (text/title)
