improvements rnaseq pipeline

### Future
- [ ] Duplication statistics: do duplicates reflect high coverage or PCR duplication? Are they spread over the transcriptome or localized to a set of genes? How are they distributed at the gene scale?
- [ ] Add a column listing the genes corresponding to each enriched GO term (as already done for KEGG)
- [ ] lncRNA analysis
https://www.tandfonline.com/doi/full/10.1080/15476286.2021.1899673
- [ ] CircRNA analysis 
https://www.sciencedirect.com/science/article/pii/S1672022921000292
- [ ] tRNA abundance/modification
https://www.sciencedirect.com/science/article/pii/S1097276521000484?via%3Dihub 
- [ ] Gene fusion detection
https://genome.cshlp.org/content/31/3/448.short?rss=1
- [ ] WGCNA and meta analysis
https://journals.plos.org/ploscompbiol/article?id=10.1371%2Fjournal.pcbi.1008976&utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+ploscompbiol%2FNewArticles+%28PLOS+Computational+Biology+-+New+Articles%29
- [ ] Include STRING?
https://string-db.org/

### Oct 2021

- [x] use new sequana-wrappers

# Those requested features are for the rnadiff analysis, not sequana_rnaseq:

- if possible, provide results with and without independent filtering
- when using --force (rnadiff), previous DGE results should be removed, otherwise they will be added to the HTML reports
- [x] design: add an 'alias name' column


### April/May/June 2021
- [x] if pvalue == 0, should set a value so that it can be seen in volcano plot
- [x] fastp tool to complement existing cutadapt trimming tool
- add an HTML entry point for the enrichment when there are several comparisons or several enrichments
- refactor sequana enrichment, maybe to have a syntax such as: sequana enrichment panther
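
The p-value item above can be illustrated with a minimal sketch. It is not the code used in sequana: the function names are hypothetical, and replacing zeros by the smallest non-zero p-value is only one possible convention. The point is that `-log10(0)` is undefined, so a p-value of exactly 0 cannot be drawn on a volcano plot unless it is floored first.

```python
import math
import sys


def floor_zero_pvalues(pvalues):
    """Replace p-values equal to 0 by the smallest non-zero p-value.

    Hypothetical helper: if every p-value is 0, fall back to the smallest
    positive normalized float so that -log10 stays finite.
    """
    positive = [p for p in pvalues if p > 0]
    floor = min(positive) if positive else sys.float_info.min
    return [floor if p == 0 else p for p in pvalues]


def neg_log10(pvalues):
    """Return the -log10(p) values used for the y axis of a volcano plot."""
    return [-math.log10(p) for p in floor_zero_pvalues(pvalues)]
```

With this convention, genes whose p-value underflowed to 0 are drawn at the same height as the most significant gene with a non-zero p-value, rather than being dropped from the plot.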

### March 2021

- [x] better filtering for multiqc
- [x] main summary.html should have more features/summary/plots
- [x] check the rnaseqc GTF input (catch a missing GTF in main.py and rnaseq.rules); a converter was added in sequana
   - [x] GTF input (from GFF) for the prokaryotes case
   - [x] GTF input (from GFF) for the eukaryotes case
- [x] salmon for eukaryotes tested on mm10
- [x] check the rnaseqc multiqc module; the biomics fork is no longer needed


### Jan 2021

- [x] Bug fix: the mark-duplicates switch is now applied correctly for the QC and the other steps
- [x] Better GFF handling: custom GFF files with several feature types are supported, with sanity checks on the user's choice of attribute and feature
- [x] Checked the rnaseqc functionality and provided a gff2gtf parser in sequana.
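
The sanity check on the user's choice of feature and attribute can be sketched as follows. This is an illustration only, not the sequana implementation: the function names are hypothetical, and the parsing assumes GFF3-style `key=value` attributes separated by semicolons.

```python
def collect_features_and_attributes(gff_lines):
    """Map each feature type (GFF column 3) to the attribute keys (column 9) seen for it."""
    found = {}
    for line in gff_lines:
        # Skip blank lines, comments and directives such as ##gff-version
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # ignore malformed records in this sketch
        feature, attributes = fields[2], fields[8]
        keys = found.setdefault(feature, set())
        for item in attributes.split(";"):
            if "=" in item:
                keys.add(item.split("=", 1)[0].strip())
    return found


def check_choice(gff_lines, feature, attribute):
    """Raise ValueError if the feature, or the attribute for that feature, is absent."""
    found = collect_features_and_attributes(gff_lines)
    if feature not in found:
        raise ValueError(
            f"feature '{feature}' not found; available: {sorted(found)}"
        )
    if attribute not in found[feature]:
        raise ValueError(
            f"attribute '{attribute}' not found for feature '{feature}'; "
            f"available: {sorted(found[feature])}"
        )
```

Listing the available values in the error message lets the user correct the configuration without opening the GFF file. The same check applies to the rRNA feature name.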

### Dec 2020
- [x] Fix issue of seg fault for bacterial genomes with star aligner
- [x] fastq_screen should work now. The only contaminant searched for is phiX; other genomes must be handled by the users (who need to build the indexes themselves). Searching for phiX is now the default behaviour, so that the code works out of the box
- [x] fix missing workflow image in the report.
- [x] add a strandedness plot in the ./outputs directory and include the image in the summary
- [x] bowtie1/star/bowtie2 indexes are now stored in their own sub-directories
- [x] provide way to disable rRNA search 
- [x] fix issue related to star index rule bug in sequana
- [x] rnadiff option is now set automatically to one_factor
- [x] add option --run to execute the pipeline without manual checking (batch mode)

### Oct-Nov 2020

- [x] STAR indexing may emit the following warning:
     --genomeSAindexNbases 14 is too large for the genome size=4456448,
     which may cause seg-fault at the mapping step. Re-run genome generation with
     recommended --genomeSAindexNbases 10
- [x] a more generic title in the multiqc_config
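
The value recommended in the STAR warning above can be reproduced with the scaling rule given in the STAR manual for small genomes, `min(14, log2(GenomeLength)/2 - 1)`. The sketch below is a hypothetical helper, not the code used in the pipeline; it only shows how the value could be derived from the genome length.

```python
import math


def recommended_sa_index_nbases(genome_length):
    """Return a --genomeSAindexNbases value scaled to the genome length.

    Follows the rule from the STAR manual, min(14, log2(GenomeLength)/2 - 1),
    truncated to an integer.
    """
    return min(14, int(math.log2(genome_length) / 2 - 1))


# Genome size reported in the warning (a bacterial genome)
print(recommended_sa_index_nbases(4456448))
```

For the genome size quoted in the warning (4456448), this gives 10, which matches STAR's recommendation. Large genomes are capped at 14.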

### Sept 2020

- [x] Add tolerance for feature_counts in the pipeline and config file after fixing sequana featurecounts functions (v0.9.17)

### Aug 2020
- [x] the do_indexing option is now pre-filled when instantiating the pipeline.
- [x] the salmon option validateMappings is deprecated and should be removed
- [x] salmon indexing included
- [x] refactor the way feature counts are handled: no longer in the onsuccess section, but with simpler code from @khourhin, now included in sequana and in this pipeline as of version 0.9.16.

### June/July 2020
- [x] Fix R1/R2 issue for rRNA
- [x] add mark duplicates in cluster config and set to False by default
- [x] add paired option for feature counts when paired data is provided.
- [x] add an option to skip the fastqc on the raw data. This will be the default; the fastqc on the filtered data is kept by default.
- [x] clean up the multiqc option to exclude fastqc_samples (so as not to clash with fastqc_filtered)

### April-May 2020
- [x] if the input genome is longer than about 4 billion bases, the bowtie2 index files use the .bt2l extension (not .bt2); the sequana rule bowtie2_mapping and this pipeline should therefore be updated accordingly.
- [x] add input to the rnadiff analysis in ./rnadiff
- [x] a faster --help option
- [x] a --from-project option to import existing pipeline
- [x] a custom HTML front page
- [x] add feature counts as a single file
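
For the bowtie2 index extension item above, one way to avoid hard-coding a genome-size threshold is to check which index files actually exist. This is a sketch under that assumption, not the fix applied in the sequana rule, and the function name is hypothetical.

```python
from pathlib import Path


def bowtie2_index_extension(index_prefix):
    """Return '.bt2l' or '.bt2' depending on which index files exist.

    Looks for the first index file of the set (<prefix>.1.bt2l or
    <prefix>.1.bt2) and returns None if neither is present.
    """
    for ext in (".bt2l", ".bt2"):
        if Path(f"{index_prefix}.1{ext}").exists():
            return ext
    return None
```

A mapping rule could then build its expected input file names from the returned extension instead of assuming `.bt2`.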


### Jan 2020 - April 2020

- [x] integrate the biomics scripts to make the link with the differential analysis
- [x] add feature counts in separate directory ready to use by rnadiff
- [x] integrate salmon

### Dec 2019 - Jan 2020

- [x] fix the RNAseQC rule, which is broken at the moment
- [x] check for rRNA feature name presence in the GFF
- [x] check for the feature count type provided by the user
- [x] check config with schema
- [x] fix read tag
- [x] possibility to switch off cutadapt
- [x] fix the bowtie2 config/pipeline name conflict (see #3)
- [x] Fix indexing issue: indexing was done even when not requested, and conversely, when indexing was set to False, the pipeline failed with a cryptic message. We will provide better handling of the check on whether or not indexing has been done.
- [x] include the schema file
- [x] parameter output-directory should be renamed output_directory in the multiqc section
- [x] handle the stdout correctly in the fastqc, bowtie2 and bowtie1 rules
- [x] allow rRNA feature and/or files, with a meaningful error message if the 2 options conflict
- [x] better multiconfig report (text/title)


