Bowtie2 (bad greedy) and read multimapping for metagenomes #9

@TealFurnholm

This tool is designed for metagenomic NGS data sets, and Bowtie2 is not (the author says so in the manual).

  • BT2 is a greedy matcher: matches with very low %ID will still be reported. It was designed for aligning reads to a single eukaryotic genome, with splicing and SNPs, and is optimized to find the first best hit.
  • The BT2 manual is incomprehensible when you try to adjust the settings to something similar to a %ID cutoff.
  • 75% of all bacterial genes are orthologs (I know, because I curated all 529 million genes in NCBI+JGI), and metagenomes are replete with many strains of the same species, so you have to multimap the reads.
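
To make the %ID point concrete, here is a minimal sketch of one way to apply a percent-identity cutoff to alignments after the fact, whatever the aligner. It assumes SAM text records that carry the standard `NM:i:` edit-distance tag, and it assumes one common definition of identity (alignment columns minus edit distance, over alignment columns). The function names are hypothetical and other definitions of %ID give slightly different numbers.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def percent_identity(cigar: str, nm: int) -> float:
    """Approximate %ID as (alignment columns - edit distance) / alignment columns."""
    # Alignment columns: aligned bases (M, =, X) plus inserted and deleted bases.
    # Soft/hard clips, padding and skipped regions are not counted.
    columns = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=XID")
    if columns == 0:
        return 0.0
    return 100.0 * (columns - nm) / columns


def passes_identity(sam_line: str, min_pct: float = 95.0) -> bool:
    """Return True if a SAM record meets the assumed %ID cutoff."""
    fields = sam_line.rstrip("\n").split("\t")
    cigar = fields[5]
    if cigar == "*":  # unaligned record, nothing to score
        return False
    # Optional tags start at column 12; look for the edit distance.
    nm = next((int(f[5:]) for f in fields[11:] if f.startswith("NM:i:")), None)
    if nm is None:  # no edit distance available, so the record cannot be scored
        return False
    return percent_identity(cigar, nm) >= min_pct
```

Under these assumptions a 100 bp read aligned end to end with 5 edits scores 95%, so it passes a 95% cutoff, and one more edit drops it below.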

Instead of Bowtie2, I ran BBMap with 95% ID, either with or without multimapping, and used MEC
(since I still haven't gotten DeepMAsED to work: see other reported issue).

  • no multimapping (random assign read to one of the best hits): #split_num 741
  • with read multimapping: #split_num 5322
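
For anyone who wants to try the same comparison, the two mapping modes can be sketched roughly as below. This is not the exact command line behind the numbers above; the file names are placeholders, and the helper name `bbmap_command` is hypothetical. It relies on BBMap's `minid` and `ambiguous` parameters, where `ambiguous=random` assigns a read to one of its best hits and `ambiguous=all` keeps every best hit. Note that `minid` is an approximate threshold; BBMap also has a separate `idfilter` parameter for a strict cutoff.

```python
def bbmap_command(ref, reads, out, multimap, min_id=0.95):
    """Build a bbmap.sh argument list for one of the two mapping modes.

    All paths are placeholders supplied by the caller.
    """
    # ambiguous=all reports every top-scoring site (multimapping);
    # ambiguous=random picks one of the top-scoring sites at random.
    ambiguous = "all" if multimap else "random"
    return [
        "bbmap.sh",
        f"ref={ref}",
        f"in={reads}",
        f"out={out}",
        f"minid={min_id}",
        f"ambiguous={ambiguous}",
    ]


# Placeholder file names, for illustration only.
single = bbmap_command("contigs.fa", "reads.fq", "single.sam", multimap=False)
multi = bbmap_command("contigs.fa", "reads.fq", "multi.sam", multimap=True)
print(" ".join(single))
print(" ".join(multi))
```

Each list can be passed to `subprocess.run` once BBMap is on the `PATH`.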

You can see there is quite a difference, and I think you'll find the same with DeepMAsED.
Orthology/multimapping is a major issue. You may find quite a bit more than 1% chimeras!
Please trust me and check it out.

I plan to check results with MetaQuast to see which is correct, once I get DeepMAsED working.

The REAL question is: what will your software do if I feed it a BAM file with multimapped reads?
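
As a quick way to see how many multimapped reads a given file actually contains, here is a minimal sketch that works on SAM text (for example the output of `samtools view` on the BAM file). It assumes multimapped reads show up as several alignment records under the same read name, counts mates of a pair separately, and ignores unmapped and supplementary records. The function name is hypothetical.

```python
from collections import Counter

# SAM flag bits used below (from the SAM specification).
UNMAPPED = 0x4
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80
SUPPLEMENTARY = 0x800


def count_multimapped(sam_lines):
    """Return (mapped_reads, multimapped_reads) for an iterable of SAM lines."""
    hits = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        # Skip unmapped reads and pieces of split (chimeric) alignments.
        if flag & UNMAPPED or flag & SUPPLEMENTARY:
            continue
        # Keep the two mates of a pair apart so a properly paired read
        # is not mistaken for a multimapped one.
        if flag & FIRST_IN_PAIR:
            mate = "/1"
        elif flag & SECOND_IN_PAIR:
            mate = "/2"
        else:
            mate = ""
        hits[fields[0] + mate] += 1
    multimapped = sum(1 for n in hits.values() if n > 1)
    return len(hits), multimapped
```

Passing it the lines of a SAM stream gives the number of mapped reads and how many of them have more than one reported alignment.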

Best,
Teal
