Description
Hello! I'm extremely new to bioinformatics, Python, and GitHub, so I apologize if this is a super easy fix! Thank you in advance for any help or input.
I am trying to assign taxonomy to trnL (plant chloroplast) OTUs using BLAST results that have already been run and written in the required format (outfmt '6 qseqid qlen sseqid pident length qstart qend sstart send evalue bitscore staxids'). I'll include an example of my BLAST output below in case that is the issue; it is a tab-delimited file with the top 10 hits per OTU.
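Since the required format lists 12 fields, one thing that may be worth checking is whether every line of the BLAST file really has exactly 12 tab-separated fields. The following is a minimal sketch of such a check, not part of taxonomy_assignment_BLAST.py; the function name `find_bad_lines` is hypothetical.

```python
# Field count implied by the outfmt string:
# qseqid qlen sseqid pident length qstart qend sstart send evalue bitscore staxids
EXPECTED_FIELDS = 12


def find_bad_lines(path, expected=EXPECTED_FIELDS):
    """Return (line_number, field_count) for every non-empty line whose
    tab-separated field count differs from `expected`."""
    bad = []
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) != expected:
                bad.append((line_number, len(fields)))
    return bad
```

Calling `find_bad_lines("TLotus_taxhits.txt")` would return an empty list if every line matches the expected layout. The `#BLAST LINE` entries echoed in the output further down show 13 whitespace-separated values each, so a mismatch here seems possible.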
The code I use when running is:
python3 taxonomy_assignment_BLAST.py TLotus.fa ./ncbi_taxonomy/expanded_ncbi_taxonomy.tsv --blast_database IGNORE --blast_file TLotus_taxhits.txt
where TLotus.fa is the file with my OTU sequences (not actually used, since I already have a BLAST file)
expanded_ncbi_taxonomy.tsv is the taxonomy file, built as instructed (a preview of the file is attached)
TLotus_taxhits.txt is my BLAST output with the custom formatting
The command runs without error, but the returned taxonomy is the default used when there is no match in the taxonomy file. For example, this is what the output looks like for every OTU:
#BLAST LINE : Otu4 94 NC_047481.1 100 94 1 94 52771 52864 1.52E-39 174 94 354624
#BLAST LINE : Otu4 94 MN308055.1 100 94 1 94 52771 52864 1.52E-39 174 94 354624
#BLAST LINE : Otu4 94 MK105463.1 100 94 1 94 52769 52862 1.52E-39 174 94 3512
ASSIGNING TAXONOMY FOR Otu4 total hits passing initial filters = 3
NC_047481.1 100 --> CAPTURED after percent sway filter
MN308055.1 100 --> CAPTURED after percent sway filter
MK105463.1 100 --> CAPTURED after percent sway filter
Providing consensus taxonomy up to level 14 : tmp6
X100_1 superkingdom;subkingdom;sub_subkingdom;kingdom;tmp1;tmp2;phylum;class;family;genus;species;tmp3;tmp4;tmp5;tmp6
X100_2 superkingdom;subkingdom;sub_subkingdom;kingdom;tmp1;tmp2;phylum;class;family;genus;species;tmp3;tmp4;tmp5;tmp6
X100_3 superkingdom;subkingdom;sub_subkingdom;kingdom;tmp1;tmp2;phylum;class;family;genus;species;tmp3;tmp4;tmp5;tmp6
Taxonomy Assignment for Otu4 = superkingdom:subkingdom:sub_subkingdom:kingdom:tmp1:tmp2:phylum:class:family:genus:species:tmp3:tmp4:tmp5:tmp6
It looks like it's reading my blast file correctly, as it's pulling the accession number and percent ID correctly. When I search the taxonomy file for the taxids in the blast file, they are present with normal taxonomy information.
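The manual search could also be automated to see which value the script would actually find in the staxids position. This is a rough sketch under stated assumptions: the taxid is assumed to sit in the first tab-separated column of expanded_ncbi_taxonomy.tsv (I have not confirmed this), staxids is taken as the 12th field of each BLAST line, and both function names are hypothetical.

```python
def load_taxids(taxonomy_path, taxid_column=0):
    """Collect taxids from the taxonomy file.
    ASSUMPTION: the taxid is in column `taxid_column` (0-based) of a
    tab-separated file; adjust if the real layout differs."""
    taxids = set()
    with open(taxonomy_path) as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) > taxid_column and fields[taxid_column].strip():
                taxids.add(fields[taxid_column].strip())
    return taxids


def missing_taxids(blast_path, known_taxids, staxids_column=11):
    """Return the set of values in the staxids position (0-based column 11,
    i.e. the 12th field) that are not present in `known_taxids`."""
    missing = set()
    with open(blast_path) as handle:
        for line in handle:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) <= staxids_column:
                continue  # line too short to contain a staxids field
            # BLAST may report several taxids separated by ';'
            for taxid in fields[staxids_column].split(";"):
                taxid = taxid.strip()
                if taxid and taxid not in known_taxids:
                    missing.add(taxid)
    return missing
```

If `missing_taxids(...)` returns values that are not the taxids expected from the BLAST hits, the 12th field may not be holding the taxid at all.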
Is this some sort of basic formatting error? I've tried looking over the Python code, but I have little to no knowledge of Python and cannot find the issue myself!