Sliding Primer In silico Detection of Encoded Regions (SPIDER)

SPIDER is a reference-based in silico PCR tool for detecting microbial sequences of interest in genome assemblies. It has two key functions. First, it searches for sequences of interest within a whole-genome assembly. Second, it can extract those sequences into a FASTA file for rapid genomic epidemiology analysis.

Installation

Download or clone this GitHub repository.
SPIDER uses a conda environment to handle dependencies. Create the environment from the provided environment.yml file with: conda env create -f environment.yml
Activate the SPIDER conda environment with: conda activate spider

SPIDER Search

To search for sequences of interest, SPIDER requires one or more query sequences and a database to search. The query sequences may be specified as a single FASTA file, a list of paths to multiple FASTA files, or a folder containing multiple sequences (.fasta or .fna). You can either search a pre-compiled database using a keyword or provide a custom database in FASTA format. The full list of parameters is given in the table below. To get started quickly, see the example commands below.

Examples

Search a whole-genome assembly for virulence factors in the VFDB belonging to Staphylococcus aureus:

python spider.py -f assembly.fasta -db vfdb -s "Staphylococcus aureus"

Search a list of genome assemblies for a sequence in the database custom_db.fasta and save the output to the file out.txt:

python spider.py -l genome_list.txt -db custom_db.fasta -o out.txt
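
The file passed to -l is plain text with one assembly path per line. As a minimal sketch, assuming your assemblies sit in a single folder, such a file could be built as follows. The helper name write_genome_list is hypothetical and not part of SPIDER; the extensions mirror those SPIDER looks for with -d.

```python
from pathlib import Path


def write_genome_list(assembly_dir, list_path, extensions=(".fasta", ".fna")):
    """Write one assembly path per line, the layout expected by -l/--list."""
    # Collect files whose suffix matches the extensions SPIDER scans for with -d.
    paths = sorted(
        p for p in Path(assembly_dir).iterdir()
        if p.is_file() and p.suffix in extensions
    )
    # One path per line, each terminated by a newline.
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

Calling write_genome_list("assemblies", "genome_list.txt") would then produce a file that can be passed to the second example command above.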

Full SPIDER Search Parameters

Parameter	Description	Required
Input Options
-f, --fasta	Path to a single genome sequence.	Yes (only one of -f, -l, or -d at a time)
-l, --list	Path to a list of genome sequences. This file is expected to contain paths to genome sequences, one per line.	Yes (only one of -f, -l, or -d at a time)
-d, --directory	Path to a directory. SPIDER will look for any files ending in .fasta or .fna inside this directory.	Yes (only one of -f, -l, or -d at a time)
-a, --annotation	Path to a GFF3-formatted annotation file. When included, SPIDER compares detected amplicons to the annotations and checks for overlap. This feature works with only a single FASTA input at a time.	No
Database Options
-db, --database	Either a keyword for a pre-compiled database, or path to a custom database in FASTA format.	Yes
--list_dbs	Provides a list of pre-compiled databases that can be searched. This is a stand-alone command that can be run without specifying a query and database.	No
-s, --search	This is a search term. If specified, the database will be filtered to FASTA headers that contain this term.	No
Output Options
-o, --output	Output file that will be generated. For SPIDER search, this will be a tab-separated-values file. If no output is specified, SPIDER will print to stdout.	No
Additional Search options
--overlaps	Checks if any of the identified sequences are overlapping one another. Default: False	No
--scan_codons	Searches for the start and stop codons nearest to the start and end of each identified amplicon and checks whether they are in frame with one another. Default: False	No
-sl, --slide_limit	Percent length of a reference sequence that primers are allowed to slide. Default is 5 (5%).	No
-lt, --length	Percent length tolerance between an extracted amplicon and the reference sequence. Default is 20 (20%). This allows matches of 80-100% of the reference sequence.	No
-it, --identity	Percent identity tolerance between an extracted amplicon and the reference sequence. Anything above this threshold will be called positive. Default is 0 (0%).	No
-p, --primer_size	Length of primers for SPIDER to use. Default is 20 (20nt).	No
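
To make the percent-based options concrete, the sketch below converts them to nucleotide counts for a given reference length. It only illustrates the arithmetic as described in the table (for example, a 20% length tolerance accepting amplicons of 80-100% of the reference). The function name search_windows and the rounding down are assumptions; SPIDER's internal calculation may differ.

```python
def search_windows(ref_len, slide_limit=5, length_tol=20):
    """Convert percent-based search options into nucleotide counts."""
    # Maximum distance (nt) a primer may slide, as a percent of reference length.
    max_slide = ref_len * slide_limit // 100
    # Shortest accepted amplicon: (100 - tolerance)% of the reference length.
    min_amplicon = ref_len * (100 - length_tol) // 100
    return {
        "max_slide_nt": max_slide,
        "min_amplicon_nt": min_amplicon,
        "max_amplicon_nt": ref_len,
    }


print(search_windows(1500))
# {'max_slide_nt': 75, 'min_amplicon_nt': 1200, 'max_amplicon_nt': 1500}
```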

Database Shortcuts

SPIDER includes shortcuts to search common databases. To use a pre-compiled database, pass its keyword to the -db argument. For example, python spider.py -f assembly.fasta -db vfdb will search assembly.fasta for all virulence factors included in the Virulence Factor Database (VFDB). The -s/--search option can be used to filter the database for genes of interest. For example, searching for ExoU in VFDB can be performed with python spider.py -f assembly.fasta -db vfdb -s ExoU.

Database	Keyword	Citation
Virulence Factor Database (VFDB)	vfdb	Chen L, Zheng D, Liu B, Yang J, Jin Q. VFDB 2016: hierarchical and refined dataset for big data analysis--10 years on. Nucleic Acids Res. 2016;44(D1):D694-D697. doi:10.1093/nar/gkv1239
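
The effect of -s/--search can be pictured as a header filter over the database FASTA. The sketch below is an illustration only: the function name is hypothetical, the two records are made up for the example, and the case-sensitive substring match is an assumption about how SPIDER compares the term.

```python
def filter_fasta_by_header(fasta_text, term):
    """Keep only FASTA records whose header line contains `term`."""
    kept, keep = [], False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Decide once per record, based on its header line.
            keep = term in line
        if keep:
            kept.append(line)
    return "\n".join(kept)


# Two made-up records standing in for a database file.
db = (
    ">VF001 (exoU) ExoU [Pseudomonas aeruginosa]\nATGCAT\n"
    ">VF002 (coa) coagulase [Staphylococcus aureus]\nATGAAA\n"
)
print(filter_fasta_by_header(db, "ExoU"))
# >VF001 (exoU) ExoU [Pseudomonas aeruginosa]
# ATGCAT
```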

If there are additional existing databases you would like to see added to this tool, please open an issue on GitHub.

SPIDER Extract

SPIDER can quickly extract the sequences of amplicons identified by the program. Sequences are extracted in the same orientation as your reference sequence, so sequences on the opposite strand are reverse complemented to read in the same direction. Before running this command, a SPIDER search must be run and its output saved to a file. SPIDER parses the search output and extracts all sequences that the search flagged as valid.
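
For readers unfamiliar with the reorientation step, reverse complementing can be sketched as below. This is a generic illustration, not SPIDER's own code; characters other than A, C, G, and T (such as N) are left unchanged.

```python
# Translation table mapping each base to its complement (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGGCC"))  # GGCCAT
```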

Examples

Extract the nucleotide sequences found by searching the genome assembly.fasta against the custom database custom_db.fasta. Note that the first line is a SPIDER search, as described in the section above.

python spider.py -f assembly.fasta -db custom_db.fasta -o search_custom_db.txt
python spider.py -e search_custom_db.txt -o assembly_custom_db.fasta

Extract the amino acid sequences of the coagulase gene from all assemblies in the directory assemblies using the reference sequence from VFDB. Note that the first line is a SPIDER search, as described in the section above.

python spider.py -d assemblies -db vfdb -s "Staphylococcus aureus" -o coagulase_search.txt
python spider.py -e coagulase_search.txt --translate -o coagulase.fasta

Full SPIDER Extract Parameters

Parameter	Description	Required
-e, --extract	Output of a SPIDER search for sequence(s) of interest in tab-separated-values format. Note that SPIDER assumes your genome sequence files are still at the paths they had when the search was performed.	Yes
-o, --output	Output file that will be generated. For SPIDER extract, this will be in FASTA format. If using the --separate option, this should be the name of a folder. Default: stdout	No
--translate	Translates the extracted nucleotide sequences to amino acid sequences. Note that this function assumes that the extracted sequence is in the desired reading frame.	No
--upstream	Number of nucleotides upstream of the desired amplicon to extract. Default: 0 (start of desired sequence)	No
--downstream	Number of nucleotides downstream of the desired amplicon to extract. Default: 0 (end of desired sequence)	No
--separate	Separate the output sequences into multiple FASTA files by target name. If using this option, the output flag is required and should be the name of a folder rather than a file. Default: False	No
--overwrite	Overwrite an existing output file. Default: False	No
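
Because --translate assumes the extracted sequence is already in frame, a shifted start changes every codon. The sketch below illustrates this with the standard genetic code. It is an assumption-laden illustration rather than SPIDER's implementation: the translate function is hypothetical, and SPIDER may use a different codon table or library.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate from the first base; trailing partial codons are dropped."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    # Codons containing ambiguous bases are reported as X.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


print(translate("ATGGCCTAA"))  # MA*
print(translate("TGGCCTAA"))   # WP (one base missing at the start shifts the frame)
```

If the amplicon does not begin at a codon boundary, the --upstream and --downstream options described above change where extraction starts and ends, which in turn changes the frame that is translated.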
