Skip to content

Latest commit

 

History

History
84 lines (57 loc) · 3.75 KB

File metadata and controls

84 lines (57 loc) · 3.75 KB

Finding Rx in Potato - an example workflow

Example output files are provided in the example_outputs directory. These can be compared to the files generated by this workflow (eg. by checksums) to confirm correct functionality.

Download required files

All files, except for the reference DM potato genome are available in this repository, simply clone down the repository to your local machine. All commands in this workflow will assume you are based in this repository, you may need to change commands if you change the structure.

Ensure you have followed the instructions in README.md to install required software and optionally set up a cluster profile.

# Fetch the DM genome, this is too large to include here
cd example_inputs/agrenseq
wget http://spuddb.uga.edu/data/PGSC_DM_v4.03_pseudomolecules.fasta.zip
unzip PGSC_DM_v4.03_pseudomolecules.fasta.zip

Obtain reads

Reads have been submitted to ENA BioProjects PRJEB56823 and PRJEB56825. In this example we will use the GNU parallel utility to download reads. Replace X in the parallel command with the number of jobs to run in parallel, this will depend on your system, based on CPU availability and networking bandwidth. If you prefer another method you may use this, though you may need to change file paths in later files.

mkdir -p ../Reads
cd ../Reads
parallel -j X wget < ../reads.txt

Perform SMRT-RenSeq assembly

Depending on your system, you may be able to wrap these commands into a job with eg. sbatch. In case you are not able to capture the stdout and stderr from the core snakemake process, log files will be written to .snakemake/log/*.snakemake.log

# Copy files to SMRT-RenSeq assembly directory
cd ../../smrtrenseq_assembly
cp ../example_inputs/smrtrenseq_assembly/* config/.

# Run workflow
# If using a cluster profile
snakemake --profile /path/to/cluster/profile

# If running locally
snakemake --use-conda --cores=/number/of/CPU/cores

Perform AgRenSeq

# Copy files to AgRenSeq directory
cd ../agrenseq
cp ../example_inputs/agrenseq/* config/.

# Run workflow
# If using a cluster profile
snakemake --profile /path/to/cluster/profile

# If running locally
snakemake --use-conda --cores=/number/of/CPU/cores

Perform dRenSeq

# Prepare files from output of SMRT-RenSeq assembly and AgRenSeq

cd ../drenseq

# Copy files to dRenSeq directory
cp ../example_inputs/drenseq/* config/.

awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);}  END {printf("\n");}' < ../smrtrenseq_assembly/assembly/Gemson/Gemson.contigs.fasta | tail -n +2 > ../smrtrenseq_assembly/assembly/Gemson/Gemson_unwrapped.contigs.fasta # unwrap fasta file so all the sequence is on one line

cat ../smrtrenseq_assembly/assembly/Gemson/Gemson_unwrapped.contigs.fasta | grep -A1 -f ../agrenseq/results/Gemson_filtered_contigs.txt | sed 's/--//g' | sed '/^$/d' | sed '/^>/ s/ .*//' >> config/Gemson_candidates.fa # get your sequences for contigs you want, we have provided example sequences to aid in running the analysis

cat ../smrtrenseq_assembly/NLR_Annotator/Gemson_NLR_Annotator_sorted.bed | grep -f ../agrenseq/results/Gemson_filtered_contigs.txt | cut -f1-4 >> config/Gemson_candidates.bed # Make a bed file, we have provided example sequences to aid in running the analysis

# Run workflow
# If using a cluster profile
snakemake --profile /path/to/cluster/profile

# If running locally
snakemake --use-conda --cores=/number/of/CPU/cores

This should show tig00001343_nlr_1 as the only candidate with 100% coverage in all samples scored as 1 in the read scores file used for AgRenSeq. The nucleotide sequence for this can be found in the NLR Annotator output fasta in the SMRT-RenSeq assembly directory. When BLASTed against the NCBI nr/nt database, the top hit is the reference Rx sequence.