Example output files are provided in the example_outputs directory. These can be compared to the files generated by this workflow (eg. by checksums) to confirm correct functionality.
All files, except for the reference DM potato genome are available in this repository, simply clone down the repository to your local machine. All commands in this workflow will assume you are based in this repository, you may need to change commands if you change the structure.
Ensure you have followed the instructions in README.md to install required software and optionally set up a cluster profile.
# Fetch the DM genome, this is too large to include here
cd example_inputs/agrenseq
wget http://spuddb.uga.edu/data/PGSC_DM_v4.03_pseudomolecules.fasta.zip
unzip PGSC_DM_v4.03_pseudomolecules.fasta.zipReads have been submitted to ENA BioProjects PRJEB56823 and PRJEB56825. In this example we will use the GNU parallel utility to download reads. Replace X in the parallel command with the number of jobs to run in parallel, this will depend on your system, based on CPU availability and networking bandwidth. If you prefer another method you may use this, though you may need to change file paths in later files.
mkdir -p ../Reads
cd ../Reads
parallel -j X wget < ../reads.txtDepending on your system, you may be able to wrap these commands into a job with eg. sbatch. In case you are not able to capture the stdout and stderr from the core snakemake process, log files will be written to .snakemake/log/*.snakemake.log
# Copy files to SMRT-RenSeq assembly directory
cd ../../smrtrenseq_assembly
cp ../example_inputs/smrtrenseq_assembly/* config/.
# Run workflow
# If using a cluster profile
snakemake --profile /path/to/cluster/profile
# If running locally
snakemake --use-conda --cores=/number/of/CPU/cores# Copy files to AgRenSeq directory
cd ../agrenseq
cp ../example_inputs/agrenseq/* config/.
# Run workflow
# If using a cluster profile
snakemake --profile /path/to/cluster/profile
# If running locally
snakemake --use-conda --cores=/number/of/CPU/cores# Prepare files from output of SMRT-RenSeq assembly and AgRenSeq
cd ../drenseq
# Copy files to dRenSeq directory
cp ../example_inputs/drenseq/* config/.
awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);} END {printf("\n");}' < ../smrtrenseq_assembly/assembly/Gemson/Gemson.contigs.fasta | tail -n +2 > ../smrtrenseq_assembly/assembly/Gemson/Gemson_unwrapped.contigs.fasta # unwrap fasta file so all the sequence is on one line
cat ../smrtrenseq_assembly/assembly/Gemson/Gemson_unwrapped.contigs.fasta | grep -A1 -f ../agrenseq/results/Gemson_filtered_contigs.txt | sed 's/--//g' | sed '/^$/d' | sed '/^>/ s/ .*//' >> config/Gemson_candidates.fa # get your sequences for contigs you want, we have provided example sequences to aid in running the analysis
cat ../smrtrenseq_assembly/NLR_Annotator/Gemson_NLR_Annotator_sorted.bed | grep -f ../agrenseq/results/Gemson_filtered_contigs.txt | cut -f1-4 >> config/Gemson_candidates.bed # Make a bed file, we have provided example sequences to aid in running the analysis
# Run workflow
# If using a cluster profile
snakemake --profile /path/to/cluster/profile
# If running locally
snakemake --use-conda --cores=/number/of/CPU/coresThis should show tig00001343_nlr_1 as the only candidate with 100% coverage in all samples scored as 1 in the read scores file used for AgRenSeq. The nucleotide sequence for this can be found in the NLR Annotator output fasta in the SMRT-RenSeq assembly directory. When BLASTed against the NCBI nr/nt database, the top hit is the reference Rx sequence.