Figbird

Filling Gaps by Iterative Read Distribution

  • A tool written in C++ for filling gaps in draft genome assemblies using second-generation Illumina sequencing reads.
  • Supports read pairs with both smaller inserts (~200 bp) and larger inserts (~3500 bp).
  • Uses probabilistic methods based on the insert size information of read pairs, rather than graph-based methods.
  • Makes maximum use of available sequence information by using both partially aligned and unmapped reads.

Dependencies

The software can run on Linux and Mac systems with a few dependencies listed below:

  • Bowtie2: Used for mapping read pairs to gapped scaffolds. The default version bundled with the software is 2.2.3 (Linux), but users can download their preferred version from https://github.com/BenLangmead/bowtie2. To use the bundled version of bowtie2, run the following commands:
    unzip bowtie2-2.2.3-source.zip
    cd bowtie2-2.2.3/
    make
    
  • The software is written in C++ and requires GNU g++ (version 4.8 or later) to compile. The driver script is written in bash and requires GNU bash (version 4.3 or later).
  • GNU utility 'bc' (basic calculator). If bc is not installed on your system, run the following command (Debian/Ubuntu):
    sudo apt install bc
  • The command-line JSON processor 'jq'. You can install jq from https://stedolan.github.io/jq/
  • [Optional] Python is required only if you want to assess the quality of filled gaps using QUAST. The exact version of QUAST, along with the necessary correction files described in the paper, is bundled with the software. Unzip the folder before using it; there is no need to install QUAST separately.
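
The dependency list above can be checked quickly from a shell. The sketch below is not part of Figbird: the function name check_tools is hypothetical, and it only tests whether each command is on PATH without verifying versions. If you build and use the bundled bowtie2-2.2.3, bowtie2 may legitimately be reported as missing from PATH.

```shell
# check_tools NAME...  (hypothetical helper, not shipped with Figbird)
# Prints one line per tool and returns 0 only if every tool is on PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'found    %s\n' "$tool"
        else
            printf 'MISSING  %s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools named in the dependency list; versions still need a manual check
# (g++ 4.8 or later, bash 4.3 or later).
check_tools g++ bash bc jq bowtie2 || echo "Install the missing tools before running Figbird."
```

Running `g++ --version` and `bash --version` afterwards shows whether the installed versions meet the stated minimums.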

Input configuration

Figbird uses a configuration file in JSON format to take scaffold and read library information. A sample configuration file named "Config.json" is provided in the installation folder and must be edited accordingly. Users can also use http://jsonlint.com/ to check the validity of their config file. The fields required in the JSON file are listed below with explanations:

  • Draft_genome: Path to the gapped draft genome to fill.

  • Bowtie2: Path to the bowtie2 executables. To use the bundled bowtie2 version, put "bowtie2-2.2.3"; if bowtie2 is on the system path, put "" here. To use your preferred version, put the path to its installation directory.

  • Output_Folder: Path to the directory where the outputs will be stored. A folder named Figbird will be created in that directory to hold all the output.

  • Reference_Genome[optional]: Only needed if you want to evaluate the quality of the filled assembly using QUAST.

  • Read_Pairs: List your paired read libraries one by one, providing the following information for each:

    1. path_1: Path to first of the read pair files.
    2. path_2: Path to second of the read pair files.
    3. avg_insert_size: Average insert size of the read pair library (<5000).
    4. is_reverse: Put 0 if your read pair files are already in forward-reverse (FR) orientation; otherwise put 1. If 1 is given, both files of the input read pair will be reverse complemented.
    5. max_read_len: Maximum read length of the library (<=200).
    6. serial_num: The order in which the read libraries are used for filling gaps.
    7. num_itr_partial: For each read pair library, both reads with one end partially aligned and reads with one end unmapped are used for gap filling. Enter the iteration count for the partial approach here.
    8. num_itr_unmapped: Enter the iteration count for the unmapped approach here.
    9. order: Specifies which of the partial and unmapped methods is applied first.
    • [Users must provide at least one library of read pair files, with all 9 items of information per library, to start gap filling]
  • Parameters:

    1. numthreads: Number of threads used during bowtie2 alignment and the gap filling procedure. [Default: 4]
    2. evaluation: Put 1 if you want to assess the result with QUAST, or 0 otherwise. [Default: 0]
    3. gaplen_negative_overlap: Negative gap lengths are allowed, i.e. a gap can be diminished if the corresponding left and right flanks overlap and the overlap is supported by aligned reads. Enter the maximum gap length for which this method applies. [Default: 30]
    4. default: Put 0 to manually specify the order of read usage and the number of iterations; otherwise put 1 for the default approach. If you put 1, items 6-9 of the read pair information need not be specified. [Default: 1]
    5. trim_len: The number of nucleotides trimmed from either side of each gapped region, since this is where assemblers stop and the sequence is highly likely to be erroneous. [Default: 10]
    6. set_inputmean: Can be set to either 0 or 1. Set it to 1 to make the minimum scaffold length equal to the "avg_insert_size" of the read library, which reduces bias towards shorter insert sizes during alignment for learning distributions. Otherwise, set it to 0 for no limit. [Default: 0]
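
As a rough illustration of how the fields above fit together, the sketch below writes a hypothetical example file and syntax-checks it with jq when jq is available. The key names are taken from the list above. The nesting (Read_Pairs as an array of objects, Parameters as an object), the value types, the empty Reference_Genome, the file name Config.example.json, and every path are assumptions made for illustration. The Config.json shipped with Figbird remains the authoritative template. Because "default" is set to 1 here, items 6-9 are left out of the library entry.

```shell
# Write a hypothetical example config (structure and paths are assumptions;
# compare against the Config.json provided with Figbird before using it).
cat > Config.example.json <<'EOF'
{
  "Draft_genome": "/path/to/scaffolds.fasta",
  "Bowtie2": "bowtie2-2.2.3",
  "Output_Folder": "/path/to/output",
  "Reference_Genome": "",
  "Read_Pairs": [
    {
      "path_1": "/path/to/lib1_1.fastq",
      "path_2": "/path/to/lib1_2.fastq",
      "avg_insert_size": 200,
      "is_reverse": 0,
      "max_read_len": 101
    }
  ],
  "Parameters": {
    "numthreads": 4,
    "evaluation": 0,
    "gaplen_negative_overlap": 30,
    "default": 1,
    "trim_len": 10,
    "set_inputmean": 0
  }
}
EOF

# Syntax-check the file if jq is installed; jq exits non-zero on invalid JSON.
if command -v jq >/dev/null 2>&1; then
    jq empty Config.example.json && echo "Config.example.json is valid JSON"
fi
```

The same `jq empty` check can be pointed at your edited Config.json as an offline alternative to the jsonlint website.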

Running Figbird

Download the repository from https://github.com/SumitTarafder/Figbird. The tool can be run directly once all the dependencies are installed. To run Figbird:

tar xzf Figbird.tar.gz
cd Figbird
chmod a+x RunFigbird.sh && ./RunFigbird.sh Config.json
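
Before launching a run, it may help to confirm that each library respects the limits noted in the configuration section (avg_insert_size below 5000, max_read_len at most 200). The sketch below is a hypothetical pre-flight helper, not part of RunFigbird.sh. The function name check_library_limits is made up for illustration, and it assumes both values are written as integers.

```shell
# check_library_limits AVG_INSERT_SIZE MAX_READ_LEN  (hypothetical helper)
# Returns 0 if both values are within the limits documented for Read_Pairs.
check_library_limits() {
    if ! [ "$1" -lt 5000 ] 2>/dev/null; then
        echo "avg_insert_size must be an integer below 5000 (got: $1)" >&2
        return 1
    fi
    if ! [ "$2" -le 200 ] 2>/dev/null; then
        echo "max_read_len must be an integer of at most 200 (got: $2)" >&2
        return 1
    fi
    return 0
}

# Example values only: a large-insert library that passes, and one that does not.
check_library_limits 3500 150 && echo "library within limits"
check_library_limits 6000 150 || echo "library rejected"
```

The two arguments can be copied by hand from Config.json, or extracted with jq once you have confirmed how the provided Config.json nests its read pair entries.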

Output

Review "Manual.pdf" inside the folder for details of the output format.

Citations

  • Sumit Tarafder, Mazharul Islam, Swakkhar Shatabda, Atif Rahman, Figbird: a probabilistic method for filling gaps in genome assemblies, Bioinformatics, Volume 38, Issue 15, 1 August 2022, Pages 3717–3724, https://doi.org/10.1093/bioinformatics/btac404
