ChIPseq (pipelines in genomics for Chromatin Immunoprecipitation Sequencing) is an analysis pipeline for preprocessing, alignment and peak calling for ChIP sequencing experiments.The inputs are reads files from the sequencing experiment, and a configuration and meta data file that describes the experiment.
-Quality reads using fastqc
-Trimming of the read using cutadapt tools
-Mapping the reads to the genome using bowtie2
-Sorting and indexing using samtools
-Calling peaks of samples using MACS2
-QC reports
-Trimmed fastq files
-Sam files
-Sorted bam files
-narrowPeak/ BroadPeak files
- Git clone this repository
git clone https://github.com/SimranChhabria/ChIP-seq-snakemake-pipeline.git
- Load all the required modules
module load python
module load MACS2
module load snakemake
module load gbc-bowtie2
module load gbc-cutadapt
module load gbc-fastqc
module load gbc-samtools
- Activate the snakemake environment.
conda activate snakemake
-
Edit all the config files.
-
Ensure meta-data table contains all of the necessairy fields (TABBED-DELIMITED)
Example:meta-part1
#SampleName SampleFiles
CAL27_EHF_G2_S2_R1_001 CAL27_EHF_G2_S2_R1_001.fastq
Example:meta-part2
#SampleName InputFile SampleFiles Range
CAL27_EHF_G2_S2_R1_001 CAL_Input.bam CAL27_EHF_G2_S2_R1_001.bam --broad
CAL27_EHF_5A5_S1_R1_001 CAL_Input.bam CAL27_EHF_5A5_S1_R1_001.bam
** NOTE EXACT HEADERS HAVE TO BE ENFORCED or key errors will be thrown during processing**
- Launch jobs
snakemake --latency-wait 120 -pr -j 16 --use-conda -s Snakemake-part1 --cluster "sbatch --partition gbc --cluster faculty --qos gbc --account gbcstaff --mem=24G"