ChIP-Seq pipeline
Here we provide the tools to perform paired end or single read ChIP-Seq analysis including raw data quality control, read mapping, peak calling, differential binding analysis and functional annotation. As input files you may use either zipped fastq-files (.fastq.gz) or mapped read data (.bam files). In case of paired end reads, corresponding fastq files should be named using .R1.fastq.gz and .R2.fastq.gz suffixes.
Pipeline Workflow
All analysis steps are illustrated in the pipeline flowchart . Specify the desired analysis details for your data in the essential.vars.groovy file (see below) and run the pipeline chipseq.pipeline.groovy as described here . A markdown file ChIPreport.Rmd will be generated in the output reports folder after running the pipeline. Subsequently, the ChIPreport.Rmd file can be converted to a final html report using the knitr R-package.
The pipelines includes
- raw data quality control with FastQC, BamQC and MultiQC
- mapping reads or read pairs to the reference genome using bowtie2 (default) or bowtie1
- filter out multimapping reads from bowtie2 output with samtools (optional)
- identify and remove duplicate reads with Picard MarkDuplicates (optional)
- generation of bigWig tracks for visualisation of alignment with deeptools bamCoverage. For single end design, reads are extended to the average fragment size
- characterization of insert size using Picard CollectInsertSizeMetrics (for paired end libraries only)
- characterize library complexity by PCR Bottleneck Coefficient using the GenomicAlignments R-package (for single read libraries only)
- characterize phantom peaks by cross correlation analysis using the spp R-package (for single read libraries only)
- peak calling of IP samples vs. corresponding input controls using MACS2
- peak annotation using the ChIPseeker R-package (optional)
- differential binding analysis using the diffbind R-package (optional). For this, input peak files must be given in NGSpipe2go/tools/diffbind/targets_diffbind.txt and contrasts of interest in NGSpipe2go/tools/diffbind/contrasts_diffbind.txt (see below)
Code Snippets
24 25 26 27 28 29 | exec """ ${TOOL_ENV} && ${PREAMBLE} && Rscript ${PIPELINE_ROOT}/tools/BlackList_Filter/BlackList_Filter.R $BLACKLIST_FILTER_FLAGS; ""","blacklist_filter" |
37 38 39 40 41 42 | exec """ ${TOOL_ENV} && ${PREAMBLE} && bowtie $BOWTIE_FLAGS ${bowtie1_vars.ref} $BOWTIE_INPUT | samtools view $SAMTOOLS_VIEW_FLAGS - | samtools sort $SAMTOOLS_SORT_FLAGS -T \${TMP}/\$(basename $output.prefix)_bowtie1_sort - > $output ""","bowtie1" |
37 38 39 40 41 42 | exec """ ${TOOL_ENV} && ${PREAMBLE} && bowtie2 $BOWTIE2_FLAGS $BOWTIE2_INPUT | samtools view $SAMTOOLS_VIEW_FLAGS - | samtools sort $SAMTOOLS_SORT_FLAGS -T \${TMP}/\$(basename $output.prefix) - > $output; ""","bowtie2" |
41 42 43 44 45 46 | exec """ ${TOOL_ENV} && ${PREAMBLE} && Rscript ${PIPELINE_ROOT}/tools/diffbind/diffbind2.R $DIFFBIND_FLAGS ""","diffbind2" |
47 48 49 50 51 52 | exec """ ${TOOL_ENV} && ${PREAMBLE} && Rscript ${PIPELINE_ROOT}/tools/diffbind/diffbind3.R $DIFFBIND_FLAGS ""","diffbind3" |
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | exec """ ${TOOL_ENV} && ${PREAMBLE} && samtools view $FILBOWTIE2_FLAGS -bhu ${input} | samtools sort -@ $filbowtie2unique_vars.samtools_threads -T \${TMP}/\$(basename $output.prefix) -o ${input.prefix}_mmremoved.bam - && if [[ "${dupremoval_vars.remove_pcr_dups}" == "true" ]]; then echo "Removing PCR duplicates"; bam dedup --in ${input.prefix}_mmremoved.bam --out ${output} --log ${LOGS}/filbowtie2unique/\$(basename ${output.prefix})_dupmetrics.log --noPhoneHome --force --rmDups; rm ${input.prefix}_mmremoved.bam; else echo "Not removing the PCR duplicates"; mv ${input.prefix}_mmremoved.bam ${output}; fi; ""","filbowtie2unique" |
26 27 28 29 30 31 | exec """ ${TOOL_ENV} && ${PREAMBLE} && Rscript ${PIPELINE_ROOT}/tools/GO_Enrichment/GREAT.R $GREAT_FLAGS ""","GREAT" |
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 | exec """ ${TOOL_ENV} && ${PREAMBLE} && touch $output; if [ ! -e ${ipstrength_vars.targets} ]; then echo "Targets file ${ipstrength_vars.targets} doesn't exist" >> $output && exit 0; fi; BAM=\$(basename $input) && extension="\${BAM#*.}" && BAM="\${BAM%%.*}" && grep "^\$BAM" ${ipstrength_vars.targets} | while read -r TARGET; do IP=\$( echo $TARGET | tr '\t' ' ' | cut -f1 -d" ").\$extension && IPname=\$( echo $TARGET | tr '\t' ' ' | cut -f2 -d" ") && INPUT=\$( echo $TARGET | tr '\t' ' ' | cut -f3 -d" ").\$extension && INPUTname=\$(echo $TARGET | tr '\t' ' ' | cut -f4 -d" "); if [ "\$BAM" != "\$INPUT" ]; then if [ "\$INPUT" != "none.\$extension" ]; then echo "\${IPname} vs \${INPUTname}" >> $output ; Rscript ${PIPELINE_ROOT}/tools/ENCODEqc/IPstrength.R ${ipstrength_vars.mapped}/\$IP \$IPname ${ipstrength_vars.mapped}/\$INPUT \$INPUTname $subdir\${IPname}.vs.\${INPUTname}_ipstrength ${ipstrength_vars.bsgenome}; if [ \$? -ne 0 ]; then rm $output; fi; find . -maxdepth 1 -name "$subdir\${IPname}.vs.\${INPUTname}_ipstrength*" -exec sh -c 'mv "\$1" "$output.dir/\${1#./$subdir}"' _ {} \\;; else echo "\${IPname} skipped because no input" >> $output ; fi; fi; done ""","ipstrength" |
22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 | exec """ ${TOOL_ENV} && ${PREAMBLE} && touch $output; if [ ! -e ${macs2_vars.targets} ]; then echo "Targets file ${macs2_vars.targets} doesn't exist" >> $output && exit 0; fi; BAM=\$(basename $input) && extension="\${BAM#*.}" && BAM="\${BAM%%.*}" && grep "^\$BAM" ${macs2_vars.targets} | while read -r TARGET; do IP=\$( echo $TARGET | tr '\t' ' ' | cut -f1 -d" ").\$extension && IPname=\$( echo $TARGET | tr '\t' ' ' | cut -f2 -d" ") && INPUT=\$( echo $TARGET | tr '\t' ' ' | cut -f3 -d" ").\$extension && INPUTname=\$(echo $TARGET | tr '\t' ' ' | cut -f4 -d" "); if [ "\$BAM" != "\$INPUT" ]; then if [ "\$INPUT" != "none.\$extension" ]; then NAMEoutput="\${IPname}.vs.\${INPUTname}" && INPUTflag="-c ${macs2_vars.mapped}/\$INPUT"; else NAMEoutput="\${IPname}" && INPUTflag=""; fi && echo "\$NAMEoutput" >> $output && macs2 callpeak -t ${macs2_vars.mapped}/\$IP \$INPUTflag -n $subdir\${NAMEoutput}_macs2 $MACS2_FLAGS ; if [ \$? -ne 0 ]; then rm $output; fi ; find . -maxdepth 1 -name "$subdir\${NAMEoutput}_macs2*" -exec sh -c 'mv "\$1" "$output.dir/\${1#./$subdir}"' _ {} \\;; fi; done ""","macs2" |
24 25 26 27 28 29 | exec """ ${TOOL_ENV} && ${PREAMBLE} && Rscript ${PIPELINE_ROOT}/tools/BlackList_Filter/make_greylist.R $MAKE_GREYLIST_FLAGS; ""","make_greylist" |
15 16 17 18 19 20 | exec """ ${TOOL_ENV} && ${PREAMBLE} && Rscript ${PIPELINE_ROOT}/tools/ENCODEqc/PBC.R $input && mv ${input.prefix}_PBC.csv $output.dir ""","pbc" |
26 27 28 29 30 31 | exec """ ${TOOL_ENV} && ${PREAMBLE} && Rscript ${PIPELINE_ROOT}/tools/Peak_Annotation/Peak_Annotation.R $PEAK_ANNOTATION_FLAGS; ""","peak_annotation" |
23 24 25 26 27 28 29 30 | exec """ ${TOOL_ENV} && ${PREAMBLE} && Rscript ${PIPELINE_ROOT}/tools/ENCODEqc/phantompeak.R $input \$(basename $input.prefix) $PHANTOMPEAK_FLAGS && mv \$(basename $input.prefix)_phantompeak.* $output.dir ""","phantompeak" |
13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 | exec """ ${PREAMBLE} && cp ${PIPELINE_ROOT}/tools/reports/shiny_chipseq_reporting_tool/ChIP.shinyrep.helpers.R ${REPORTS} && cp ${PIPELINE_ROOT}/tools/reports/shiny_chipseq_reporting_tool/styles.css ${REPORTS} && if [ -e "${REPORTS}/ChIPreport.Rmd" ]; then echo 'ChIPreport.Rmd already exists. Older copy will be kept and not overwritten'; else cp ${PIPELINE_ROOT}/tools/reports/shiny_chipseq_reporting_tool/ChIPreport.Rmd ${REPORTS}; fi && PROJECT=\$(basename ${shinyReports_vars.project}) && sed -i "2,2s/SHINYREPS_PROJECT/\${PROJECT}/" ${REPORTS}/ChIPreport.Rmd && echo "SHINYREPS_PROJECT=${shinyReports_vars.project}" > $output && echo "SHINYREPS_LOG=${shinyReports_vars.log}" >> $output && echo "SHINYREPS_QC=${shinyReports_vars.qc}" >> $output && echo "SHINYREPS_RES=${shinyReports_vars.res}" >> $output && echo "SHINYREPS_TARGET=${shinyReports_vars.target}" >> $output && echo "SHINYREPS_PAIRED=${shinyReports_vars.paired}" >> $output && echo "SHINYREPS_RUN_CUTADAPT=${shinyReports_vars.run_cutadapt}" >> $output && echo "SHINYREPS_RUN_PEAK_ANNOTATION=${shinyReports_vars.run_peakanno}" >> $output && echo "SHINYREPS_RUN_DIFFBIND=${shinyReports_vars.run_diffbind}" >> $output && echo "SHINYREPS_RUN_ENRICHMENT=${shinyReports_vars.run_enrich}" >> $output && echo "SHINYREPS_CUTADAPT_STATS=${shinyReports_vars.cutadapt_stats}" >> $output && echo "SHINYREPS_SAMTOOLSCOV_OUT=${shinyReports_vars.samtoolscov_out}" >> $output && echo "SHINYREPS_BOWTIE_LOG=${shinyReports_vars.bowtie_log}" >> $output && echo "SHINYREPS_BAMINDEX_LOG=${shinyReports_vars.bamindex_log}" >> $output && echo "SHINYREPS_MARKDUPS_LOG=${shinyReports_vars.markdups_log}" >> $output && echo "SHINYREPS_EXTEND_LOG=${shinyReports_vars.extend_log}" >> $output && echo "SHINYREPS_FASTQC=${shinyReports_vars.fastqc}" >> $output && echo "SHINYREPS_FASTQC_LOG=${shinyReports_vars.fastqc_log}" >> $output && echo "SHINYREPS_FASTQC_SUMMARIZED=${shinyReports_vars.fastqc_summarized}" >> $output && echo "SHINYREPS_FASTQSCREEN_OUT=${shinyReports_vars.fastqscreen_out}" >> $output && echo "SHINYREPS_FASTQSCREEN_LOG=${shinyReports_vars.fastqscreen_log}" >> $output && echo "SHINYREPS_FASTQSCREEN_PERC=${shinyReports_vars.fastqscreen_perc}" >> $output && echo "SHINYREPS_INSERTSIZE=${shinyReports_vars.insertsize}" >> $output && echo "SHINYREPS_IPSTRENGTH=${shinyReports_vars.ipstrength}" >> $output && echo "SHINYREPS_IPSTRENGTH_LOG=${shinyReports_vars.ipstrength_log}" >> $output && echo "SHINYREPS_PBC=${shinyReports_vars.pbc}" >> $output && echo "SHINYREPS_PHANTOMPEAK=${shinyReports_vars.phantompeak}" >> $output && echo "SHINYREPS_PHANTOM_LOG=${shinyReports_vars.phantom_log}" >> $output && echo "SHINYREPS_MACS2=${shinyReports_vars.macs2}" >> $output && echo "SHINYREPS_MACS2_LOG=${shinyReports_vars.macs2_log}" >> $output && echo "SHINYREPS_BLACKLIST_FILTER=${shinyReports_vars.blacklist_filter}" >> $output && echo "SHINYREPS_UPSETPLOT=${shinyReports_vars.upset}" >> $output && echo "SHINYREPS_PREFIX=${shinyReports_vars.prefix}" >> $output && echo "SHINYREPS_PLOTS_COLUMN=${shinyReports_vars.plots_column}" >> $output && echo "SHINYREPS_PEAK_ANNOTATION=${shinyReports_vars.peak_annotation}" >> $output && echo "SHINYREPS_DB=${shinyReports_vars.db}" >> $output && echo "SHINYREPS_GREAT=${shinyReports_vars.great}" >> $output && echo "SHINYREPS_DIFFBIND=${shinyReports_vars.diffbind}" >> $output && echo "SHINYREPS_TRACKHUB_DONE=${shinyReports_vars.trackhub_done}" >> $output && echo "SHINYREPS_TOOL_VERSIONS=${shinyReports_vars.tool_versions}" >> $output ""","shinyReports" |
25 26 27 28 29 30 | exec """ ${TOOL_ENV} && ${PREAMBLE} && Rscript ${PIPELINE_ROOT}/tools/upsetPlot/upsetPlot.R $UPSET_FLAGS; ""","upsetPlot" |
20 21 22 23 24 25 | exec """ ${TOOL_ENV} && ${PREAMBLE} && bamCoverage $BAMCOVERAGE_FLAGS --bam $input -o ${output}; ""","bamCoverage" |
15 16 17 18 19 20 | exec """ ${TOOL_ENV} && ${PREAMBLE} && samtools index $input ""","BAMindexer" |
16 17 18 19 20 21 | exec """ ${TOOL_ENV} && ${PREAMBLE} && bamqc $BAMQC_FLAGS -o $output.dir $input ""","BamQC" |
50 51 52 53 54 55 56 57 58 | exec """ ${TOOL_ENV} && ${PREAMBLE} && cutadapt $CUTADAPT_FLAGS $CUTADAPT_FLAGS_PAIRED $CUTADAPT_FLAGS_TOOLONG --too-short-output=\${TMP}/${SAMPLENAME_BASE}.cutadapt_discarded.fastq.gz --output=\${TMP}/${SAMPLENAME_BASE}.cutadapt.fastq.gz $input1 $CUTADAPT_INPUT_SECOND 1> ${CUTADAPT_STATSDIR}/${SAMPLENAME_BASE_PRUNED}.cutadapt.log && mv -t ${CUTADAPT_DISCARDED_DIR}/ \${TMP}/${SAMPLENAME_BASE_PRUNED}*cutadapt_discarded*.fastq.gz && mv -t $output.dir \${TMP}/${SAMPLENAME_BASE_PRUNED}*cutadapt.fastq.gz ""","Cutadapt" |
17 18 19 20 21 22 23 24 25 | exec """ ${TOOL_ENV} && ${PREAMBLE} && CHRSIZES="\${TMP}/\$(basename ${input.prefix}).extend.chrsizes" && samtools idxstats ${input} | cut -f1-2 > "\${CHRSIZES}" && bedtools bamtobed -split -i $input | bedtools slop -g "\${CHRSIZES}" -l 0 -r ${extend_vars.fraglen} -s | bedtools bedtobam -ubam -g "\${CHRSIZES}" | samtools sort $SAMTOOLS_SORT_FLAGS -T \${TMP}/\$(basename $output.prefix) - > $output && samtools index $output ""","extend" |
18 19 20 21 22 23 | exec """ ${TOOL_ENV} && ${PREAMBLE} && fastqc --extract $FASTQC_FLAGS -o $output.dir $inputs ""","FastQC" |
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | exec """ ${TOOL_ENV} && ${PREAMBLE} && if [ ! -e "$output.prefix" ]; then mkdir $output.prefix; fi && fastqreference=${FastqScreen_vars.conf}; references=(\${fastqreference//,/ }); for i in "\${!references[@]}"; do reference=(\${references[i]//::/ }); echo -e "DATABASE\t\${reference[0]}\t\${reference[1]}" >> $output.prefix/fastqscreen.conf; done; fastq_screen $FASTQSCREEN_FLAGS --conf $output.prefix/fastqscreen.conf --outdir $output.prefix $inputs; touch $outputs ""","FastqScreen" |
14 15 16 17 18 19 20 21 | exec """ ${TOOL_ENV} && ${PREAMBLE} && chroms=\$(cut -f1 ${FilterChr_vars.file}) && samtools view -@ ${FilterChr_vars.threads} -b $input \$chroms > $output && samtools index $output ""","FilterChr" |
20 21 22 23 24 25 | exec """ ${TOOL_ENV} && ${PREAMBLE} && java ${InsertSize_vars.java_flags} -jar \${PICARD} CollectInsertSizeMetrics $INSERTSIZE_FLAGS INPUT=$input OUTPUT=$output HISTOGRAM_FILE=${output.prefix}_hist.pdf ""","InsertSize" |
19 20 21 22 23 24 | exec """ ${TOOL_ENV} && ${PREAMBLE} && java ${MarkDups_vars.java_flags} -jar \${PICARD} MarkDuplicates $MARKDUPS_FLAGS INPUT=$input OUTPUT=$output METRICS_FILE=${input.prefix}_dupmarked_dupmetrics.tsv ""","MarkDups" |
16 17 18 19 20 21 | exec """ ${TOOL_ENV} && ${PREAMBLE} && multiqc $ESSENTIAL_PROJECT $MultiQC_FLAGS -o $output.dir ""","MULTIQC" |
19 20 21 22 23 24 | exec """ ${TOOL_ENV} && ${PREAMBLE} && java ${RmDups_vars.java_flags} -jar \${PICARD} MarkDuplicates $RMDUPS_FLAGS INPUT=$input OUTPUT=$output METRICS_FILE=${input.prefix}_duprm_dupmetrics.tsv TMP_DIR=\${TMP} ""","RmDups" |
16 17 18 19 20 21 | exec """ ${TOOL_ENV} && ${PREAMBLE} && samtools coverage $SAMTOOLSCOV_FLAGS -o $output $input ""","samtoolscov" |
26 27 28 29 30 31 | exec """ ${TOOL_ENV} && ${PREAMBLE} && Rscript ${PIPELINE_ROOT}/tools/trackhub/Configure_Trackhub.R $TRACKHUB_FLAGS ""","trackhub_config" |
17 18 19 20 21 22 | exec """ ${TOOL_ENV} && ${PREAMBLE} && Rscript ${PIPELINE_ROOT}/tools/trackhub/Make_Trackhub.R $TRACKHUB_FLAGS ""","trackhub" |
Support
- Future updates
Related Workflows





