axp-knickei/omics-volcano-plotter

Volcano Plot Generation Tool

Project Overview

This repository provides a Python-based tool for generating high-quality volcano plots from differential expression data. Volcano plots are a widely used visualization in bioinformatics: they plot each gene's fold change against its statistical significance (adjusted p-value), which makes it easy to identify genes that are significantly differentially expressed between two conditions.

Features

Version 2 (26 November 2025)

  • CLI Support: Run the script from the command line with custom arguments.
  • Dynamic Thresholds: Set your own Adjusted P-value and Log2 Fold Change cutoffs.
  • Automated Labeling: Automatically labels the top N most significant genes.
  • Publication Ready: Exports high-resolution PNGs (300 DPI).

Version 1 (August 2025)

  • Automated Plot Generation: Generates publication-ready volcano plots from structured input data.
  • Customizable Gene Labeling: Supports highlighting and labeling specific genes based on user-defined criteria for fold-change and statistical significance (adjusted p-value).
  • Dependency Management: Utilizes a requirements.txt file for straightforward environment setup.
  • Jupyter Notebook Integration: Includes an accompanying Jupyter notebook (high_quality_volcano_plots.ipynb) for interactive data exploration and plot customization.

Installation

To set up the project environment, ensure you have Python 3.8+ installed. It is recommended to use a virtual environment.

  1. Clone the repository:

    git clone https://github.com/YOUR_USERNAME/volcano_plot_py.git
    cd volcano_plot_py

    (Note: Replace YOUR_USERNAME with your actual GitHub username and adjust the repository name if different.)

  2. Create and activate a virtual environment:

    python3 -m venv venv
    source venv/bin/activate # On Windows: venv\Scripts\activate

  3. Install dependencies:

    pip install -r requirements.txt

Usage

The primary script is make_volcano_plot.py. You can run it with default settings or specify your own parameters.

  1. Prepare your data: Ensure your differential expression data is in CSV format. By default the script reads a file named differential_expression.csv; use --input to point it at a different file. You can use the provided data_template.csv as a starting point. The file should contain the following columns:

    Column Header    Description                                                       Required?
    "" (Index)       Unique gene identifier (e.g., Ensembl ID). First column.          Yes
    baseMean         Mean expression level. Used to determine the size of the points.  Yes
    log2FoldChange   Log2 fold change between conditions. Mapped to the X-axis.        Yes
    padj             Adjusted p-value. Transformed to -log10(padj) for the Y-axis.     Yes
    symbol           Gene symbol or name. Used for labeling and identification.        Yes
    lfcSE            Log fold change standard error.                                   Optional
    stat             Wald statistic.                                                   Optional
    pvalue           Raw p-value.                                                      Optional

  2. Run the script:

    Default Run

    Assumes the input file is differential_expression.csv and uses the default thresholds: adjusted p-value < 0.05 and absolute log2 fold change > 1.

    python make_volcano_plot.py

    Custom Run

    Example: Using a specific file, stricter thresholds, and labeling the top 20 genes

    python make_volcano_plot.py --input my_results.csv --output my_plot.png --pval 0.01 --lfc 2.0 --top_n 20

    Arguments for Custom Run 😘

Flag           Description                                Default
-i, --input    Path to input CSV file                     differential_expression.csv
-o, --output   Path to save the PNG image                 volcano.png
--pval         Adjusted P-value cutoff for significance   0.05
--lfc          Log2 Fold Change cutoff (absolute)         1.0
--top_n        Number of top significant genes to label   10
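
The flags and defaults in the table above could be declared with Python's standard argparse module roughly as follows. This is only a sketch of one plausible layout, not the actual contents of make_volcano_plot.py; the function name build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags and defaults listed in the table."""
    parser = argparse.ArgumentParser(
        description="Generate a volcano plot from differential expression data."
    )
    parser.add_argument("-i", "--input", default="differential_expression.csv",
                        help="Path to input CSV file")
    parser.add_argument("-o", "--output", default="volcano.png",
                        help="Path to save the PNG image")
    parser.add_argument("--pval", type=float, default=0.05,
                        help="Adjusted P-value cutoff for significance")
    parser.add_argument("--lfc", type=float, default=1.0,
                        help="Log2 Fold Change cutoff (absolute)")
    parser.add_argument("--top_n", type=int, default=10,
                        help="Number of top significant genes to label")
    return parser


# Parse an explicit argument list (instead of sys.argv) so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--lfc", "2.0", "--pval", "0.01", "--output", "strict_volcano.png"]
)
print(args.input, args.output, args.pval, args.lfc, args.top_n)
```

Any flag that is not passed keeps its default, which is why --input still resolves to differential_expression.csv here.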

Another custom run example, showing my favorite features 😃:

    python make_volcano_plot.py --lfc 2.0 --pval 0.01 --output strict_volcano.png

  • python make_volcano_plot.py: Runs the script. Because no input file is specified with -i, it looks for the default file differential_expression.csv.

  • --lfc 2.0: Sets the Log2 Fold Change cutoff to 2.0.

    • Comment: This is stricter than the default (1.0). A gene must have at least a 4-fold change ($2^2 = 4$) in expression (either up or down) to be considered biologically significant.
  • --pval 0.01: Sets the Adjusted P-value cutoff to 0.01.

    • Comment: This is stricter than the default (0.05). It means that an estimated False Discovery Rate (FDR) of at most 1% is allowed among the genes you highlight.
  • --output strict_volcano.png: Saves the resulting image as strict_volcano.png.

    • Comment: This is useful so you don't overwrite your previous volcano.png.
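
To make the arithmetic behind these cutoffs concrete, the short sketch below converts a log2 fold change cutoff into a linear fold change and an adjusted p-value cutoff into its height on the -log10(padj) axis. The helper names are hypothetical and are not taken from the script.

```python
import math


def fold_change_from_lfc(lfc_cutoff: float) -> float:
    """Linear fold change that corresponds to a log2 fold change cutoff."""
    return 2.0 ** lfc_cutoff


def y_axis_position(padj_cutoff: float) -> float:
    """Height of an adjusted p-value cutoff on the -log10(padj) axis."""
    return -math.log10(padj_cutoff)


# Compare the default thresholds with the stricter ones from the example command.
for lfc, pval in [(1.0, 0.05), (2.0, 0.01)]:
    print(
        f"--lfc {lfc}: {fold_change_from_lfc(lfc):g}-fold change; "
        f"--pval {pval}: -log10(padj) = {y_axis_position(pval):.2f}"
    )
```

A stricter --lfc pushes the vertical cutoffs outward on the X-axis, and a stricter --pval raises the horizontal cutoff on the Y-axis.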

    Comparison with default run

    Compared to the default run, this plot will show fewer significant genes (fewer red/blue dots) because the criteria to be colored are much harder to meet. This is useful when you have too many "significant" genes and want to narrow focus to only the strongest candidates.
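
The effect of tightening both cutoffs can be illustrated with a few made-up rows. The sketch below assumes a gene is highlighted when padj is below the cutoff and the absolute log2 fold change is above the cutoff; the gene names and values are invented, and the class labels are hypothetical rather than the script's own.

```python
from collections import Counter

# Hypothetical (symbol, log2FoldChange, padj) rows, invented for illustration only.
ROWS = [
    ("geneA", 2.5, 0.001),
    ("geneB", 1.4, 0.03),
    ("geneC", -3.1, 0.0005),
    ("geneD", -1.2, 0.02),
    ("geneE", 0.3, 0.6),
    ("geneF", 2.2, 0.04),
]


def classify(log2fc, padj, pval_cutoff=0.05, lfc_cutoff=1.0):
    """Assumed rule: significant if padj < cutoff and |log2FC| > cutoff."""
    if padj < pval_cutoff and log2fc > lfc_cutoff:
        return "up"
    if padj < pval_cutoff and log2fc < -lfc_cutoff:
        return "down"
    return "not significant"


def count_classes(rows, pval_cutoff, lfc_cutoff):
    """Count how many rows fall into each class for the given cutoffs."""
    return Counter(
        classify(log2fc, padj, pval_cutoff, lfc_cutoff)
        for _symbol, log2fc, padj in rows
    )


default_counts = count_classes(ROWS, pval_cutoff=0.05, lfc_cutoff=1.0)
strict_counts = count_classes(ROWS, pval_cutoff=0.01, lfc_cutoff=2.0)
print("default:", dict(default_counts))
print("strict: ", dict(strict_counts))
```

With these toy values, five of the six genes pass the default thresholds but only two pass the strict ones.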

  3. Jupyter Notebook: For interactive analysis and further customization, open the provided Jupyter notebook:
    jupyter notebook high_quality_volcano_plots.ipynb
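
Before running the script or the notebook, it can help to confirm that a CSV contains the required columns from step 1. The sketch below uses only the Python standard library; the function name, the sample rows, and the choice to skip rows whose padj is missing or non-positive are assumptions for illustration, not the script's documented behavior.

```python
import csv
import io
import math

# Required columns as listed in the data preparation step.
REQUIRED_COLUMNS = ("baseMean", "log2FoldChange", "padj", "symbol")


def load_rows(handle):
    """Validate the header, then return rows with -log10(padj) precomputed."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))

    rows = []
    for record in reader:
        try:
            padj = float(record["padj"])
        except (TypeError, ValueError):
            continue  # Assumption: skip rows with a missing padj such as "NA".
        if not padj > 0:
            continue  # Assumption: skip zero, negative, or NaN values.
        rows.append({
            "symbol": record["symbol"],
            "log2FoldChange": float(record["log2FoldChange"]),
            "neg_log10_padj": -math.log10(padj),
        })
    return rows


# Invented sample data; the first column has an empty header, as in the table.
SAMPLE = (
    ",baseMean,log2FoldChange,lfcSE,stat,pvalue,padj,symbol\n"
    "ID_0001,150.2,2.5,0.4,6.25,1e-6,0.001,geneA\n"
    "ID_0002,80.0,-0.3,0.5,-0.6,0.55,NA,geneB\n"
)
rows = load_rows(io.StringIO(SAMPLE))
print(rows)
```

To check a real file, pass an open file handle instead, for example open("differential_expression.csv", newline="").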

Output

The script make_volcano_plot.py produces a PNG image file that visually represents the differential expression analysis. By default the file is named volcano.png; use --output to choose a different path.

[Figure: Volcano Plot]

Attribution

This project is inspired by and derived from the excellent work of Mark (mousepixels). Special thanks for his valuable contributions to the bioinformatics community.

Dependencies

The project relies on the following Python libraries:

  • pandas
  • seaborn
  • matplotlib
  • numpy
  • adjustText

These are listed in requirements.txt and will be installed during the setup process.

About

Visualize omics differential expression with Python. Generate publication-quality volcano plots featuring custom gene labeling, color-coding, and shape mapping for clear biological insights. Ideal for RNA-seq and other high-throughput data analysis.
