Ghost380/PDF-malware-analysis

PDF Malware Analysis Framework

A comprehensive educational framework for analyzing PDF files for malicious content

Features | Installation | Usage | API Integration | Documentation


⚠️ IMPORTANT DISCLAIMER

This is an EDUCATIONAL PROJECT ONLY

This framework is designed for:

  • ✅ Security researchers learning about PDF malware analysis
  • ✅ Educational purposes in academic environments
  • ✅ Training and skill development
  • ✅ Understanding PDF structure and malware techniques

NOT INTENDED FOR:

  • ❌ Production use without proper security review
  • ❌ Analyzing malware in production environments
  • ❌ Bypassing security controls
  • ❌ Any illegal or malicious activities

⚠️ API KEYS NOTE: This project contains placeholder API keys (shown as <YOUR_API_KEY_HERE>, <API_KEY_PLACEHOLDER>, etc.). These are NOT valid API keys. You must:

  1. Register for your own API keys from respective services
  2. Replace all placeholder text with your actual API keys
  3. Never commit real API keys to version control
  4. Use environment variables for production deployments
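
As a minimal sketch of steps 3 and 4, a key can be read from an environment variable and rejected if it still looks like one of the placeholders. The function name `load_api_key` and the variable name `VIRUSTOTAL_API_KEY` are illustrative assumptions, not names taken from this project.

```python
import os

# Substrings that suggest the value is still a template placeholder (assumed list).
PLACEHOLDER_MARKERS = ("YOUR_", "PLACEHOLDER", "<", ">")


def load_api_key(env_var, environ=None):
    """Return the key stored in env_var, or None if it is unset or a placeholder."""
    environ = os.environ if environ is None else environ
    value = environ.get(env_var, "").strip()
    if not value:
        return None
    if any(marker in value.upper() for marker in PLACEHOLDER_MARKERS):
        return None
    return value


if __name__ == "__main__":
    # Hypothetical variable name; export it in your shell instead of editing tracked files.
    key = load_api_key("VIRUSTOTAL_API_KEY")
    print("key configured" if key else "no usable key found")
```

Keeping keys in the environment means the tracked config.yaml can keep its placeholders, so a real key never ends up in a commit.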


✨ Features

Core Analysis Features

  • Static Analysis: Extract metadata, hashes, and basic file properties
  • JavaScript Detection: Identify and deobfuscate malicious JavaScript
  • Stream Analysis: Analyze encoded and compressed streams
  • Embedded File Detection: Find and analyze embedded files
  • Structure Analysis: Detect structural anomalies and suspicious objects
  • Risk Scoring: Intelligent risk assessment based on multiple factors
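
To make the static analysis idea concrete, the sketch below computes file hashes and counts a few PDF name tokens that are commonly reviewed during triage. It is an illustration only, not this framework's implementation: the function names and the token list are assumptions, and a raw byte scan will miss names hidden inside compressed streams or written with `#xx` hex escapes.

```python
import hashlib
import re

# Name tokens often checked during PDF triage (assumed list).
# Their presence is a signal worth reviewing, not proof of malice.
SUSPICIOUS_NAMES = ("/JavaScript", "/JS", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile")


def file_hashes(data: bytes) -> dict:
    """Return MD5, SHA-1, and SHA-256 hex digests of the raw file bytes."""
    return {name: hashlib.new(name, data).hexdigest() for name in ("md5", "sha1", "sha256")}


def count_names(data: bytes) -> dict:
    """Count occurrences of each token in the raw bytes."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # Require that the token is not followed by another alphanumeric
        # character, so "/AA" does not match a longer name such as "/AAPL".
        pattern = re.escape(name.encode()) + rb"(?![A-Za-z0-9])"
        counts[name] = len(re.findall(pattern, data))
    return counts


if __name__ == "__main__":
    sample = b"%PDF-1.4\n1 0 obj\n<< /OpenAction 2 0 R /JS (app.alert(1)) >>\nendobj\n"
    print(file_hashes(sample)["sha256"])
    print(count_names(sample))
```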

Advanced Capabilities

  • Multiple Output Formats: JSON, HTML, and visual reports
  • Batch Processing: Analyze multiple files simultaneously
  • Directory Monitoring: Watch folders for new PDFs
  • Web Interface: User-friendly web UI for analysis
  • Extensible Architecture: Easy to add new analyzers
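
One way an extensible analyzer design can look is sketched below: each analyzer implements a common interface, and a runner collects results keyed by analyzer name. The class and function names here (`BaseAnalyzer`, `HeaderAnalyzer`, `run_analyzers`) are hypothetical; the actual interface in src/core/base_analyzer.py may differ.

```python
from abc import ABC, abstractmethod


class BaseAnalyzer(ABC):
    """Hypothetical common interface that every analyzer implements."""

    name = "base"

    @abstractmethod
    def analyze(self, data: bytes) -> dict:
        """Inspect the raw PDF bytes and return a dictionary of findings."""


class HeaderAnalyzer(BaseAnalyzer):
    """Example analyzer: checks whether the file starts with a PDF header."""

    name = "header"

    def analyze(self, data: bytes) -> dict:
        return {"has_pdf_header": data.startswith(b"%PDF-")}


def run_analyzers(data: bytes, analyzers) -> dict:
    """Run each analyzer on the same bytes and collect results by analyzer name."""
    return {analyzer.name: analyzer.analyze(data) for analyzer in analyzers}


if __name__ == "__main__":
    print(run_analyzers(b"%PDF-1.7\n", [HeaderAnalyzer()]))
```

Under this design, adding a new check means writing one more subclass and passing it to the runner; existing analyzers are untouched.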

API Integrations (Placeholder Only)

  • VirusTotal - File hash lookup (requires API key)
  • URLScan.io - URL analysis (requires API key)
  • Hybrid Analysis - Sandbox analysis (requires API key)

🏗️ Project Structure

pdf-malware-analyzer/
├── src/
│   ├── core/                      # Core components
│   │   ├── base_analyzer.py
│   │   ├── pdf_parser.py
│   │   └── risk_scorer.py
│   ├── analyzers/                 # Analysis modules
│   │   ├── basic_analyzer.py
│   │   ├── metadata_analyzer.py
│   │   ├── javascript_analyzer.py
│   │   ├── stream_analyzer.py
│   │   ├── structure_analyzer.py
│   │   └── embedded_file_analyzer.py
│   ├── deobfuscators/             # Code deobfuscation
│   │   └── js_deobfuscator.py
│   ├── threat_intel/              # API integrations (placeholders)
│   │   ├── virustotal.py
│   │   ├── urlscan.py
│   │   └── hybrid_analysis.py
│   ├── reporters/                 # Output generation
│   │   ├── json_reporter.py
│   │   └── html_reporter.py
│   └── utils/                     # Utilities
│       ├── file_utils.py
│       ├── logger.py
│       └── config_loader.py
├── tests/                         # Test suite
│   └── test_samples/              # Test PDF generators
├── scripts/                       # Utility scripts
│   ├── batch_analyze.py
│   └── monitor_directory.py
├── web_interface/                 # Flask web application
│   ├── app.py
│   └── templates/
├── config.yaml                    # Configuration file
├── requirements.txt               # Python dependencies
└── README.md                      # This file

🔧 Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager
  • Git (optional)

Step 1: Clone the Repository

git clone https://github.com/yourusername/pdf-malware-analyzer.git
cd pdf-malware-analyzer

Step 2: Create Virtual Environment (Recommended)

On Windows:

python -m venv venv
venv\Scripts\activate

On Linux/Mac:

python3 -m venv venv
source venv/bin/activate

Step 3: Install Dependencies

pip install -r requirements.txt

Step 4: Install the Package

pip install -e .

Step 5: Configure API Keys

Edit config.yaml and replace all placeholder API keys with your actual keys:

# In config.yaml - REPLACE ALL PLACEHOLDERS
threat_intel:
  virus_total:
    enabled: false  # Set to true to enable
    api_key: "YOUR_ACTUAL_VIRUSTOTAL_API_KEY"  # Replace placeholder
    
  urlscan:
    enabled: false  # Set to true to enable
    api_key: "YOUR_ACTUAL_URLSCAN_API_KEY"  # Replace placeholder
    
  hybrid_analysis:
    enabled: false  # Set to true to enable
    api_key: "YOUR_ACTUAL_HYBRID_ANALYSIS_API_KEY"  # Replace placeholder
    secret: "YOUR_ACTUAL_HYBRID_ANALYSIS_SECRET"  # Replace placeholder
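
Before enabling a service, it can help to confirm that no placeholder is left in its settings. The sketch below assumes config.yaml has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`) and reports every enabled service whose `api_key` or `secret` still looks like a placeholder. The function names are illustrative and are not part of this project's config_loader.py.

```python
def looks_like_placeholder(value: str) -> bool:
    """Heuristic (assumed): empty values and template-style text count as placeholders."""
    text = value.strip().upper()
    return not text or "YOUR_" in text or "PLACEHOLDER" in text


def unconfigured_fields(config: dict) -> list:
    """List 'service.field' entries that are enabled but still hold a placeholder."""
    problems = []
    for service, settings in config.get("threat_intel", {}).items():
        if not settings.get("enabled"):
            continue  # Disabled services are never called, so their keys are not checked.
        for field in ("api_key", "secret"):
            value = settings.get(field)
            if value is not None and looks_like_placeholder(str(value)):
                problems.append(f"{service}.{field}")
    return problems


if __name__ == "__main__":
    # Mirrors the structure of config.yaml shown above.
    config = {
        "threat_intel": {
            "virus_total": {"enabled": True, "api_key": "YOUR_ACTUAL_VIRUSTOTAL_API_KEY"},
            "urlscan": {"enabled": False, "api_key": "YOUR_ACTUAL_URLSCAN_API_KEY"},
        }
    }
    print(unconfigured_fields(config))
```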

