PDF Malware Analysis Framework

A comprehensive educational framework for analyzing PDF files for malicious content

Features • Installation • Usage • API Integration • Documentation

⚠️ IMPORTANT DISCLAIMER

This is an EDUCATIONAL PROJECT ONLY

This framework is designed for:

✅ Security researchers learning about PDF malware analysis
✅ Educational purposes in academic environments
✅ Training and skill development
✅ Understanding PDF structure and malware techniques

NOT INTENDED FOR:

❌ Production use without proper security review
❌ Analyzing malware in production environments
❌ Bypassing security controls
❌ Any illegal or malicious activities

⚠️ API KEYS NOTE: This project contains placeholder API keys (shown as <YOUR_API_KEY_HERE>, <API_KEY_PLACEHOLDER>, etc.). These are NOT valid API keys. You must:

Register for your own API keys from the respective services
Replace all placeholder text with your actual API keys
Never commit real API keys to version control
Use environment variables for production deployments

✨ Features

Core Analysis Features

Static Analysis: Extract metadata, hashes, and basic file properties
JavaScript Detection: Identify and deobfuscate malicious JavaScript
Stream Analysis: Analyze encoded and compressed streams
Embedded File Detection: Find and analyze embedded files
Structure Analysis: Detect structural anomalies and suspicious objects
Risk Scoring: Intelligent risk assessment based on multiple factors
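To make the static analysis and risk scoring features above more concrete, here is a minimal sketch in Python. It is not the project's actual implementation: the function names, the keyword list, and the weights are assumptions chosen for illustration, and the contents of risk_scorer.py may differ.

```python
import hashlib
import re

# Hypothetical keyword weights, for illustration only.
SUSPICIOUS_KEYWORDS = {
    b"/JavaScript": 3,
    b"/JS": 3,
    b"/OpenAction": 2,
    b"/Launch": 3,
    b"/EmbeddedFile": 2,
}


def file_hashes(data: bytes) -> dict:
    """Return hex digests commonly used for threat-intel lookups."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def keyword_counts(data: bytes) -> dict:
    """Count occurrences of each suspicious PDF name in the raw bytes.

    The negative lookahead stops /JS from matching a longer name.
    Names obfuscated with #xx hex escapes are not detected by this sketch.
    """
    counts = {}
    for keyword in SUSPICIOUS_KEYWORDS:
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode()] = len(re.findall(pattern, data))
    return counts


def risk_score(counts: dict, cap: int = 10) -> int:
    """Add the weight of every keyword that appears at least once, up to cap."""
    total = sum(
        SUSPICIOUS_KEYWORDS[name.encode()]
        for name, count in counts.items()
        if count > 0
    )
    return min(total, cap)


if __name__ == "__main__":
    sample = b"%PDF-1.4\n1 0 obj\n<< /OpenAction 2 0 R /JS (app.alert(1)) >>\nendobj\n"
    print(file_hashes(sample))
    print(risk_score(keyword_counts(sample)))
```

Scanning raw bytes only sees keywords that are stored uncompressed, so a real analyzer would also need to decode streams before counting.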

Advanced Capabilities

Multiple Output Formats: JSON, HTML, and visual reports
Batch Processing: Analyze multiple files simultaneously
Directory Monitoring: Watch folders for new PDFs
Web Interface: User-friendly web UI for analysis
Extensible Architecture: Easy to add new analyzers
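One way an extensible analyzer design could look is sketched below. The class and function names (BaseAnalyzer, HeaderAnalyzer, run_analyzers) are hypothetical and are not taken from base_analyzer.py; the sketch only illustrates the general plug-in pattern of a shared interface plus a runner.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable


class BaseAnalyzer(ABC):
    """Hypothetical common interface that every analyzer implements."""

    name = "base"

    @abstractmethod
    def analyze(self, data: bytes) -> Dict[str, object]:
        """Return a dictionary of findings for the given PDF bytes."""


class HeaderAnalyzer(BaseAnalyzer):
    """Example analyzer: checks for a PDF header near the start of the file."""

    name = "header"

    def analyze(self, data: bytes) -> Dict[str, object]:
        return {"has_pdf_header": b"%PDF-" in data[:1024]}


def run_analyzers(analyzers: Iterable[BaseAnalyzer], data: bytes) -> Dict[str, Dict[str, object]]:
    """Run each analyzer and collect its findings under the analyzer's name."""
    return {analyzer.name: analyzer.analyze(data) for analyzer in analyzers}


if __name__ == "__main__":
    print(run_analyzers([HeaderAnalyzer()], b"%PDF-1.7\n"))
```

Under this pattern, adding a new check means writing one more subclass and passing an instance of it to the runner.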

API Integrations (Placeholder Only)

VirusTotal - File hash lookup (requires API key)
URLScan.io - URL analysis (requires API key)
Hybrid Analysis - Sandbox analysis (requires API key)
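As an illustration of the hash lookup listed above, the sketch below builds a request for the VirusTotal v3 file-report endpoint, which authenticates through the x-apikey header. The helper name build_vt_request is an assumption and not part of virustotal.py; the request is only constructed here, not sent.

```python
import re
import urllib.request

VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{}"

# Accept MD5 (32), SHA-1 (40), or SHA-256 (64) hex digests.
_HASH_RE = re.compile(r"^(?:[0-9a-fA-F]{32}|[0-9a-fA-F]{40}|[0-9a-fA-F]{64})$")


def build_vt_request(file_hash: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a file report by hash.

    Hypothetical helper for illustration; validate the hash first so that
    arbitrary strings never end up in the request URL.
    """
    if not _HASH_RE.match(file_hash):
        raise ValueError("expected an MD5, SHA-1, or SHA-256 hex digest")
    return urllib.request.Request(
        VT_FILE_URL.format(file_hash.lower()),
        headers={"x-apikey": api_key, "Accept": "application/json"},
    )
```

The request can then be sent with urllib.request.urlopen(request, timeout=30) once a valid key is configured. Looking up a hash does not upload the file itself, which matters when a sample is sensitive.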

🏗️ Project Structure

pdf-malware-analyzer/
├── src/
│ ├── core/ # Core components
│ │ ├── base_analyzer.py
│ │ ├── pdf_parser.py
│ │ └── risk_scorer.py
│ ├── analyzers/ # Analysis modules
│ │ ├── basic_analyzer.py
│ │ ├── metadata_analyzer.py
│ │ ├── javascript_analyzer.py
│ │ ├── stream_analyzer.py
│ │ ├── structure_analyzer.py
│ │ └── embedded_file_analyzer.py
│ ├── deobfuscators/ # Code deobfuscation
│ │ └── js_deobfuscator.py
│ ├── threat_intel/ # API integrations (placeholders)
│ │ ├── virustotal.py
│ │ ├── urlscan.py
│ │ └── hybrid_analysis.py
│ ├── reporters/ # Output generation
│ │ ├── json_reporter.py
│ │ └── html_reporter.py
│ └── utils/ # Utilities
│   ├── file_utils.py
│   ├── logger.py
│   └── config_loader.py
├── tests/ # Test suite
│ └── test_samples/ # Test PDF generators
├── scripts/ # Utility scripts
│ ├── batch_analyze.py
│ └── monitor_directory.py
├── web_interface/ # Flask web application
│ ├── app.py
│ └── templates/
├── config.yaml # Configuration file
├── requirements.txt # Python dependencies
└── README.md # This file

🔧 Installation

Prerequisites

Python 3.8 or higher
pip package manager
Git (optional)

Step 1: Clone the Repository

git clone https://github.com/yourusername/pdf-malware-analyzer.git
cd pdf-malware-analyzer

Step 2: Create Virtual Environment (Recommended)

On Windows:

python -m venv venv
venv\Scripts\activate

On Linux/Mac:

python3 -m venv venv
source venv/bin/activate

Step 3: Install Dependencies

pip install -r requirements.txt

Step 4: Install the Package

pip install -e .

Step 5: Configure API Keys

Edit config.yaml, replace each placeholder with your own key, and set enabled to true for every service you want to use:

# In config.yaml - REPLACE ALL PLACEHOLDERS
threat_intel:
  virus_total:
    enabled: false  # Set to true to enable
    api_key: "YOUR_ACTUAL_VIRUSTOTAL_API_KEY"  # Replace placeholder
    
  urlscan:
    enabled: false  # Set to true to enable
    api_key: "YOUR_ACTUAL_URLSCAN_API_KEY"  # Replace placeholder
    
  hybrid_analysis:
    enabled: false  # Set to true to enable
    api_key: "YOUR_ACTUAL_HYBRID_ANALYSIS_API_KEY"  # Replace placeholder
    secret: "YOUR_ACTUAL_HYBRID_ANALYSIS_SECRET"  # Replace placeholder
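Since real keys should stay out of version control, one option is to prefer an environment variable and fall back to config.yaml only when none is set. The sketch below assumes the service settings have already been loaded into a dictionary. The function names and the environment variable name VIRUSTOTAL_API_KEY are hypothetical and are not defined by this project.

```python
import os
from typing import Mapping, Optional

# Substrings that mark a value as an unreplaced placeholder (assumed convention).
PLACEHOLDER_MARKERS = ("YOUR_", "PLACEHOLDER", "<")


def is_placeholder(value: str) -> bool:
    """Return True for empty values or values that still look like placeholders."""
    return not value or any(marker in value.upper() for marker in PLACEHOLDER_MARKERS)


def resolve_api_key(
    service_cfg: Mapping[str, object],
    env_var: str,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the key for an enabled service, or None if the service is disabled.

    The environment variable wins over the value from config.yaml.
    Raises ValueError if the service is enabled but only a placeholder is found.
    """
    if not service_cfg.get("enabled", False):
        return None
    candidate = environ.get(env_var) or str(service_cfg.get("api_key", ""))
    if is_placeholder(candidate):
        raise ValueError(f"{env_var} is not set and config.yaml still holds a placeholder")
    return candidate


if __name__ == "__main__":
    cfg = {"enabled": True, "api_key": "YOUR_ACTUAL_VIRUSTOTAL_API_KEY"}
    try:
        print(resolve_api_key(cfg, "VIRUSTOTAL_API_KEY"))
    except ValueError as exc:
        print(f"configuration error: {exc}")
```

Failing loudly on a placeholder avoids sending requests that are guaranteed to be rejected by the remote service.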
