A practical static malware analysis toolkit for dissecting malicious PDF files using object, stream, and JavaScript level inspection.
The toolkit is written in Python and inspects the internal structure of PDF files to detect indicators of malicious behavior such as embedded JavaScript, obfuscated payloads, suspicious streams, and embedded files.
This toolkit replicates the methodology of professional tools like pdfid, pdf-parser, peepdf, qpdf, and strings in a single automated solution.
The objective is to design a toolkit capable of:
- Understanding PDF internal structure
- Enumerating objects and streams
- Detecting malicious JavaScript
- Extracting Indicators of Compromise (IOC)
- Detecting embedded malicious payloads
- Performing automated risk assessment
- Generating structured malware analysis reports
Input PDF → Metadata Extraction → Object Enumeration → Stream Extraction & Decoding → Keyword & JavaScript Detection → IOC Extraction → Embedded File Detection → Risk Scoring → Report Generation
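The object enumeration and stream decoding stages of this pipeline can be sketched roughly as follows. This is a minimal illustration, not the code in `core.py`: the function names `enumerate_objects` and `decode_stream` are hypothetical, and the regex-based scan is a heuristic that ignores `/Length`, object streams, and filters other than `/FlateDecode`.

```python
import re
import zlib
from typing import Optional

# Heuristic patterns over raw bytes; a real parser would follow the xref table.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def enumerate_objects(data: bytes) -> dict[int, bytes]:
    """Map each object number to its raw body (hypothetical helper)."""
    return {int(m.group(1)): m.group(3) for m in OBJ_RE.finditer(data)}


def decode_stream(body: bytes) -> Optional[bytes]:
    """Return the stream payload, inflated if /FlateDecode is declared."""
    m = STREAM_RE.search(body)
    if m is None:
        return None  # object carries no stream
    raw = m.group(1)
    if b"/FlateDecode" in body:
        try:
            return zlib.decompress(raw)
        except zlib.error:
            return raw  # keep raw bytes when inflation fails
    return raw
```

Returning the raw bytes on a decompression failure keeps later keyword and IOC scans working on malformed or deliberately corrupted streams.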
- Metadata analysis (Author, Creator, Dates)
- Object enumeration from raw PDF structure
- Stream detection and zlib decompression
- JavaScript malware detection (eval, unescape, base64)
- IOC extraction (URLs, IPs, shellcode patterns)
- Embedded file extraction
- Automated severity scoring engine
- Detailed malware analysis report
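As an illustration of the JavaScript detection and IOC extraction features listed above, the sketch below scans decoded text with regular expressions. The patterns, the 40-character base64 threshold, and the name `scan_text` are assumptions made for this example and may differ from what the toolkit actually uses.

```python
import re

# Assumed indicator patterns for suspicious JavaScript constructs.
JS_INDICATORS = {
    "eval() usage": re.compile(r"\beval\s*\("),
    "unescape() usage": re.compile(r"\bunescape\s*\("),
    # Assumption: a long base64-looking run hints at an encoded payload.
    "base64-like blob": re.compile(r"[A-Za-z0-9+/]{40,}={0,2}"),
}

# Assumed IOC patterns: URLs, IPv4 addresses, %u-encoded shellcode runs.
IOC_PATTERNS = {
    "url": re.compile(r"https?://[^\s\"'<>)]+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "shellcode": re.compile(r"(?:%u[0-9a-fA-F]{4}){4,}"),
}


def scan_text(text: str) -> dict:
    """Return matched indicator names and de-duplicated IOC strings."""
    findings = [name for name, rx in JS_INDICATORS.items() if rx.search(text)]
    iocs = {
        name: sorted(set(rx.findall(text)))
        for name, rx in IOC_PATTERNS.items()
    }
    return {"findings": findings, "iocs": iocs}
```

The IPv4 pattern does not validate octet ranges, so it may flag version strings such as `1.2.3.4`; that trade-off favors recall during triage.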
| Toolkit Capability | Industry Tool Equivalent |
|---|---|
| Keyword detection | pdfid |
| Object parsing | pdf-parser |
| Stream decoding | qpdf |
| JavaScript analysis | peepdf |
| IOC hunting | strings |
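To make the keyword detection row concrete, here is a minimal sketch of counting suspicious PDF name objects in raw bytes. The keyword list and the helper name `count_keywords` are assumptions for illustration, and the sketch does not normalise `#xx` hex escapes inside names, which obfuscated samples may use.

```python
import re

# Assumed watch list; /JS, /JavaScript and /OpenAction appear in the sample output.
SUSPICIOUS_NAMES = [
    "/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile",
]


def count_keywords(data: bytes) -> dict[str, int]:
    """Count occurrences of each name, requiring a non-alphanumeric boundary."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # The lookahead stops /AA from matching inside a longer name like /AAPL.
        rx = re.compile(re.escape(name.encode()) + rb"(?![A-Za-z0-9])")
        counts[name] = len(rx.findall(data))
    return counts
```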
pdf_malware_toolkit/
- main.py
- core.py
- report_generator.py
- make-pdf-javascript.py
- make-pdf-helloworld.py
- mPDF.py
- test.pdf
- reports/
- diagrams/
Run the toolkit:
python main.py
Provide a PDF file path or folder path when prompted.
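Handling either a file or a folder could look something like the sketch below. The function name `collect_pdfs` is hypothetical, and searching subfolders recursively is an assumption; `main.py` may behave differently.

```python
from pathlib import Path


def collect_pdfs(target: str) -> list[Path]:
    """Resolve a user-supplied path into a list of PDF files to analyze."""
    path = Path(target).expanduser()
    if path.is_dir():
        # Assumption: descend into subfolders and match the extension case-insensitively.
        return sorted(
            p for p in path.rglob("*")
            if p.is_file() and p.suffix.lower() == ".pdf"
        )
    if path.is_file():
        return [path]
    raise FileNotFoundError(f"No such file or folder: {target}")
```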
Total Objects: 5
Stream Objects: ['1']
/JS : 1
/JavaScript : 1
/OpenAction : 1
Malware Findings:
- eval() usage
- unescape() usage
- URL found
CRITICAL | Score: 90
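A severity line like the one above could be produced by a weighted scoring step. The sketch below is illustrative only: the weights, the severity thresholds, and the function name `score` are placeholder assumptions, not the toolkit's actual values, so it is not expected to reproduce the score shown in the sample.

```python
# Placeholder weights per indicator (assumed values, for illustration only).
WEIGHTS = {
    "/OpenAction": 20,
    "/JS": 15,
    "/JavaScript": 15,
    "eval() usage": 20,
    "unescape() usage": 15,
    "URL found": 10,
}

# Assumed severity bands, checked from the highest floor downward.
SEVERITY_BANDS = [(80, "CRITICAL"), (50, "HIGH"), (20, "MEDIUM"), (0, "LOW")]


def score(indicators: list[str]) -> tuple[int, str]:
    """Sum the weights of distinct indicators, cap at 100, and pick a band."""
    total = min(100, sum(WEIGHTS.get(name, 0) for name in set(indicators)))
    label = next(band for floor, band in SEVERITY_BANDS if total >= floor)
    return total, label
```

Counting each indicator once keeps a document with many repeated `/JS` entries from dominating the score; whether repeats should add weight is a design choice left open here.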
- Metadata details
- Object & stream enumeration
- Keyword scan results
- Malware findings & IOCs
- Embedded file objects
- Risk severity score
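One way to hold these report sections in a single structure is sketched below. The `AnalysisReport` dataclass, its field names, and the JSON output are assumptions for illustration; the format actually written by `report_generator.py` is not specified here.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AnalysisReport:
    """Hypothetical container mirroring the report sections listed above."""
    metadata: dict = field(default_factory=dict)
    objects: list = field(default_factory=list)
    stream_objects: list = field(default_factory=list)
    keywords: dict = field(default_factory=dict)
    findings: list = field(default_factory=list)
    iocs: dict = field(default_factory=dict)
    embedded_files: list = field(default_factory=list)
    score: int = 0
    severity: str = "LOW"


def render_report(report: AnalysisReport) -> str:
    """Serialize the report to indented JSON (an assumed output format)."""
    return json.dumps(asdict(report), indent=2, sort_keys=True)
```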
This project demonstrates:
- Static malware analysis
- Understanding of PDF internals
- JavaScript-based attack detection
- Threat hunting using IOC patterns
- Automated forensic reporting
This project is intended for educational and research purposes only.
PDF Malware Analysis Toolkit Project