priyank5548/PDF-Malware-Analysis-Toolkit

A practical static malware analysis toolkit for dissecting malicious PDF files using object, stream, and JavaScript level inspection.

📄 PDF Malware Analysis Toolkit

A Python-based PDF static malware analysis toolkit that inspects the internal structure of PDF files to detect malicious behavior such as embedded JavaScript, obfuscated payloads, suspicious streams, and embedded files.

This toolkit follows the methodology of established tools such as pdfid, pdf-parser, peepdf, qpdf, and strings, and combines their checks in a single automated workflow.


🎯 Project Goal

To design a toolkit capable of:

  • Understanding PDF internal structure
  • Enumerating objects and streams
  • Detecting malicious JavaScript
  • Extracting Indicators of Compromise (IOC)
  • Detecting embedded malicious payloads
  • Performing automated risk assessment
  • Generating structured malware analysis reports

🧠 Toolkit Workflow

Input PDF → Metadata Extraction → Object Enumeration → Stream Extraction & Decoding → Keyword & JavaScript Detection → IOC Extraction → Embedded File Detection → Risk Scoring → Report Generation
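The object enumeration and stream decoding steps of this workflow can be sketched as follows. This is a minimal illustration that works on raw bytes with regular expressions. The names `enumerate_objects` and `decode_stream` are hypothetical and are not taken from core.py, and only the FlateDecode (zlib) filter is handled.

```python
import re
import zlib

# Matches "<num> <gen> obj ... endobj" in the raw bytes of a PDF.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# Captures everything between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def enumerate_objects(data: bytes) -> dict:
    """Return {object_number: body_bytes} for every indirect object found."""
    return {m.group(1).decode(): m.group(3) for m in OBJ_RE.finditer(data)}


def decode_stream(body: bytes):
    """Return the decoded stream of an object body, or None if it has no stream."""
    match = STREAM_RE.search(body)
    if match is None:
        return None
    raw = match.group(1)
    if b"/FlateDecode" in body:
        try:
            # decompressobj ignores the end-of-line bytes left before "endstream".
            return zlib.decompressobj().decompress(raw)
        except zlib.error:
            return raw  # keep undecodable data for manual review
    return raw
```

A regex scan like this can be fooled by binary stream data that happens to contain the bytes `endobj` or `endstream`, so treat it as a starting point rather than a full parser.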


🧩 Features

  • Metadata analysis (Author, Creator, Dates)
  • Object enumeration from raw PDF structure
  • Stream detection and zlib decompression
  • JavaScript malware detection (eval, unescape, base64)
  • IOC extraction (URLs, IPs, shellcode patterns)
  • Embedded file extraction
  • Automated severity scoring engine
  • Detailed malware analysis report
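As a rough illustration of the JavaScript detection and IOC extraction features, the sketch below scans decoded stream text with regular expressions. The pattern table, the `scan_text` function, and the "four or more %uXXXX escapes" shellcode heuristic are assumptions made for this example. They are not the toolkit's actual rules.

```python
import re

# Hypothetical indicator table: finding label -> pattern searched in the text.
JS_INDICATORS = {
    "eval() usage": re.compile(r"\beval\s*\("),
    "unescape() usage": re.compile(r"\bunescape\s*\("),
    "base64 decoding": re.compile(r"\batob\s*\(|base64", re.IGNORECASE),
}

URL_RE = re.compile(r"https?://[^\s\"'<>)]+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Runs of %uXXXX escapes are often used to pass shellcode through unescape().
SHELLCODE_RE = re.compile(r"(?:%u[0-9a-fA-F]{4}){4,}")


def scan_text(text: str) -> dict:
    """Collect suspicious JavaScript constructs and IOCs from decoded text."""
    findings = [label for label, rx in JS_INDICATORS.items() if rx.search(text)]
    return {
        "findings": findings,
        "urls": URL_RE.findall(text),
        "ips": IPV4_RE.findall(text),
        "shellcode": SHELLCODE_RE.findall(text),
    }
```

The IPv4 pattern does not check that each octet is at most 255, so it will also match version strings such as `1.2.3.400`. A stricter check may be worth adding if false positives become a problem.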

🛠️ Techniques Inspired By

Toolkit Capability     Industry Tool Equivalent
Keyword detection      pdfid
Object parsing         pdf-parser
Stream decoding        qpdf
JavaScript analysis    peepdf
IOC hunting            strings
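To make the pdfid-style keyword detection concrete, here is a small sketch that counts suspicious PDF name tokens in the raw file bytes. The keyword list and the `count_keywords` function are illustrative assumptions. Unlike pdfid, this version does not handle names obfuscated with `#xx` hex escapes.

```python
import re

# Assumed keyword list; names that commonly appear in malicious PDFs.
KEYWORDS = ["/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch", "/EmbeddedFile"]


def count_keywords(data: bytes) -> dict:
    """Count whole-name occurrences of each keyword in raw PDF bytes."""
    return {
        # The lookahead stops "/AA" from matching a longer name such as "/AAPL".
        kw: len(re.findall(re.escape(kw.encode()) + rb"(?![A-Za-z0-9])", data))
        for kw in KEYWORDS
    }
```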

📂 Project Structure

pdf_malware_toolkit/

  • main.py
  • core.py
  • report_generator.py
  • make-pdf-javascript.py
  • make-pdf-helloworld.py
  • mPDF.py
  • test.pdf
  • reports/
  • diagrams/

▶️ Usage

Run the toolkit:

python main.py

Provide a PDF file path or folder path when prompted.
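One way the file-or-folder input could be resolved is sketched below. The `collect_pdfs` and `has_pdf_header` helpers are hypothetical and do not come from main.py. Folder scans here select files by their `%PDF-` header rather than by extension, on the assumption that a malicious sample may be renamed.

```python
from pathlib import Path


def has_pdf_header(path: Path) -> bool:
    """Return True if the "%PDF-" marker appears in the first 1024 bytes."""
    with path.open("rb") as fh:
        return b"%PDF-" in fh.read(1024)


def collect_pdfs(target: str) -> list:
    """Resolve a user-supplied path to a list of files to analyse."""
    path = Path(target)
    if path.is_file():
        return [path]  # a single file is analysed as given
    if path.is_dir():
        return sorted(
            p for p in path.rglob("*") if p.is_file() and has_pdf_header(p)
        )
    raise FileNotFoundError(f"No such file or folder: {target}")
```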


📝 Example Output

Total Objects: 5
Stream Objects: ['1']

/JS : 1
/JavaScript : 1
/OpenAction : 1

Malware Findings:

  • eval() usage
  • unescape() usage
  • URL found

CRITICAL | Score: 90
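A severity line like the one above could come from a simple additive scoring scheme. The sketch below is only an illustration: the weights and thresholds are made up for this example (chosen so that the indicators listed in the sample output happen to total 90), and the toolkit's real scoring table is not documented in this README.

```python
# Hypothetical weights per indicator; not the toolkit's actual values.
WEIGHTS = {
    "/OpenAction": 20,
    "/JavaScript": 20,
    "/JS": 10,
    "eval() usage": 20,
    "unescape() usage": 15,
    "URL found": 5,
}
# Hypothetical thresholds, checked from highest to lowest.
LEVELS = [(80, "CRITICAL"), (50, "HIGH"), (20, "MEDIUM"), (0, "LOW")]


def score(indicators) -> tuple:
    """Sum the weights of the distinct indicators and map the total to a level."""
    total = min(100, sum(WEIGHTS.get(name, 0) for name in set(indicators)))
    level = next(label for threshold, label in LEVELS if total >= threshold)
    return level, total
```

Counting each indicator once, regardless of how often it occurs, keeps a single noisy keyword from dominating the score. Whether the toolkit does the same is not stated here.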


📑 Report Contains

  • Metadata details
  • Object & stream enumeration
  • Keyword scan results
  • Malware findings & IOCs
  • Embedded file objects
  • Risk severity score
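The report sections listed above could be serialised as in the sketch below. JSON output, the section key names, and the `write_report` function are assumptions made for illustration. The format actually produced by report_generator.py is not described in this README.

```python
import json
from pathlib import Path

# Assumed section keys, mirroring the report contents listed above.
SECTIONS = ["file", "metadata", "objects", "keywords", "findings",
            "embedded_files", "severity", "score"]


def write_report(results: dict, out_dir: str = "reports") -> Path:
    """Write the analysis results as a JSON report and return its path."""
    missing = [key for key in SECTIONS if key not in results]
    if missing:
        raise ValueError(f"Report is missing sections: {missing}")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    report_path = out / (Path(results["file"]).stem + "_report.json")
    report_path.write_text(json.dumps(results, indent=2), encoding="utf-8")
    return report_path
```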

🎓 Learning Outcomes

This project demonstrates:

  • Static malware analysis
  • Understanding of PDF internals
  • JavaScript-based attack detection
  • Threat hunting using IOC patterns
  • Automated forensic reporting

⚠️ Disclaimer

This project is intended for educational and research purposes only.


👨‍💻 Author

PDF Malware Analysis Toolkit Project

