This cheat sheet outlines tips and tools for reverse-engineering malicious documents, such as Microsoft Office (DOC, XLS, PPT) and Adobe Acrobat (PDF) files.
- Extract suspicious code segments from the file.
- If relevant, disassemble and/or debug shellcode.
- Understand next steps in the infection chain.
Microsoft Office Binary File Format Notes
Structured Storage (OLE SS) defines a file system inside the binary Microsoft Office file.
Data is organized into “storages” (folders) and “streams” (files).
Excel stores data inside the “workbook” stream.
PowerPoint stores data inside the “PowerPoint Document” stream.
Word stores data inside various streams.
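The notes above can be illustrated with a minimal sketch that checks whether a file is an OLE SS container and guesses which application produced it. It relies on the standard OLE signature bytes and on the sector-shift field at header offset 0x1E. The function name `inspect_ole` and the "WordDocument" stream name are assumptions added for illustration, and the stream-name search is a crude byte-level heuristic rather than a real directory parse.

```python
import struct

# Signature found in the first 8 bytes of every OLE SS (compound) file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Stream names are stored as UTF-16LE in OLE directory entries.
# "WordDocument" is an assumption added here; the notes above name only
# the Excel and PowerPoint streams.
STREAM_HINTS = {
    "Excel": "Workbook",
    "PowerPoint": "PowerPoint Document",
    "Word": "WordDocument",
}


def inspect_ole(data: bytes) -> dict:
    """Return basic facts about an OLE SS container held in memory."""
    if len(data) < 512 or data[:8] != OLE_MAGIC:
        return {"is_ole": False}
    # The sector shift is a little-endian uint16 at header offset 0x1E.
    (sector_shift,) = struct.unpack_from("<H", data, 0x1E)
    # Heuristic: look for the UTF-16LE stream name anywhere in the file.
    apps = [
        app for app, name in STREAM_HINTS.items()
        if name.encode("utf-16-le") in data
    ]
    return {"is_ole": True, "sector_size": 1 << sector_shift, "likely_apps": apps}
```

A typical call would be `inspect_ole(open("file.doc", "rb").read())`. A dedicated parser is still needed to walk the storages and streams themselves.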
Tools for Analyzing Microsoft Office Files
OfficeMalScanner locates shellcode and VBA macros from MS Office (DOC, XLS, and PPT) files.
DisView disassembles bytes at a given offset of an MS Office file. (Part of OfficeMalScanner)
MalHost-Setup extracts shellcode from a given offset in an MS Office file and embeds it in an EXE file for further analysis. (Part of OfficeMalScanner)
Offvis shows raw contents and structure of an MS Office file, and identifies some common exploits.
BIFF-Workbench shows raw contents and structure of an XLS file and supports editing and searching.
OfficeCat scans MS Office files for embedded exploits that target several known vulnerabilities.
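Embedded executables are often hidden from simple string searches by XOR-encoding them with a one-byte key. The sketch below shows how a scanner might brute-force such a key by searching for the encoded form of the DOS stub message. It is not the code of any tool listed above, and the function name `find_xor_encoded_marker` is hypothetical.

```python
from typing import List, Tuple

# Text found in the DOS stub of most Windows executables.
MARKER = b"This program cannot be run in DOS mode"


def find_xor_encoded_marker(data: bytes, marker: bytes = MARKER) -> List[Tuple[int, int]]:
    """Return (key, offset) pairs where marker appears XOR-encoded with a one-byte key."""
    hits = []
    for key in range(256):  # key 0 corresponds to the unencoded marker
        encoded = bytes(b ^ key for b in marker)
        offset = data.find(encoded)
        while offset != -1:
            hits.append((key, offset))
            offset = data.find(encoded, offset + 1)
    return hits
```

A hit only indicates where to look next. Multi-byte keys and other encodings would go unnoticed by this approach.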
Useful MS Office Analysis Commands
| Command | Purpose |
|---|---|
| OfficeMalScanner file.doc scan brute | Locate shellcode, OLE data, PE files in file.doc |
| OfficeMalScanner file.doc info | Locate VB macro code in file.doc (no XML files) |
| OfficeMalScanner file.docx inflate | Decompress file.docx to locate VB code (XML files) |
| DisView file.doc 0x4500 | Disassemble shellcode at 0x4500 in file.doc |
| MalHost-Setup file.doc out.exe 0x4500 | Extract shellcode from offset 0x4500 of file.doc and embed it in out.exe |
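The XML-based Office formats are ZIP archives, which is why an inflate step comes before looking for VB code in them. As a rough sketch of that step, the Python below lists archive members that look like VBA project storage. The function name `find_vba_parts` is hypothetical, and matching on the conventional `vbaProject.bin` file name is an assumption that a renamed part would evade.

```python
import io
import zipfile
from typing import List


def find_vba_parts(container: bytes) -> List[str]:
    """List ZIP members of an XML-based Office file that look like VBA storage."""
    with zipfile.ZipFile(io.BytesIO(container)) as archive:
        return [
            name for name in archive.namelist()
            # Macro-enabled files conventionally keep VBA in vbaProject.bin.
            if name.lower().endswith("vbaproject.bin")
        ]
```

Each part found this way is itself a binary OLE SS file, so the macro source still has to be extracted from it with a tool that handles that format.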
Adobe PDF File Format Notes
A PDF file comprises a header, objects, a cross-reference table (to locate objects), and a trailer.
“/OpenAction” and “/AA” (Additional Action) specify the script or action to run automatically.
“/Names”, “/AcroForm”, “/Action” can also specify and launch scripts or actions.
“/GoTo*” changes the view to a specified destination within the PDF or in another PDF file.
“/Launch” launches a program or opens a document.
“/URI” accesses a resource by its URL.
“/SubmitForm” and “/GoToR” can send data to a URL.
“/RichMedia” can be used to embed Flash in PDF.
“/ObjStm” can hide objects inside an Object Stream.
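A quick first pass over a suspect PDF is to count how often the names listed above occur in the raw bytes. The following is a simplified sketch of that idea, not the implementation of any particular tool. The function name `count_pdf_keywords` is hypothetical, and the code assumes the names appear literally: it does not decode hex-escaped names, and it cannot see objects stored inside compressed object streams.

```python
import re
from typing import Dict

# Names taken from the notes above; real-world triage lists are longer.
KEYWORDS = [
    b"/OpenAction", b"/AA", b"/Names", b"/AcroForm", b"/Action",
    b"/GoTo", b"/GoToR", b"/Launch", b"/URI", b"/SubmitForm",
    b"/RichMedia", b"/ObjStm",
]


def count_pdf_keywords(data: bytes) -> Dict[str, int]:
    """Count literal occurrences of each keyword in raw PDF bytes."""
    counts = {}
    for keyword in KEYWORDS:
        # The lookahead keeps /GoTo from also matching /GoToR, and /AA
        # from matching inside a longer name.
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode("ascii")] = len(re.findall(pattern, data))
    return counts
```

A nonzero count for “/OpenAction” or “/AA” in a file that also contains scripts is a reason to inspect the referenced objects, not proof of malice.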
Tools for Analyzing Adobe PDF Files
PDFiD identifies PDFs that contain strings associated with scripts and actions. (Part of Python PDF Tools)
PDF-parser identifies key elements of the PDF file without rendering it. (Part of Python PDF Tools)
Origami is a Ruby framework for parsing, analyzing, modifying, and creating PDF files.
Pdftk tweaks PDFs and uncompresses page streams.
Useful PDF Analysis Commands
| Command | Purpose |
|---|---|
| pdfid.py file.pdf | Locate script and action-related strings in file.pdf |
| pdf-parser.py file.pdf | Show file.pdf’s structure to identify suspect elements |
| pdfscan.rb file.pdf | Examine and display file.pdf’s structure |
| pdftk file.pdf output out.pdf uncompress | Uncompress page streams in file.pdf and save the result in out.pdf |
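To make the structural parts of a PDF concrete (header, objects, cross-reference table, trailer), here is a minimal sketch that locates their markers in raw bytes. It is no substitute for the tools above: the function name `outline_pdf` is hypothetical, the regular expressions assume an uncompressed, well-formed file, and a malformed or deliberately obfuscated PDF may defeat them.

```python
import re


def outline_pdf(data: bytes) -> dict:
    """Report where the main structural markers of a PDF appear in raw bytes."""
    header = re.search(rb"%PDF-(\d\.\d)", data)
    # Indirect objects start with "<number> <generation> obj".
    objects = re.findall(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b", data)
    startxref = re.findall(rb"startxref\s+(\d+)", data)
    return {
        "version": header.group(1).decode("ascii") if header else None,
        "object_ids": [int(num) for num, _gen in objects],
        # The lookbehind avoids counting the "xref" inside "startxref".
        "has_xref_table": re.search(rb"(?<![a-z])xref\b", data) is not None,
        "has_trailer": b"trailer" in data,
        # The last startxref is the one a reader uses after incremental updates.
        "xref_offset": int(startxref[-1]) if startxref else None,
    }
```

Several `startxref` entries or duplicate object numbers suggest the file was updated incrementally, which is worth a closer look with pdf-parser.py.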
Additional Malicious File Analysis Tools
McAfee FileInsight integrates a hex editor, calculator, disassembler, decoders, scripting support, etc.
ExeFilter can filter scripts from Office and PDF files.
VirusTotal can scan files with multiple anti-virus tools to identify some malicious documents.
This cheat sheet is distributed according to the Creative Commons v3 "Attribution" License. File version 1.5.
About the Author: Lenny Zeltser leads the security consulting practice at Savvis. His team provides security assessments, design, and operational assistance for business-critical IT infrastructure. Lenny also teaches malware analysis at SANS Institute, explores security topics at conferences and in articles, and volunteers as an incident handler at the Internet Storm Center.