Analyzing a Malicious PDF Using the PDFiD and pdf-parser Tools
In this article, I'll show you how to analyze a malicious PDF using the strings, xorsearch, exiftool, PDFiD, and pdf-parser tools.
Our sample is a malicious PDF named sample.pdf, and the goal is to extract indicators of compromise (IOCs) from it.
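(i) Using the strings tool:
Open a command prompt and type strings -a sample.pdf (sample.pdf is the malicious PDF).
-a --> With GNU strings, scans the whole file for printable strings rather than only its data sections.
The output contains the JavaScript and OpenAction keywords. We treat both as IOCs, because together they suggest that script code runs automatically when the document opens.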
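To make the idea concrete, here is a minimal Python sketch of what a strings-style scan does: it pulls runs of printable ASCII out of raw bytes and flags the two keywords mentioned above. The function names and the keyword list are my own illustrative assumptions, not part of the strings tool.

```python
import re

# Keywords to flag; this short list is an assumption based on the article.
SUSPICIOUS = (b"/JavaScript", b"/OpenAction")


def printable_strings(data: bytes, min_len: int = 4) -> list[bytes]:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)


def flag_keywords(data: bytes) -> dict[str, int]:
    """Count how many extracted strings contain each suspicious keyword."""
    hits: dict[str, int] = {}
    for text in printable_strings(data):
        for keyword in SUSPICIOUS:
            if keyword in text:
                name = keyword.decode()
                hits[name] = hits.get(name, 0) + 1
    return hits
```

You could call flag_keywords(open("sample.pdf", "rb").read()) on a file in an isolated analysis machine; an empty dictionary means neither keyword appeared in plain text.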
(ii) Using xorsearch:
xorsearch looks for strings that have been obfuscated with simple encodings such as XOR, ROL, ROT, or SHIFT.
Type xorsearch sample.pdf http to look for an encoded URL.
Type xorsearch -p sample.pdf to look for an embedded executable (PE file).
Unfortunately, neither command returned anything useful for this sample.
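The core idea behind this kind of search can be sketched in a few lines of Python. This is only a simplified illustration covering single-byte XOR keys, not how xorsearch itself is implemented; the function name xor_search is hypothetical.

```python
def xor_search(data: bytes, keyword: bytes) -> list[tuple[int, int]]:
    """Return (key, offset) pairs where keyword appears XOR-encoded with key.

    Instead of decoding the whole file 256 times, the keyword is encoded
    with each candidate key and searched for directly. Key 0 corresponds
    to the keyword appearing in plain text.
    """
    hits = []
    for key in range(256):
        encoded = bytes(b ^ key for b in keyword)
        offset = data.find(encoded)
        while offset != -1:
            hits.append((key, offset))
            offset = data.find(encoded, offset + 1)
    return hits
```

An empty result, as with our sample, only means that no single-byte XOR encoding of the keyword is present; other encodings would need their own checks.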
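(iii) Using exiftool:
exiftool extracts metadata from files, including PDFs.
Type exiftool sample.pdf
The output shows that the PDF version is 1.3 and that the file has read and write permissions only. Note that exiftool's file permissions field describes the file on disk, not restrictions set inside the PDF.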
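The version number reported here comes from the header at the start of the file. As a small sketch, assuming the header sits within the first 1024 bytes, it can be read in Python like this (pdf_version is a hypothetical helper, not an exiftool function):

```python
import re
from typing import Optional


def pdf_version(data: bytes) -> Optional[str]:
    """Return the version from a %PDF-x.y header, or None if there is none."""
    # Only the beginning of the file is inspected (assumed limit: 1024 bytes).
    match = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    return match.group(1).decode() if match else None
```

A return value of None is itself worth noting, since it means the file does not start like a normal PDF.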
(iv) Using the PDFiD tool:
PDFiD is a Python tool that scans a PDF for keywords of interest, such as /JavaScript and /OpenAction, and counts them. It can also disarm a file by disabling those keywords.
Type pdfid.py sample.pdf
The output shows one /OpenAction entry and three /JavaScript entries. Most likely, the OpenAction entry is there to execute the JavaScript code inside the PDF when the document opens.
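A rough approximation of this kind of keyword counting is shown below. It is a sketch under my own assumptions (the keyword list and function names are illustrative), and it is far simpler than PDFiD. The one detail worth showing is that PDF names may hide characters behind #xx hex escapes, so names are normalized before they are counted.

```python
import re

# Illustrative keyword list; an assumption, not PDFiD's full list.
KEYWORDS = ("/JS", "/JavaScript", "/OpenAction", "/AA", "/Launch")


def normalize_name(name: bytes) -> str:
    """Decode #xx hex escapes in a PDF name, e.g. /J#61vaScript -> /JavaScript."""
    decoded = re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        name,
    )
    return decoded.decode("latin-1")


def count_keywords(data: bytes) -> dict[str, int]:
    """Count occurrences of each keyword among the name tokens in data."""
    counts = dict.fromkeys(KEYWORDS, 0)
    for raw in re.findall(rb"/[A-Za-z0-9#]+", data):
        name = normalize_name(raw)
        if name in counts:
            counts[name] += 1
    return counts
```

Because this scans raw bytes, random data inside compressed streams can occasionally produce false matches, so treat the counts as a starting point for triage.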
(v) Using the pdf-parser tool:
pdf-parser parses a PDF document and lets you inspect its individual objects.
Type pdf-parser.py --search openaction sample.pdf
When sample.pdf is opened, the PDF reader triggers the OpenAction entry, which in turn executes the JavaScript.
Next, we need to find which objects use the JavaScript keyword.
Type pdf-parser.py --search javascript sample.pdf
Three objects use JavaScript (objects 1, 7, and 12):
object 1 - uses OpenAction to execute the JavaScript code and references object 7.
object 7 - references object 10.
object 12 - references object 13.
So we need to find out what objects 10 and 13 do.
Type pdf-parser.py --object 10 sample.pdf
object 10 - references object 12.
As noted above, object 12 references object 13, so the full chain is 1 -> 7 -> 10 -> 12 -> 13.
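Following references by hand gets tedious on larger files, so here is a small Python sketch of the same walk. It assumes the object bodies have already been extracted into a dictionary, and it follows only the first indirect reference (written as "N 0 R") in each object. The object bodies in the usage example are hypothetical stand-ins that mirror the chain described above, not the real contents of sample.pdf.

```python
import re

# An indirect reference looks like "7 0 R": object number, generation, R.
REF = re.compile(r"(\d+)\s+\d+\s+R")


def follow_chain(objects: dict[int, str], start: int) -> list[int]:
    """Follow the first indirect reference in each object until none remain."""
    chain = [start]
    seen = {start}
    current = start
    while True:
        match = REF.search(objects.get(current, ""))
        if match is None:
            break
        current = int(match.group(1))
        if current in seen:  # guard against reference loops
            break
        chain.append(current)
        seen.add(current)
    return chain


# Hypothetical object bodies mirroring the chain in this article.
example_objects = {
    1: "<< /OpenAction 7 0 R >>",
    7: "<< /JS 10 0 R >>",
    10: "<< /Next 12 0 R >>",
    12: "<< /JS 13 0 R >>",
    13: "<< /Filter /FlateDecode >>",
}
```

With these stand-in bodies, follow_chain(example_objects, 1) walks from object 1 down to object 13, which is where the stream with the script lives.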
Type pdf-parser.py --object 13 sample.pdf
Object 13 contains the JavaScript code in a stream compressed with zlib (the FlateDecode filter).
When the PDF is opened, the reader applies the filter to decompress the JavaScript and then executes it.
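The decompression step can be reproduced with Python's standard zlib module. The sketch below assumes you already have the raw bytes of a single object whose stream uses only zlib (Flate) compression; inflate_stream is a hypothetical helper name.

```python
import re
import zlib


def inflate_stream(obj: bytes) -> bytes:
    """Decompress the zlib data between 'stream' and 'endstream' in one object."""
    match = re.search(rb"stream\r?\n(.*?)endstream", obj, re.DOTALL)
    if match is None:
        raise ValueError("no stream found in object")
    # decompressobj stops at the end of the zlib data and ignores the
    # end-of-line bytes that precede the 'endstream' keyword.
    return zlib.decompressobj().decompress(match.group(1))
```

This is a simplification: a real parser would use the object's /Length entry and handle chained filters, but it is enough to show what the reader does before it runs the script.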
We can view the decompressed JavaScript code with this command:
pdf-parser.py --object 13 -f -w sample.pdf
-f - Passes the object's stream through its filters (here, decompression).
-w - Displays the output as raw data.
The output is the decompressed JavaScript code. To analyze it further, we dump it to a separate file:
Type pdf-parser.py --object 13 -f -w -d dumped.js sample.pdf
-d - Dumps the stream content to the named file.
The decoded JavaScript is now saved as dumped.js for further analysis.
Open dumped.js in Notepad++ or any other text editor and reformat the code so that it is easier to read.
The JavaScript uses functions, for and while loops, and more. I'll show you how to analyze this malicious JavaScript code in upcoming posts.
Conclusion:
Malicious PDFs like this one typically work in two stages: first, the code embedded in the PDF executes; second, that code downloads additional payloads or malware from the internet.
In this case, we found the malicious JavaScript in object 13.