Fuzzy Hash Malware Matching

First of all, let us understand the concept. Fuzzy hash malware matching is a technique often used in malware analysis and threat detection to compare files or data for similarity. It is a form of approximate matching: instead of seeking an exact match, as in traditional cryptographic hash comparisons, the goal is to identify files or data that are similar to each other.

Key Aspects of Fuzzy Hash Malware Matching:

  1. Similarity-Based Analysis: Fuzzy matching computes a similarity score between two pieces of data, which helps in identifying variations of the same malware (e.g., polymorphic or metamorphic malware) or files with slight modifications.
  2. Tools & Techniques:

  • ssdeep: A widely used tool that generates context-triggered piecewise hashes to identify files with partial overlaps or similarities.
  • TLSH (Trend Micro Locality Sensitive Hash): Generates a hash signature that can identify structurally similar files.
  • sdhash: Selects statistically improbable byte sequences (features) from the data and compares them across files to assess similarity.
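To make the idea of a similarity score concrete, here is a minimal sketch. It is not the algorithm used by ssdeep, TLSH, or sdhash. It swaps in a much simpler stand-in (Jaccard similarity over byte 4-grams) just to show how a small modification leaves the score high while unrelated data scores low. The function names and sample data are made up for illustration.

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of overlapping n-byte windows found in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the two n-gram sets, scaled to 0-100.

    Illustrative stand-in only; real fuzzy hashes use other constructions.
    """
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 100.0
    return 100.0 * len(grams_a & grams_b) / len(grams_a | grams_b)


# Synthetic sample data (an assumption for this sketch, not real files).
original = bytes(range(256)) * 4
variant = original[:500] + b"PATCHED!" + original[500:]   # small insertion
unrelated = bytes(reversed(range(256))) * 4               # different content

print(f"original vs variant:   {similarity(original, variant):.1f}")
print(f"original vs unrelated: {similarity(original, unrelated):.1f}")
```

A cryptographic hash would treat `original` and `variant` as completely different files, whereas a similarity score keeps them close together.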

Now let's walk through how threat actors abuse fuzzy hashing:

Because fuzzy hashing is based on similarity, as explained above, it can be exploited to bypass security controls by making malicious files appear similar to legitimate ones. This can allow attackers to evade detection mechanisms in an enterprise.

Below are some real-world scenarios where fuzzy hashing might be abused in this way:

  • Attack scenario: An attacker embeds malicious code into a legitimate application, such as a widely used utility or executable.
  • How it is abused: The malicious file retains significant portions of the legitimate file’s content (e.g., unchanged headers or benign sections of code) while inserting harmful payloads. A fuzzy hashing comparison might yield a high similarity score between the malicious and legitimate files.
  • Investigation scenario: Digital forensics investigators use fuzzy hashing to identify related or suspicious files.
  • How it is abused: An attacker creates a malicious file that produces a fuzzy hash resembling that of a benign or commonly used system file or document. During an investigation, this similarity could lead investigators to overlook the malicious file, or to flag legitimate files as false positives.

Now let's see how to detect this using something that is very important during an investigation and that can lead us to create a watchlist or rule in EDR solutions to detect and counter this defense evasion technique: ImpHash.

The goal of this article is to give defense teams a better understanding of ImpHash by highlighting the following:

  • What is an ImpHash?
  • Why is ImpHash useful for defense teams, especially DFIR?
  • When to use ImpHash?

What is an ImpHash?

ImpHash (import hash) is a hashing method for PE executable files, such as EXEs and DLLs, that allows you to make fuzzy matches. It lets you easily identify executables that are similar but not necessarily identical.

ImpHash focuses only on the executable's import table, which contains information about the external functions and libraries the executable uses. This narrow focus is also a limitation. Executables tend to have the same import table if they are variations of each other or were compiled using the same basic build infrastructure.

As a result, binaries do not need to be exact matches to match based on ImpHash. This allows investigators to find malware samples that are likely to be the same at their core but have been altered slightly over time.

The ImpHash computation process involves the following steps:

  1. Extract the DLL names and function names from the import table, resolving imports by ordinal to function names where possible
  2. Lowercase the DLL and function names, and strip the file extension from the DLL name
  3. Join each pair as dllname.functionname and concatenate the pairs into a comma-separated list, preserving the order in which they appear in the import table
  4. Calculate the MD5 hash of the resulting string
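The steps above can be sketched in a few lines of Python. This is a simplified illustration, not a full implementation: it assumes the import table has already been parsed into (DLL name, function name) pairs and that ordinals have already been resolved. The function name `imphash_from_imports` and the sample imports are hypothetical. In practice, the pefile library exposes a `get_imphash()` method on a parsed PE object that works directly on a file.

```python
import hashlib

# Extensions stripped from the library name before hashing. This mirrors
# the behaviour of common ImpHash implementations (an assumption here).
STRIPPED_EXTENSIONS = ("dll", "ocx", "sys")


def imphash_from_imports(imports):
    """Compute an ImpHash-style MD5 from ordered (dll, function) pairs."""
    entries = []
    for dll_name, func_name in imports:
        dll = dll_name.lower()
        stem, _, ext = dll.rpartition(".")
        if stem and ext in STRIPPED_EXTENSIONS:
            dll = stem  # "kernel32.dll" -> "kernel32"
        entries.append(f"{dll}.{func_name.lower()}")
    # Order is preserved, so the same imports in another order hash differently.
    return hashlib.md5(",".join(entries).encode("ascii")).hexdigest()


# Hypothetical import table, in the order the entries appear in the binary.
sample_imports = [
    ("KERNEL32.dll", "CreateFileA"),
    ("KERNEL32.dll", "WriteFile"),
    ("ADVAPI32.dll", "RegOpenKeyExA"),
]

print(imphash_from_imports(sample_imports))
print(imphash_from_imports(list(reversed(sample_imports))))  # different value
```

Because the list order feeds straight into the MD5 input, two builds that import the same functions in a different order will not share an ImpHash.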

Why ImpHash Is Useful for Defense Teams, Especially DFIR

The primary problem that ImpHash solves is that it is easy for an attacker to change a bit in a file, which will result in a different content-based cryptographic hash. Yet, the file still has the same malicious behavior. To a security analyst or tool, it is hard to know if this file with a never-before-seen hash value is good or bad.
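As a quick illustration of that problem, the sketch below flips a single bit in a byte string and compares the SHA-256 digests before and after. The sample bytes are made up for the example and do not represent a real executable.

```python
import hashlib

# Made-up file content for illustration only.
original = b"MZ" + b"\x00" * 62 + b"example payload bytes"

modified = bytearray(original)
modified[-1] ^= 0x01  # flip the lowest bit of the last byte

digest_a = hashlib.sha256(original).hexdigest()
digest_b = hashlib.sha256(bytes(modified)).hexdigest()

# Count how many of the 64 hex digits changed between the two digests.
differing = sum(x != y for x, y in zip(digest_a, digest_b))

print(digest_a)
print(digest_b)
print(f"{differing} of {len(digest_a)} hex digits differ")
```

The two inputs differ by one bit, yet the digests have nothing useful in common, so a lookup on the new SHA-256 value tells the analyst nothing about the file's relationship to the original.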

ImpHash allows you to find similar files, which may or may not be derivatives of the one you are focusing on. It’s important to note that this fuzzy hash is not based on program logic, behavior, or code sequences in the executable. It’s based entirely on what the executable declares that it depends on (which could be a lie) and the order in which those dependencies are declared.

When To Use ImpHash

ImpHash can be used in many ways. However, if you do not understand what ImpHash is and what its pitfalls are, it may not be as effective as you hope. The following are the scenarios where I believe ImpHash can best be leveraged:

  • Malware scanning and file upload restrictions
  • Hunting for malicious binaries
  • Tracking threat actor tooling

Malware Scanning and File Upload Restrictions

The main use case for ImpHash for a defense or DFIR team is malware analysis when the investigator can’t upload files to external analysis platforms, such as ReversingLabs or VirusTotal.

An example of this is as follows:

  • An alert is raised for a process executing from a suspicious location. An analyst performs a SHA256 hash lookup and finds nothing.
  • Company policy dictates that files are not allowed to be uploaded to third-party services like VirusTotal, and no on-prem solutions are available.
  • Instead, an ImpHash lookup is performed, and the service returns SHA256 hashes of other files it has seen with that ImpHash.
  • Those hashes can then be looked up to see if they were good or bad.
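The lookup workflow above can be sketched as a simple index from ImpHash to previously seen files. Everything here is hypothetical: the record layout, the function names, and the placeholder hash strings are assumptions for illustration and do not reflect any particular service's API.

```python
from collections import defaultdict

# Hypothetical knowledge base of previously seen files.
# The hash values are placeholders, not real hashes.
SEEN_FILES = [
    {"imphash": "imphash-A", "sha256": "sha256-001", "verdict": "malicious"},
    {"imphash": "imphash-A", "sha256": "sha256-002", "verdict": "malicious"},
    {"imphash": "imphash-B", "sha256": "sha256-003", "verdict": "benign"},
]


def build_imphash_index(records):
    """Group known files by ImpHash so one lookup returns all related files."""
    index = defaultdict(list)
    for record in records:
        index[record["imphash"]].append((record["sha256"], record["verdict"]))
    return index


def lookup_imphash(index, imphash):
    """Return (sha256, verdict) pairs seen with this ImpHash, or an empty list."""
    return index.get(imphash, [])


index = build_imphash_index(SEEN_FILES)

# The suspicious file's SHA256 is unknown, but its ImpHash has been seen before.
for sha256, verdict in lookup_imphash(index, "imphash-A"):
    print(sha256, verdict)
```

Only the ImpHash leaves the environment in this flow, not the file itself, which is what makes it compatible with a no-upload policy. A match is a lead to investigate rather than a verdict, since unrelated binaries can share an import table.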

Hunting for Malicious Binaries (EXE, DLL)

Another scenario where ImpHash can provide significant value is malware hunting. The primary advantage ImpHash provides is that it can find matches across a network even when each host has a unique variation of the file. This can be illustrated by the following two scenarios.

  1. An EDR alert goes off and malware is detected on a system within your environment. You grab its SHA256 hash, filename, and path and do a search across your systems to find if it’s spread. Nothing is found. You do an ImpHash search and get hits on 6 systems. Each instance of the malware has a unique file hash but is functionally identical. As a result, the SHA256 hash failed, but ImpHash was able to find all of the instances.
  2. Your threat intelligence feed indicates that there is a new campaign targeting the energy sector. You note down the SHA256 and ImpHash IOCs. A week later, you get several hits based on your ImpHash IOC but nothing for SHA256. This is because the threat actor had unique binaries made for each targeted organization; however, the tools that were used were the same ones mentioned in the original report.
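Both scenarios come down to the same sweep: search the file inventory for the SHA256 IOC, then search again for the ImpHash IOC. Here is a minimal sketch, assuming the EDR or inventory data can be exported as simple records. The `FileRecord` type, the `hunt` function, and all values are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """One executable observed on one host (hypothetical export format)."""
    host: str
    path: str
    sha256: str
    imphash: str


def hunt(records, ioc_sha256, ioc_imphash):
    """Split hits into exact SHA256 matches and ImpHash-only variants."""
    exact = [r for r in records if r.sha256 == ioc_sha256]
    variants = [
        r for r in records
        if r.imphash == ioc_imphash and r.sha256 != ioc_sha256
    ]
    return exact, variants


# Placeholder inventory: each host has a unique file hash, same import table.
inventory = [
    FileRecord("host-01", r"C:\Users\Public\svc.exe", "sha256-aaa", "imphash-X"),
    FileRecord("host-02", r"C:\Users\Public\upd.exe", "sha256-bbb", "imphash-X"),
    FileRecord("host-03", r"C:\Windows\notepad.exe", "sha256-ccc", "imphash-Y"),
]

exact, variants = hunt(inventory, ioc_sha256="sha256-aaa", ioc_imphash="imphash-X")
print("exact matches:", [r.host for r in exact])
print("ImpHash-only variants:", [r.host for r in variants])
```

Keeping the two result sets separate makes it easy to see which hosts would have been missed by a SHA256-only search.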


Conclusion

In summary, ImpHashing is a method of fingerprinting a binary based on its import table. This has an advantage over traditional hashing because it allows matching of binaries that are functionally or nearly functionally identical but are not exact copies of each other. ImpHashing can be useful for threat intelligence sharing, tracking threat actor tooling, and hunting down ever-changing malware. Using ImpHash is a smart and targeted approach to detecting the evasion technique described above, especially for .exe and .dll files. While it has limitations, combining it with behavioral analysis, sandboxing, and other hash-based techniques (e.g., fuzzy hashing or cryptographic hashes) can significantly improve detection accuracy.

For more detail on the ImpHashing process, check out Mandiant’s post.

Happy Detection and Hunting

