Hey Malware, I Know When You Came In! An Effort Towards Early Detection of Malware
Sareena K P
Senior Technical Manager, Cyber Security, at NEC Corporation India Pvt. Ltd.
On a usual Thursday afternoon, in the middle of our regular research discussion, my advisor got a frantic call from our institute computer centre. On reaching the centre, we found multiple servers displaying an ominous message with cross-bones on their screen: “All your files are ENCRYPTED!”; further asking for ransom in return for our own data. None of the data in the servers was in a readable form. So was the case with other servers hosting institute portals. It was evident — we were facing a ransomware attack[1]! Since we work on cyber-security and malware analysis, our inputs were solicited to analyse the possibilities of recovering the data and restricting the spread of the malware. The malware had already spread to the personal computers of scholars and campus residents. The only thing we could do was disconnect and isolate all infected computers.
The invincible weapon
Malware (e.g. a virus or worm) is software designed to disrupt, damage, or gain unauthorized access to a computer. It is an invincible weapon in the cyber-world that can take many forms. In the case of the IITM attack, it was a hostage-taker. It can also be a spy, a burglar (stealing passwords and credentials), or a tool that creates backdoors in a system, assisting attackers in any crime of their choice. The ramifications of its onslaughts vary but are always disastrous for the victim, be it a regular user, an enterprise, an industry, a government agency or the nation itself. For instance, in 2017, the WannaCry attack crippled computer systems in more than 150 countries worldwide with an estimated financial loss of $4 billion. Similarly, the Ukraine power grid attack in 2015 led to a blackout of 6 hours, affecting 230 thousand people.
So why did such attacks, similar to the IITM attack, go undetected despite the presence of state-of-the-art Antivirus software? Forensics revealed that they were zero-day attacks, i.e. the malware exploited a software vulnerability (weakness) that was unknown before. Essentially, the state-of-the-art antivirus software did not have any mechanism to detect and block the malware. We were barehanded in front of the attacker! The damage was done! There was no guarantee that the data would be recoverable even after paying the ransom. While the centre restored the data from previous backups, we were pondering what we could do to prevent the attack from happening in the first place.
With the far-reaching impact and prowess of zero-day exploits, malware is for sure one step ahead in the cat-and-mouse chase between security researchers and malware. But malware attacks seldom happen overnight. Most attacks are the culmination of a months-long effort by the malware, from the point of entering the network to hitting the victim right at its weakest but most critical spot [2].
What-How-When Strategy
Past incidents show that every high-impact malware worked out its what-how-when strategy with precision, i.e. what to hit (in the victim), how to hit, and when to hit, to cause maximum damage from its onslaught. For this, after entering the target network, the malware undergoes an incubation period of days or even months before it shows any symptoms (performing the attack). In this period, while remaining stealthy, the malware surveys and learns the environment to identify weak spots (vulnerabilities) to exploit. Over time, it formulates its what-how-when strategy to maximize its chance of success. But given the time the malware takes to formulate its strategy, can we as defenders turn the tables in our favour? Can we decode its incubation period to find some early indicators of its presence in our system? Can we use these insights to enhance our antivirus systems so that they warn of and block malware earlier, automatically? These are the questions that we are seeking answers to in our research. Such warning systems can enable precautionary measures to minimize the damage from such attacks. In fact, our results show the presence of such early indicators [3]. But how early can we detect?
Decoding the incubation period
Figure 1 explains the life cycle of malware and its incubation period. Analogous to biological viruses, malware enters your system through loopholes. The entry point can be a spam email (as in the IITM case), a click on a shady URL, or an infected pen-drive. On entry, it first registers itself with its command-and-control (C&C) server, a remote machine elsewhere in the world that acts as a commander instructing the malware at every step it takes. The C&C monitors the health of the malware using its heartbeats, which are regular messages from the malware to its C&C indicating its well-being. This ensures that the C&C gives the next steps only if the malware is successful in remaining stealthy.
Second, it discovers the environment by gathering information about the users and machines in the network while attempting to spread. It collects user credentials and assesses the criticality of systems in the network. For instance, the malware can identify a server hosting a website as critical by determining the number of requests/connections to the server. Third, it reports the collected information to its C&C. Based on inputs from the malware, the C&C decides the time and target of the attack. Fourth, the malware receives the steps to carry out the attack from the C&C. Finally, on a fateful day and time, the malware attacks by executing the received instructions. In the IITM case, it encrypted the files, deleted the original files and finally displayed the ransom message.
Most modern-day malware follows these stages in its incubation period, differing only in the information gathered and the mechanisms used to remain stealthy. In every stage, it accesses some services of the operating system (OS). For example, in the discovery stage, it collects the user names in the system using an OS system call. Thus, malware behaviour leaves some OS and network footprints, forming potential fodder for researchers to aid in detecting malware, but with multiple challenges.
Good versus Bad!
Malware tries its best to remain stealthy by camouflaging its OS and network activities as those of a benign application. For instance, a connection to the C&C and heartbeats are potential indicators of malware. However, most benign software today is cloud-based and connects to its online server during execution. How do we differentiate malware connecting to its C&C from benign software connecting to the cloud? The challenges increase as attackers typically use public clouds like Amazon to host their C&C, and these are also popular hosting services for benign applications. To hide heartbeats, the malware randomizes and spreads its communication over time, obscuring the periodicity of the heartbeat from the defender. Further, a growing proportion of malware uses the HTTPS protocol to encrypt all network communications. Thus, what is visible to the defender is only a sequence of meaningless bytes.
Given these challenges, to differentiate good versus bad, we need a comprehensive understanding of malware behaviour across the malware stages in the wild, which is not available today. We first address this problem by building a testbed of 500 devices that mimics the behaviour of a real organizational network. We then run a large number (10,000) of malware samples of various types (ransomware, spyware, backdoors, etc.) on the testbed and collect the related network communication and OS logs. We next analyse them to identify early indicators of malware presence. Our results indicate that network features like the number of domain-name query failures and the periodicity of communication are potential indicators, as shown in Figure 2. Implementing these features is also feasible, as the run-time overheads of checking them are minimal.
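As a rough illustration of the two network features named above, the sketch below counts failed domain-name lookups and scores how regular a host's outbound connections are. The record layout, the function names, the sample data and the choice of the coefficient of variation as a periodicity measure are all assumptions made for illustration; this is not the feature-extraction code used in our study.

```python
from statistics import mean, pstdev


def dns_failure_count(dns_responses):
    """Count lookups that failed because the domain does not exist.

    Assumption: each response is a dict with an "rcode" field.
    """
    return sum(1 for r in dns_responses if r["rcode"] == "NXDOMAIN")


def periodicity_score(timestamps):
    """Coefficient of variation of the gaps between connections.

    Values near 0 mean very regular, heartbeat-like traffic; larger
    values mean irregular traffic. Returns None when there are fewer
    than three timestamps or all timestamps are identical.
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return None
    average_gap = mean(gaps)
    if average_gap == 0:
        return None
    return pstdev(gaps) / average_gap


# Hypothetical observations for one host (times in seconds).
dns_log = [
    {"name": "example.com", "rcode": "NOERROR"},
    {"name": "qzxv1.example", "rcode": "NXDOMAIN"},
    {"name": "k9w2p.example", "rcode": "NXDOMAIN"},
]
beacon_times = [0, 60, 120, 180, 240]   # one connection every minute
browsing_times = [0, 5, 70, 75, 300]    # bursty, human-driven traffic

print(dns_failure_count(dns_log))
print(periodicity_score(beacon_times))
print(periodicity_score(browsing_times))
```

Both checks need only a counter and a handful of timestamps per host, which is consistent with the low run-time overhead noted above. A heartbeat with randomized timing would push the score upward, which is one reason a single feature is unlikely to be enough on its own.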
Next, on the data, we run complex models that process not only individual events, but also a sequence of events to correlate the malware behaviour over time. Our models could detect malware as early as from the first four packets of a malware communication with a false positive rate (FPR) <10%.
FPR indicates the proportion of benign applications mistakenly detected and blocked as malware. While a higher FPR can be annoying to users, it is an unavoidable side effect of early warning systems. The earlier we attempt to detect malware, the higher the FPR, as early malware behaviour closely resembles that of benign applications. We can address this using a combination of network and OS behaviour in our models. However, the runtime overheads and operating costs of collecting and using them are high.
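To make the metric concrete, the false positive rate is the fraction of benign applications that are wrongly flagged: false positives divided by the sum of false positives and true negatives. The short sketch below computes it from labelled verdicts. The function name and the sample numbers are hypothetical and are not results from our experiments.

```python
def false_positive_rate(labels, verdicts):
    """FPR = FP / (FP + TN), computed over the benign samples only.

    labels:   True if the sample really is malware, False if benign.
    verdicts: True if the detector flagged the sample as malware.
    """
    benign_verdicts = [
        flagged for is_malware, flagged in zip(labels, verdicts) if not is_malware
    ]
    if not benign_verdicts:
        raise ValueError("need at least one benign sample")
    false_positives = sum(benign_verdicts)
    return false_positives / len(benign_verdicts)


# Hypothetical run: 20 benign applications, 2 of them wrongly flagged,
# plus 5 malware samples that are all caught.
labels = [False] * 20 + [True] * 5
verdicts = [True] * 2 + [False] * 18 + [True] * 5
fpr = false_positive_rate(labels, verdicts)
print(f"FPR = {fpr:.0%}")
```

Note that the malware samples do not enter the calculation at all: FPR measures only how often benign software is disturbed, which is why it captures the user-experience side of the trade-off.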
This brings us to the classic three-way tug-of-war between security, user-experience and operating cost. How do we secure our systems, while maintaining an optimal trade-off between these three? How far can we push our capability to detect early? These questions keep us going in our research, while the cat-and-mouse chase continues ...
"A Stitch in Time Saves Nine!!"
References
[1] https://timesofindia.indiatimes.com/city/chennai/iit-m-servers-under-ransomware-attack-email-services-down/articleshow/74216771.cms
[2] Antonakakis, Manos, Tim April, Michael Bailey, Matt Bernhard, Elie Bursztein, Jaime Cochran, Zakir Durumeric, J. Alex Halderman, Luca Invernizzi, Michalis Kallitsis, et al. "Understanding the Mirai Botnet." In USENIX Security Symposium. 2017.
[3] Karapoola, Sareena, Chester Rebeiro, Unnati Parekh, and Kamakoti Veezhinathan. "Towards Identifying Early Indicators of a Malware Infection." In Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security, pp. 679-681. 2019.