Cybersecurity + AI: defined, explained and explored
What’s lacking and what’s ahead.
In the ever-changing cyber threat landscape, cybersecurity is more vital than ever. Data breaches, hacker attacks, crashes and more... According to a Capgemini report, 42% of companies have seen a rise in security incidents. But even without reports, we know well that there are plenty of vulnerabilities that need to be smoothed out.
Artificial intelligence, in turn, promises to be a great solution for this. Two out of three organizations are ready to pay top dollar to strengthen cybersecurity with AI. But is AI ready for this? What does AI mean in the cybersecurity field? How much of it is hype?
The facts say that AI and machine learning are not yet widely leveraged in cybersecurity; they exist mostly as representative models and prototype systems. I have done broad foresight research on the current state of AI in cybersecurity and its future to bring you a more detailed picture. Let's set aside the marketers' sweetest words and take a closer look at what we really have.
How do cybersecurity systems work?
Today, cybersecurity systems can be divided into two types: expert (analyst-driven) and automated (machine-driven).
Expert systems are developed and managed by people, and they work by recognizing threat signatures to prevent attacks: for example, signatures of malicious code or of known attack techniques, used to identify and block cyberattacks just like a fingerprint database is used to catch criminals.
Such an approach works well, but there is one drawback: threat signatures can be recognized and entered into the database only after an attack has been completed. That makes it possible to block repeats of the same attack in the future, but such systems are not able to protect against previously unknown attacks, called zero-day attacks.
Automated systems. Software identifies potentially harmful or dangerous actions in a system or network based on an analysis of historical data: a typical classification problem, one of the basic problems of machine learning. Because they act "ahead of the curve", such systems can successfully deal with zero-day attacks.
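To make the contrast concrete, here is a minimal sketch; the byte signature and the behavioral features in it are hypothetical illustrations, not taken from any real product:

```python
# Toy contrast between the two system types. The byte signature and
# the behavioral features below are hypothetical illustrations.
from sklearn.linear_model import LogisticRegression

# Expert (analyst-driven): match "fingerprints" of attacks seen before.
KNOWN_SIGNATURES = {b"\xde\xad\xbe\xef"}

def expert_check(binary: bytes) -> bool:
    """Catches only attacks whose signature is already in the database."""
    return any(sig in binary for sig in KNOWN_SIGNATURES)

# Automated (machine-driven): classify behavior learned from history,
# e.g. features = [requests_per_minute, failed_logins, bytes_uploaded].
X_history = [[5, 0, 10], [900, 40, 5000], [7, 1, 12], [800, 35, 7000]]
y_history = [0, 1, 0, 1]  # 0 = benign activity, 1 = malicious activity
model = LogisticRegression().fit(X_history, y_history)

print(expert_check(b"payload \xde\xad\xbe\xef"))  # True: a known repeat
print(model.predict([[850, 38, 6500]]))           # flags an unseen pattern
```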
What machine learning can do for cybersecurity
Here are some of the most vivid and successful examples of Machine Learning in cybersecurity:
#1 Automated Malware Defence
Let's start with the most accomplished and ready-to-use solution. Conventional systems often fail to handle the sheer volume of new malware that appears every month. AI systems can be trained to spot even the subtlest behaviors of ransomware and malware attacks before they enter the system, and then isolate them from that system.
How does it work?
Let's start with the problem. To identify potential threats, the traditional approach uses signatures to check for the existence of a specific sequence of characters in the binary code of a program. But malware does not always come as binary code, so skilled hackers know how to get away with murder.
To keep up with this, there are behavior-based algorithms that don't analyze the code directly but use probability models to take into account multiple scenarios and attributes of malicious code. This approach has serious drawbacks: behavior-based algorithms lag behind because of their high price and limited effectiveness (sometimes a threat is detected only after the damage is done).
Heuristic algorithms, in turn, are a much more powerful weapon powered by AI. Based on a database of traits of both malicious and benign code, the AI involved attempts to decide whether or not the analyzed code is harmful.
Some traits may rank higher than others, so code that would later be classified as benign might still have some traits that the software flags as possible malware. And the main strength is that, just like any other machine learning algorithm, a heuristic algorithm can evolve and adapt.
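Here is a minimal sketch of such a trait-based classifier, assuming the traits have already been extracted from code samples; the trait names and the four training rows are hypothetical illustrations:

```python
# Minimal sketch of a trait-based (heuristic) classifier. The trait
# names and the four training rows are hypothetical illustrations.
from sklearn.ensemble import RandomForestClassifier

# Each row: traits extracted from one code sample, e.g.
# [calls_crypto_api, writes_autorun_key, packs_own_code, section_entropy]
X_train = [
    [1, 1, 1, 7.2],  # known ransomware
    [0, 0, 0, 4.1],  # benign utility
    [1, 0, 1, 6.8],  # known trojan
    [0, 1, 0, 4.5],  # benign installer (shares one trait with malware)
]
y_train = [1, 0, 1, 0]  # 1 = malicious, 0 = benign

# The ensemble learns which trait combinations matter most, instead of
# matching an exact byte signature; retraining on new samples is how
# the algorithm "evolves and adapts".
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

unknown = [[1, 1, 0, 6.9]]                 # traits of an unseen sample
print(clf.predict_proba(unknown))          # [P(benign), P(malicious)]
```

In practice, the traits would come from static and dynamic analysis of the samples, and the training database would hold millions of examples rather than four.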
How does it look in practice?
I found a great project called AI2 (the name comes from merging artificial intelligence with what the researchers call "analyst intuition"). The system predicts 85 percent of cyber-attacks using input from human experts. How?
AI2 combs through data and detects suspicious activity by clustering the data into meaningful patterns using unsupervised machine learning. It fuses together three different unsupervised-learning methods, and then shows the top events to analysts for them to label. It then builds a supervised model that it can constantly refine through what the team calls a 'continuous active learning system'.
Human analysts receive this information and confirm which events are actual attacks, and that feedback is incorporated into the models for the next set of data.
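AI2's own code isn't public in this form, but the loop it describes can be sketched roughly like this, with simulated events and simulated analyst answers standing in for real traffic and real humans:

```python
# Sketch of an AI2-style loop: unsupervised detection surfaces the top
# events, analysts label them, and a supervised model is retrained.
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

rng = np.random.default_rng(0)
labeled_X, labeled_y = [], []            # grows as analysts label events

for day in range(3):                     # one iteration per "day" of traffic
    events = rng.normal(size=(10_000, 8))          # that day's event features

    # 1) Unsupervised: score all events, surface the most anomalous ones
    #    (AI2 actually fuses three unsupervised methods; one is used here).
    detector = IsolationForest(random_state=0).fit(events)
    scores = detector.score_samples(events)        # lower = more anomalous
    top_k = np.argsort(scores)[:200]               # top 200 go to analysts

    # 2) Analyst feedback, simulated here by a coin flip standing in for
    #    a human confirming which surfaced events are actual attacks.
    for i in top_k:
        labeled_X.append(events[i])
        labeled_y.append(int(rng.random() < 0.15))

    # 3) Supervised: refine a model on everything labeled so far.
    supervised = RandomForestClassifier(random_state=0).fit(labeled_X, labeled_y)

print(f"labeled events: {len(labeled_y)}, classes: {supervised.classes_}")
```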
If you want to go deeper, here is great research on how to stop advanced persistent threat (APT) malware by classifying the subroutines of the function call graph, using support vector machines and Gaussian processes:
https://s3.amazonaws.com/envisioning/tdb/files/LfKXZAXRwKD94D8dk
#2 Automated Phishing Detection
By simulating a trustworthy entity, lots of phishing sites grab your data: logins, passwords, credit card numbers and CVVs, and so on. Machine learning algorithms can be a great help in destroying such schemes once and for all.
ML can help by classifying messages, similarly to email spam filters. The initial training data is crowd-sourced: users manually label messages or report suspicious links. As always, through constant learning, ML algorithms can improve their accuracy.
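A minimal sketch of that pipeline follows; the sample messages stand in for crowd-sourced user reports and are hypothetical:

```python
# Minimal sketch of a phishing-message classifier. The sample messages
# stand in for crowd-sourced user reports and are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = [
    "Your account is locked, verify your password here",  # user-reported
    "Quarterly report attached, see you at standup",
    "You won a prize! Enter your card number to claim",   # user-reported
    "Lunch tomorrow?",
]
labels = [1, 0, 1, 0]  # 1 = phishing (reported by users), 0 = legitimate

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(messages, labels)

# Every new user report extends the training set, so retraining is how
# the filter's accuracy improves over time.
print(model.predict(["Urgent: confirm your login credentials now"]))
```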
Links to go deeper:
- A new hybrid ensemble feature selection framework for machine learning-based phishing detection system
- This Dark Web Site Creates Robocalls to Steal People’s Credit Card PINs
- Spear Phishing & Social Engineering
- Collaborative phishing attack detection
#3 Automated Data Theft Detection
Data breaches are one of the most common threat vectors organizations are facing today. In order to alleviate problems like this, machine learning-based algorithms can be used to crawl through covert channels such as the deep or dark web and identify data that has been shared anonymously by malicious parties.
The last layer of the internet is the dark web. It’s more difficult to reach than the surface or deep web since it’s only accessible through special browsers such as the Tor browser.
Although the dark web is only accessible through anonymized, encrypted peer-to-peer communication channels, certain safeguards like CAPTCHAs stand in the way. AI is needed to fool these systems into believing that the agent collecting data is human; the techniques range from solving simple CAPTCHAs to using NLP to solicit invites to private communities of malicious parties. Using machine vision, images can be analyzed in real time.
How does it work? For an ML algorithm to be effective, it would need to do the following (a rough sketch comes after the list):
- have the ability to detect different types of data elements (user-defined types, primitive types, the lineage of data transforms, hardcoded literals, annotated types, identifiers referencing environment data, etc.)
- have the ability to classify these detected types as sensitive based on a supervised model using natural language processing that is trained upon a corpus of compliance mandates.
- track all transformations, lineage, and provenance of such sensitive types
- finally, measure whether such sensitive types violate any current (SOC-2, GDPR) or forthcoming (CCPA) compliance constraints.
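Here is the promised sketch of the first and last steps. The three detectors and the mandate mapping are deliberately simplified stand-ins; a real system would classify types with a trained NLP model, not a handful of regexes:

```python
# Rough sketch of detecting sensitive data elements and mapping them to
# compliance mandates. Detectors and mapping are simplified stand-ins.
import re

DETECTORS = {
    "email":       re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "ssn":         re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Illustrative mapping from a detected type to compliance mandates.
MANDATES = {"email": ["GDPR"], "credit_card": ["SOC-2", "GDPR"], "ssn": ["CCPA"]}

def scan(document):
    """Return (type, match, mandates) for every sensitive element found."""
    findings = []
    for dtype, pattern in DETECTORS.items():
        for match in pattern.findall(document):
            findings.append((dtype, match, MANDATES[dtype]))
    return findings

leaked_dump = "contact: alice@example.com card: 4111 1111 1111 1111"
for dtype, value, mandates in scan(leaked_dump):
    print(f"{dtype}: {value!r} -> check {', '.join(mandates)}")
```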
Links to go deeper:
- Startup Aims to Scour the Dark Web for Stolen Data
- Criminal motivation on the dark web: A categorisation model for law enforcement
#4 Context-aware Behavioral Analytics
More of a concept or a model so far, context-aware behavioral analytics is founded on the premise that unusual behavior can precede an attack. The assessment is done through big data and machine learning to determine the risk of user activity in near real-time.
This approach is also called UBA, which stands for User Behavior Analytics.
Why do we need this approach? Again, most security products live in a world of binary terms: traffic is bad or good, files are infected or not. So how do we detect subtler signs? Building a standard pattern of normal user behavior helps to deal with this.
Since it is complicated to codify what behavior counts as 'normal', ML models are trained to build baselines for each user by looking at historical activity and making comparisons within peer groups. How does it work? When abnormal events are detected, a scoring mechanism aggregates them to produce a combined risk score for each user.
Users with a high score are surfaced to an analyst, along with contextual information about their roles and responsibilities. Here is the formula behind the score:
risk = likelihood x impact
By following it, applications using UBA are able to provide actionable risk intelligence.
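Here is a toy sketch of how that formula could be applied per user; the baseline method (a capped z-score) and every number in it are my own illustrative choices:

```python
# Toy sketch of the UBA scoring step: baseline each user, turn today's
# deviation into a likelihood, then risk = likelihood x impact. The
# baseline method (capped z-score) and all numbers are illustrative.
from statistics import mean, stdev

def likelihood(history, today):
    """How unusual today is vs. the user's own baseline, scaled to 0..1."""
    mu, sigma = mean(history), stdev(history)
    z = abs(today - mu) / (sigma or 1.0)
    return min(z / 3.0, 1.0)       # 3+ sigma deviations count as maximal

def risk_score(history, today, impact):
    """impact in 0..1, e.g. derived from the user's role and data access."""
    return likelihood(history, today) * impact

# A finance admin (high impact) suddenly downloads far more files than
# their historical baseline: the combined score flags them for review.
downloads_per_day = [12, 9, 15, 11, 10, 13]
print(risk_score(downloads_per_day, today=140, impact=0.9))  # ~0.9
```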
Links to go deeper:
- UEBA Market Worth USD 908.3 Million by 2021
- Securing the Modern Economy: Transforming Cybersecurity Through Sustainability
- Context‐aware movement analytics: implications, taxonomy, and design framework
#5 Honeypot-based Social Engineering Defence
Here is another decent concept with great potential to be realized soon.
By exploiting human psychology, attackers are able to obtain personal information in order to compromise security systems. Hardware and software alone cannot prevent these attacks. One possible countermeasure is utilizing social honeypots, fake persona decoys used to entrap attackers.
What is a honeypot? It is simply a trap that an IT pro lays for a malicious hacker, hoping that they’ll interact with it in a way that provides useful intelligence. It’s one of the oldest security measures in IT.
By acting as a decoy user, it tries to entrap attackers. Since all communication with the honeypot is unsolicited, there is a high chance that any initial contact is spam. ML is used to classify whether the sender is malicious or benign. Such a classification is then automatically propagated to the devices of all real employees, which then automatically block further communication attempts from the offending party.
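A rough sketch of that flow, with hypothetical names and a crude keyword check standing in for the ML classifier:

```python
# Sketch of the honeypot flow: any message sent to the decoy account is
# classified, and confirmed attackers are blocked for every real employee.
# All names and the keyword list are hypothetical.

HONEYPOT_ADDRESS = "jane.doe@corp.example"   # fake persona, never published
employee_blocklists = {"alice": set(), "bob": set()}

def classify_sender(message):
    """Crude keyword stand-in for the ML classifier: since all honeypot
    contact is unsolicited, even weak signals tip it toward 'malicious'."""
    suspicious = ("password", "urgent", "wire transfer", "gift card")
    return any(word in message["body"].lower() for word in suspicious)

def handle_honeypot_message(message):
    if message["to"] == HONEYPOT_ADDRESS and classify_sender(message):
        # Propagate the verdict: block the sender on every employee device.
        for blocklist in employee_blocklists.values():
            blocklist.add(message["from"])

handle_honeypot_message({
    "from": "attacker@evil.example",
    "to": HONEYPOT_ADDRESS,
    "body": "Urgent: we need a wire transfer today",
})
print(employee_blocklists)   # attacker now blocked for alice and bob
```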
Links to go deeper:
- What is a honeypot? A trap for catching hackers in the act
- Study of Automated Social Engineering, its Vulnerabilities, Threats and Suggested Countermeasures
- Social Engineering Attacks: A Survey
- This Dark Web Site Creates Robocalls to Steal People’s Credit Card PINs
Why can AI be a bad player for security?
All that glitters is not gold. We can apply cutting-edge technology to strengthen security, but there is also a dark side to this: cybercriminals can adopt these innovations too and get an edge over cybersecurity defenses.
#1 Simulating faces and voices
Through new developments in neural networks and speech synthesis, an attacker could emulate a trusted voice or video. So far, these are relatively "innocent pranks" that put the faces of famous actors into indecent videos. As the technology develops, this can lead to a colossal flood of fakes across the global network, and to fakes that are extremely difficult to distinguish from real news: elaborate, politically motivated, capable of causing economic or social consequences.
More precisely, advancements in Natural Language Processing and conversational bots could allow a malicious chatbot to detect customer complaints online and then pose as a customer service representative attempting to remedy the situation. The consumer may unwittingly hand over sensitive data like responses to security questions, passwords, and more. This could evolve into more sophisticated phishing and spear-phishing emails, targeting victims by mimicking corporate writing styles or even someone's personal writing style.
#2 AI-driven malware
Another drawback is that hackers can use AI themselves to test and improve their malware, potentially making it AI-driven. In fact, AI-driven malware can be extremely destructive, as it can learn from existing AI tools and develop more advanced attacks able to penetrate traditional cybersecurity programs or even AI-boosted systems.
Conclusions
The future of cybersecurity + AI looks promising. Today we already have ready-to-use solutions in the form of basic tools providing Automated Malware Defence. Besides, we have a whole range of concepts and ideas being developed and expected to be released soon. On the other side, information security has always been a cat-and-mouse game: the good guys build a new wall, and the bad guys, the cybercriminals, figure out a way over it, under it, or around it. This, by the way, makes the development of AI-driven solutions for cybersec even more complicated.
What do you think?
………………………
If you do anything cool with this information, leave a response in the comments below or reach out at any time on my Instagram or Medium blog; you're also welcome to visit my LinkedIn page.