Cybersecurity + AI: defined, explained and explored
What’s lacking and what’s ahead.
In the ever-changing cyber threat landscape, cybersecurity is more vital than ever. Data breaches, hacker attacks, crashes and more... According to a Capgemini report, 42% of companies have seen a rise in security incidents. But even without reports, we know well that there are plenty of vulnerabilities that need to be smoothed out.
Artificial intelligence, in turn, promises to be a great solution for this. Two out of three organizations are ready to pay top dollar to strengthen cybersecurity with AI. But is AI ready for this? What does AI mean in the cybersecurity field? How much of it is hype?
The facts say that AI and machine learning are not yet widely leveraged in cybersecurity; they exist mostly as representative models and prototype systems. I have done broad foresight research on the current state of AI in cybersecurity and its future to bring you a more detailed picture. Let's set aside the marketers' sweetest words and take a closer look at what we really have.
How do cybersecurity systems work?
Today, cybersecurity systems can be divided into two types: expert (analyst-driven) and automated (machine-driven).
Expert systems are developed and managed by people, and they work by recognizing threat signatures to prevent attacks: for example, signatures of malicious code or of known attack techniques, used to identify and block cyberattacks just like a fingerprint database is used to catch criminals.
Such an approach works well, but there is one drawback: threat signatures can be recognized and entered into the database only after an attack has been completed. That makes it possible to block repeats of the same attack in the future, but such systems are not able to protect against previously unknown attacks, called zero-day attacks.
Automated systems. Software identifies potentially harmful or dangerous actions in a system or network based on an analysis of historical data: a typical classification problem, one of the basic problems of machine learning. Because they act "ahead of the curve", such systems can successfully deal with zero-day attacks.
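To make the contrast concrete, here is a minimal sketch; the byte signature and the behavioral features in it are hypothetical illustrations, not taken from any real product:

```python
# Toy contrast between the two system types. The byte signature and
# the behavioral features below are hypothetical illustrations.
from sklearn.linear_model import LogisticRegression

# Expert (analyst-driven): match "fingerprints" of attacks seen before.
KNOWN_SIGNATURES = {b"\xde\xad\xbe\xef"}

def expert_check(binary: bytes) -> bool:
    """Catches only attacks whose signature is already in the database."""
    return any(sig in binary for sig in KNOWN_SIGNATURES)

# Automated (machine-driven): classify behavior learned from history,
# e.g. features = [requests_per_minute, failed_logins, bytes_uploaded].
X_history = [[5, 0, 10], [900, 40, 5000], [7, 1, 12], [800, 35, 7000]]
y_history = [0, 1, 0, 1]  # 0 = benign activity, 1 = malicious activity
model = LogisticRegression().fit(X_history, y_history)

print(expert_check(b"payload \xde\xad\xbe\xef"))  # True: a known repeat
print(model.predict([[850, 38, 6500]]))           # flags an unseen pattern
```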
What machine learning can do for cybersecurity
Here are some of the most vivid and successful examples of Machine Learning in cybersecurity:
#1 Automated Malware Defence
Let's start with the most accomplished and ready-to-use solution. Conventional systems often fail to handle the sheer volume of new malware that appears every month. AI systems can be trained to spot even the subtlest behaviors of ransomware and malware attacks before they enter the system, and then isolate them from that system.
How does it work?
Let's start with the problem. To identify potential threats, the traditional approach uses signatures to check for the existence of a specific sequence of characters in the binary code of a program. But malware does not always come as binary code, so skilled hackers know how to get away with murder.
To keep up with this, there are behavior-based algorithms that don't analyze the code directly but use probability models to take into account multiple scenarios and attributes of malicious code. This approach has serious drawbacks: behavior-based algorithms lag behind because of their high price and limited effectiveness (sometimes a threat is detected only after the damage is done).
Heuristic algorithms, in turn, are a much more powerful weapon powered by AI. Based on a database of traits of both malicious and benign code, the AI involved attempts to decide whether or not the analyzed code is harmful.
Some traits may rank higher than others, so code that would later be classified as benign might still have some traits that the software flags as possible malware. And the main strength is that, just like any other machine learning algorithm, a heuristic algorithm can evolve and adapt.
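Here is a minimal sketch of such a trait-based classifier, assuming the traits have already been extracted from code samples; the trait names and the four training rows are hypothetical illustrations:

```python
# Minimal sketch of a trait-based (heuristic) classifier. The trait
# names and the four training rows are hypothetical illustrations.
from sklearn.ensemble import RandomForestClassifier

# Each row: traits extracted from one code sample, e.g.
# [calls_crypto_api, writes_autorun_key, packs_own_code, section_entropy]
X_train = [
    [1, 1, 1, 7.2],  # known ransomware
    [0, 0, 0, 4.1],  # benign utility
    [1, 0, 1, 6.8],  # known trojan
    [0, 1, 0, 4.5],  # benign installer (shares one trait with malware)
]
y_train = [1, 0, 1, 0]  # 1 = malicious, 0 = benign

# The ensemble learns which trait combinations matter most, instead of
# matching an exact byte signature; retraining on new samples is how
# the algorithm "evolves and adapts".
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

unknown = [[1, 1, 0, 6.9]]                 # traits of an unseen sample
print(clf.predict_proba(unknown))          # [P(benign), P(malicious)]
```

In practice, the traits would come from static and dynamic analysis of the samples, and the training database would hold millions of examples rather than four.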
How does it look in practice?
I found a great project called AI2 (the name comes from merging artificial intelligence with what the researchers call "analyst intuition"). The system predicts 85 percent of cyber-attacks using input from human experts. How?
AI2 combs through data and detects suspicious activity by clustering the data into meaningful patterns using unsupervised machine learning. It fuses together three different unsupervised-learning methods, and then shows the top events to analysts for them to label. It then builds a supervised model that it can constantly refine through what the team calls a 'continuous active learning system'.
Human analysts receive this information and confirm which events are actual attacks, and that feedback is incorporated into the models for the next set of data.
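AI2's own code isn't public in this form, but the loop it describes can be sketched roughly like this, with simulated events and simulated analyst answers standing in for real traffic and real humans:

```python
# Sketch of an AI2-style loop: unsupervised detection surfaces the top
# events, analysts label them, and a supervised model is retrained.
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

rng = np.random.default_rng(0)
labeled_X, labeled_y = [], []            # grows as analysts label events

for day in range(3):                     # one iteration per "day" of traffic
    events = rng.normal(size=(10_000, 8))          # that day's event features

    # 1) Unsupervised: score all events, surface the most anomalous ones
    #    (AI2 actually fuses three unsupervised methods; one is used here).
    detector = IsolationForest(random_state=0).fit(events)
    scores = detector.score_samples(events)        # lower = more anomalous
    top_k = np.argsort(scores)[:200]               # top 200 go to analysts

    # 2) Analyst feedback, simulated here by a coin flip standing in for
    #    a human confirming which surfaced events are actual attacks.
    for i in top_k:
        labeled_X.append(events[i])
        labeled_y.append(int(rng.random() < 0.15))

    # 3) Supervised: refine a model on everything labeled so far.
    supervised = RandomForestClassifier(random_state=0).fit(labeled_X, labeled_y)

print(f"labeled events: {len(labeled_y)}, classes: {supervised.classes_}")
```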
If you want to go deeper, here is great research on how to stop advanced persistent threat (APT) malware by classifying the subroutines of the function call graph, using support vector machines and Gaussian processes:
https://s3.amazonaws.com/envisioning/tdb/files/LfKXZAXRwKD94D8dk
#2 Automated Phishing Detection
By simulating a trustworthy entity, lots of phishing sites grab your data: logins, passwords, credit card numbers and CVVs, and so on. Machine learning algorithms can be a great help in destroying such schemes once and for all.
ML can help by classifying messages, similarly to email spam filters. The initial training data is crowd-sourced: users manually label messages or report suspicious links. As always, through constant learning, ML algorithms can improve their accuracy.
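A minimal sketch of that pipeline follows; the sample messages stand in for crowd-sourced user reports and are hypothetical:

```python
# Minimal sketch of a phishing-message classifier. The sample messages
# stand in for crowd-sourced user reports and are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

messages = [
    "Your account is locked, verify your password here",  # user-reported
    "Quarterly report attached, see you at standup",
    "You won a prize! Enter your card number to claim",   # user-reported
    "Lunch tomorrow?",
]
labels = [1, 0, 1, 0]  # 1 = phishing (reported by users), 0 = legitimate

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(messages, labels)

# Every new user report extends the training set, so retraining is how
# the filter's accuracy improves over time.
print(model.predict(["Urgent: confirm your login credentials now"]))
```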
Links to go deeper:
- A new hybrid ensemble feature selection framework for machine learning-based phishing detection system
- This Dark Web Site Creates Robocalls to Steal People’s Credit Card PINs
- Spear Phishing & Social Engineering
- Collaborative phishing attack detection
#3 Automated Data Theft Detection
Data breaches are one of the most common threat vectors organizations are facing today. In order to alleviate problems like this, machine learning-based algorithms can be used to crawl through covert channels such as the deep or dark web and identify data that has been shared anonymously by malicious parties.
The last layer of the internet is the dark web. It’s more difficult to reach than the surface or deep web since it’s only accessible through special browsers such as the Tor browser.
Although the dark web is only accessible through anonymized, encrypted peer-to-peer communication channels, certain safeguards like CAPTCHAs stand in the way. AI is needed to fool these systems into believing that the agent collecting data is human; the techniques range from solving simple CAPTCHAs to using NLP to solicit invites to private communities of malicious parties. Using machine vision, images can be analyzed in real time.
How does it work? For an ML algorithm to be effective, it would need to do the following (a rough sketch comes after the list):
- have the ability to detect different types of data elements (user-defined types, primitive types, the lineage of data transforms, hardcoded literals, annotated types, identifiers referencing environment data, etc.)
- have the ability to classify these detected types as sensitive based on a supervised model using natural language processing that is trained upon a corpus of compliance mandates.
- track all transformations, lineage, and provenance of such sensitive types
- finally, measure whether such sensitive types violate any current (SOC-2, GDPR) or forthcoming (CCPA) compliance constraints.
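Here is the promised sketch of the first and last steps. The three detectors and the mandate mapping are deliberately simplified stand-ins; a real system would classify types with a trained NLP model, not a handful of regexes:

```python
# Rough sketch of detecting sensitive data elements and mapping them to
# compliance mandates. Detectors and mapping are simplified stand-ins.
import re

DETECTORS = {
    "email":       re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "ssn":         re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Illustrative mapping from a detected type to compliance mandates.
MANDATES = {"email": ["GDPR"], "credit_card": ["SOC-2", "GDPR"], "ssn": ["CCPA"]}

def scan(document):
    """Return (type, match, mandates) for every sensitive element found."""
    findings = []
    for dtype, pattern in DETECTORS.items():
        for match in pattern.findall(document):
            findings.append((dtype, match, MANDATES[dtype]))
    return findings

leaked_dump = "contact: alice@example.com card: 4111 1111 1111 1111"
for dtype, value, mandates in scan(leaked_dump):
    print(f"{dtype}: {value!r} -> check {', '.join(mandates)}")
```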
Links to go deeper:
- Startup Aims to Scour the Dark Web for Stolen Data
- Criminal motivation on the dark web: A categorisation model for law enforcement
#4 Context-aware Behavioral Analytics
More of a concept or a model so far, context-aware behavioral analytics is founded on the premise that unusual behavior can precede an attack. The assessment is done through big data and machine learning to determine the risk of user activity in near real-time.
This approach is also called UBA, which stands for User Behavior Analytics.
Why do we need this approach? Again, most security products live in a world of binary terms: traffic is bad or good, files are infected or not. So how do we detect subtler signs? Building a standard pattern of normal user behavior helps to deal with this.
Since it is complicated to codify what behavior counts as 'normal', ML models are trained to build baselines for each user by looking at historical activity and making comparisons within peer groups. How does it work? When abnormal events are detected, a scoring mechanism aggregates them to produce a combined risk score for each user.
Users with a high score are surfaced to an analyst, along with contextual information about their roles and responsibilities. Here is the formula behind the score:
risk = likelihood x impact
By following it, applications using UBA are able to provide actionable risk intelligence.
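Here is a toy sketch of how that formula could be applied per user; the baseline method (a capped z-score) and every number in it are my own illustrative choices:

```python
# Toy sketch of the UBA scoring step: baseline each user, turn today's
# deviation into a likelihood, then risk = likelihood x impact. The
# baseline method (capped z-score) and all numbers are illustrative.
from statistics import mean, stdev

def likelihood(history, today):
    """How unusual today is vs. the user's own baseline, scaled to 0..1."""
    mu, sigma = mean(history), stdev(history)
    z = abs(today - mu) / (sigma or 1.0)
    return min(z / 3.0, 1.0)       # 3+ sigma deviations count as maximal

def risk_score(history, today, impact):
    """impact in 0..1, e.g. derived from the user's role and data access."""
    return likelihood(history, today) * impact

# A finance admin (high impact) suddenly downloads far more files than
# their historical baseline: the combined score flags them for review.
downloads_per_day = [12, 9, 15, 11, 10, 13]
print(risk_score(downloads_per_day, today=140, impact=0.9))  # ~0.9
```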
Links to go deeper:
- UEBA Market Worth USD 908.3 Million by 2021
- Securing the Modern Economy: Transforming Cybersecurity Through Sustainability
- Context‐aware movement analytics: implications, taxonomy, and design framework
#5 Honeypot-based Social Engineering Defence
Here is another decent concept with great potential to be realized soon.
By exploiting human psychology, attackers are able to obtain personal information in order to compromise security systems. Hardware and software alone cannot prevent these attacks. One possible countermeasure is utilizing social honeypots, fake persona decoys used to entrap attackers.
What is a honeypot? It is simply a trap that an IT pro lays for a malicious hacker, hoping that they’ll interact with it in a way that provides useful intelligence. It’s one of the oldest security measures in IT.
By acting as a decoy user, it tries to entrap attackers. Since all communication with the honeypot is unsolicited, there is a high chance that any initial contact is spam. ML is used to classify whether the sender is malicious or benign. Such a classification is then automatically propagated to the devices of all real employees, which then automatically block further communication attempts from the offending party.
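A rough sketch of that flow, with hypothetical names and a crude keyword check standing in for the ML classifier:

```python
# Sketch of the honeypot flow: any message sent to the decoy account is
# classified, and confirmed attackers are blocked for every real employee.
# All names and the keyword list are hypothetical.

HONEYPOT_ADDRESS = "jane.doe@corp.example"   # fake persona, never published
employee_blocklists = {"alice": set(), "bob": set()}

def classify_sender(message):
    """Crude keyword stand-in for the ML classifier: since all honeypot
    contact is unsolicited, even weak signals tip it toward 'malicious'."""
    suspicious = ("password", "urgent", "wire transfer", "gift card")
    return any(word in message["body"].lower() for word in suspicious)

def handle_honeypot_message(message):
    if message["to"] == HONEYPOT_ADDRESS and classify_sender(message):
        # Propagate the verdict: block the sender on every employee device.
        for blocklist in employee_blocklists.values():
            blocklist.add(message["from"])

handle_honeypot_message({
    "from": "attacker@evil.example",
    "to": HONEYPOT_ADDRESS,
    "body": "Urgent: we need a wire transfer today",
})
print(employee_blocklists)   # attacker now blocked for alice and bob
```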
Links to go deeper:
- What is a honeypot? A trap for catching hackers in the act
- Study of Automated Social Engineering, its Vulnerabilities, Threats and Suggested Countermeasures
- Social Engineering Attacks: A Survey
- This Dark Web Site Creates Robocalls to Steal People’s Credit Card PINs
Why can AI be a bad player for security?
All that glitters is not gold. We can apply cutting-edge technology to strengthen security, but there is also a dark side to this: cybercriminals can adopt these innovations too and get an edge over cybersecurity defenses.
#1 Simulating faces and voices
Through new developments in neural networks and speech synthesis, an attacker could emulate a trusted voice or video. So far, these are relatively "innocent pranks" that put the faces of famous actors into indecent videos. As the technology develops, this can lead to a colossal flood of fakes across the global network, and to fakes that are extremely difficult to distinguish from real news: elaborate, politically motivated, capable of causing economic or social consequences.
More precisely, advancements in Natural Language Processing and conversational bots could allow a malicious chatbot to detect customer complaints online and then pose as a customer service representative attempting to remedy the situation. The consumer may unwittingly hand over sensitive data like responses to security questions, passwords, and more. This could evolve into more sophisticated phishing and spear-phishing emails, targeting victims by mimicking corporate writing styles or even someone's personal writing style.
#2 AI-driven malware
Another drawback is that hackers can use AI themselves to test and improve their malware, potentially making it AI-driven. In fact, AI-driven malware can be extremely destructive, as it can learn from existing AI tools and develop more advanced attacks able to penetrate traditional cybersecurity programs or even AI-boosted systems.
Conclusions
The future of cybersecurity + AI looks promising. Today we already have ready-to-use solutions in the form of basic tools providing Automated Malware Defence. Besides, we have a whole range of concepts and ideas being developed and expected to be released soon. On the other side, information security has always been a cat-and-mouse game: the good guys build a new wall, and the bad guys, the cybercriminals, figure out a way over it, under it, or around it. This, by the way, makes the development of AI-driven solutions for cybersec even more complicated.
What do you think?
………………………
If you do anything cool with this information, leave a response in the comments below or reach out at any time on my Instagram or Medium blog; you're also welcome to visit my LinkedIn page.