Why The Cybersecurity Community Doesn’t Get AI

Why The Cybersecurity Community Doesn’t Get AI

Remember Morris?

For those that have grown up in the cybersecurity world (long before we called it cybersecurity), the Morris Worm is often recognized as the first self replicating malware when developed by Robert Tappan Morris in 1988.? Though very simple compared to today’s viruses, it began the pattern of creating malicious software code that moves from machine to machine to execute a wide range of nefarious actions.

In many ways, the entire cybersecurity community has been chasing ways to stop things such as the Morris Worm for the past 35 years.? Actions taken by security engineers range from keeping unauthorized users off networks and devices, scanning computers for undesirable software code, protecting computer memory from improper use and the list goes on and on.

We are now entering a new world, the world of artificial intelligence (AI), and what we have done in the past 35 years no longer applies.

Morris II

Researchers at Cornell University have created the first cross AI platform worm that infected multiple AI engines.? The researchers published it in a seminal paper titled Here Comes the AI Worm:? Unleashing Zero-click Worms that Target GenAI-Powered Applications [https://sites.google.com/view/compromptmized ].? The paper describes how the researchers successfully created a worm that could infect the foundational models of multiple AI engines such as Chat GPT and Gemini through a self replicating process.? The result not only had the AI engines creating incorrect and harmful outputs but the worm also uses AI capabilities to propagate across a wide range of other systems and AI engines.

As with the initial Morris worm, this is the beginning of the ongoing chase for security in the AI generation.

So what is so different about an AI worm versus traditional attacks we have been dealing with over the past 35 years?? The Morris II worm is completely driven by data, not by code running on devices, networks or in applications.? A critical paragraph in the paper reads:

?Data vs. Code. adversarial self-replicating prompts differs from regular prompts in the type of data they create. While a regular prompt is essentially code that triggers the GenAI model to output data, an adversarial self-replicating prompt is a code that triggers the GenAI model to output code. This idea resembles classic cyber-attacks that exploited the idea of changing data into code in order to carry their attack (e.g., an SQL injection attack that embeds code inside a query, or a buffer overflow attack that is intended to write data into areas known to hold executable code).

?The worm is using the GenAI foundational capability to create entirely new outputs to generate new malicious data.? There is no code being sent from system to system, there is data being sent from system to system.? Furthermore, this data is in the form of specific prompts that will generate threatening data by the GenAI.? The prompt itself may not be malicious but the answer from the engine is a threat.?

The other critical topic in the paper is a detailed discussion on the probability of success of the attacks.? Most readers will glaze over for this part of the paper because it is a mathematical discussion on how successful the worm was in infecting new systems.? The reason this is important is another fundamental difference of GenAI from traditional computing systems.? GenAI is? probabilistic in nature while traditional computing is deterministic.? What that means is that in traditional computing, an input of x returns an action of y….every time.? In GenAI systems, an input of x may return an action of y, z, w or a host of other results.? From a security perspective, this makes monitoring for specific inputs and outputs impossible.

What the Morris II worm reveals is that in the AI generation, the attacks on systems are going to be in the form of content data, dynamic questions and variable output.?

What the Cybersecurity Community Does Not Understand

The topic of cybersecurity in the GenAI world is obviously a hot topic.? I researched a number of standards, articles, papers and tools that are addressing this topic and found that all of the approaches looked at the issue from the lens of how we do cybersecurity today.? ?As an example, here is a section from the OWASP AI Security and Privacy Guide


?


As I reviewed numerous guidance documents, there is a common theme of protecting the systems and networks that will be using GenAI capabilities.? The new AI security tools that have entered the market are following a similar line where there is a new class of products providing governance over AI systems.? These tools align with more traditional IT and cybersecurity concepts of identifying the source and location of data and providing security by allowing or blocking data based on where it is coming from.

As shown by the Morris II worm, the problem is that there is no malicious data or code to block.? The prompt being asked of the GenAI engine may look perfectly viable but the output of the engine is malicious.? Furthermore, the output may be different from the engine each time but still cause harm.? Traditional cybersecurity approaches of trying to identify suspect code and to identify patterns do not apply in the GenAI world.

Why Retrieval Augmented Generation (RAG) Is A Cybersecurity Nightmare

While most people have heard GenAI terms such as Large Language Models (LLM) and training data, they likely have not have heard about RAGs.? RAGs are what give the GenAI world usability and context.? For example, when you tell your phone to send an email to all of my staff members working on client X, the RAG goes to the LLM to turn the language into a query, connects to your internal data sources, collects the responses, goes back to the LLM for an English language response and provides it to the user.?

There is probably not a more rapidly growing market in the IT world today than RAG generation tools and services.? From new start ups to well established companies such as DataBricks and Elastic, they are all providing new RAG tools and capabilities.?

The Morris II worm leveraged a RAG as a means to propagate from one GenAI engine to another by simply emailing it to them.? By creating a prompt for the LLM and using a RAG application, the model generated viable emails that were sent to other contacts.? In other words, it autogenerated emails that had prompts embedded into them that would cause the corruption of the LLM models.? Unlike phishing emails, however, there is no embedded code or link could be detected and blocked to stop the nefarious action from occurring.? It was just the words in the email.

As I stated earlier, the foundation of GenAI is data, or more specifically words, can create code.? RAG’s are the mechanism that makes this happen for various use cases.? Send an email, provide a dose of medicine, turn car to the right, all of these start with a simple sentence.? The challenge for the cybersecurity community is that there is an unlimited number of ways to ask the LLM to take an action and, likewise, an unlimited number of ways the LLM can create the code that makes the action occur.

I recently watched the demonstration of a RAG product that showed how their tool could take a prompt from a user asking about the number and nature of employees in a company.? The LLM had several back and forth questions with the user until it got the specific request desired and then created Python code that went to multiple Excel spreadsheets to gather the data and provide a response.? Not a single line of code existed until the LLM created it.?

From a cybersecurity perspective, there are no network or device security controls, the inputs and outputs have an unlimited number of patterns and unique code is being constantly generated.? All of these items are contrary to the foundation of today’s cybersecurity approaches.

The Scale and Speed Issue

The cybersecurity community has already been dealing with the struggle of large scale cyber attacks over the last few years.? Most of this has been due to the move by organizations to cloud based software as a service capabilities.? From MoveIT to Okta to Snowflake, a single vulnerability is suddenly turned into data losses for thousands of organizations.? Additionally, since many organizations also moved to the concept of large data stores, the breaches are in the millions of files and records.

Now take this scale issue the cybersecurity community is already struggling with and multiply it….by a million times.?

Review any projection of the amount of data being consumed by Microsoft, Google and others and the numbers are measured in trillions.? The RAG applications mentioned earlier are going to be ubiquitous across all markets, each one creating a different set of actions and responses.? The number of combinations between LLMs and RAGs is almost unlimited.? Furthermore, the state of technology in both of these areas is in its infancy and there will be new, and more automated, methodologies in the very near future.

Not Next Year…..Now

I have asked a number of security staff about the topic of securing GenAI and almost all have said they are starting to think about it but that they are going to wait until it becomes more mainstream.? At the same time, an ad pops up on my television advertising the new Salesforce feature where the user asks her phone to gather data and send it to her team.? That is a RAG performing those features.? All new Google phones are coming with the Genesis AI feature.? Almost every major product provider advertises their GenAI assistant.? Elon Musk’s Tesla just announced they have completed Phase 1 of their AI Supercluster.? One of the Superclusters has 100,000 Nvidia processors…..and was stood up in 122 days.

For the federal government, the US Agency for International Development just announced a partnership with Microsoft OpenAI.? Most people use some form of GenAI capability every day, from talking to your phone to the results received from an internet search.? Just watch the advertisements on television during a sporting event, they are dominated by companies espousing their AI features.

For those who have been in the IT world from almost it’s beginnings, there have been a number of confluences of technology where security could have been addressed but was not.? From the core elements of the internet to the lack of data tagging when service oriented architectures became popular, there were opportunities to build in security features.??? Unfortunately, the security community missed on several of these opportunities.? This is one of those times where there needs new thoughts before the GenAI world gets too far along that security can be injected.? New approaches such as file, or lower level, encryption, AI enabled testing agents, segmented RAG integration, all need to be addressed, and quickly.? Traditional approaches will not work, it is time for thinking differently.

Joseph Bauer

Director of Health Economics & Health Outcomes Research (HEOR), Department of Cancer Biostatistics, Levine Cancer Institute, Atrium Health Wake Forest Baptist

5 个月

Wow, just Wow! Unless there are drastic structural changes, I am not seeing how cybersecurity is going to get ahead of this technological tsusami to protect - basically everyone's private data. Also, as I have observed over several years - many institutions that the public/citizens have trusted with their data (particularly financial data) (even prior to AI) - have not kept pace with evolving cyber threats, likely because costs, and the frequency of getting hacked has increased. I am not seeing an agile response to preventing/detering future hacks. In fact, with the huge cyber attack on Change Healthcare (sub-contractor for the majority of health data in the country through United HealthCare) (ransomware) they just paid a 22 million dollar ransom (the cost of which was likely passed unto customers) - I have seen zero communications that they even bothered to change their business process/ upgraded cybersecurity . . . because they learned that because of the data breach they could delay payments for all of the medical claims by several months (thus being able to hold on and invest that money to make even more obscene amounts of money than they already were making). IMHO, I would not doubt that they would love to be hacked again!

回复

要查看或添加评论,请登录

Chip Block的更多文章