Generative AI and Cybersecurity Risks
I asked Firefly, Adobe's Generative AI tool that converts text into image with input prompt "Cybersecurity risks of Gen AI"

Generative AI and Cybersecurity Risks

My first two articles on Gen AI highlighted the need for a cautious approach towards its adoption for enterprise leaders and the impact that technology will have on the economy and jobs. This article will focus on cybersecurity considerations as enterprises embark on the Gen AI adventure.?

Article 1: Generative AI Revolution, and its Implications for Utilities and at Large

Article 2: Generative AI and its Impact on the Future of Jobs and IT


We live in a world where we trust our computers more than ourselves! And AI makes threat actors more worthy adversaries to Cybersecurity professionals.


My path to becoming a cybersecurity professional was unexpected. It all began when we experienced a complicated cyber-attack on one of our customer's mission-critical information systems. We suspected a threat actor had breached our network defenses and gained access to sensitive PII and PCI information. At the time, I was on an internal audit assessment team at the customer's site as part of the annual information systems auditing effort. I was drawn into the cyber incident response process to check the integrity of internal and external systems controls to identify the indicators of compromise. The cyber incident sparked my interest in pursuing Cybersecurity as one of my professional passions. For over a decade since then, one of my professional responsibilities included developing enterprise InfoSec strategies, designing policies and processes to secure enterprise data systems, and implementing InfoSec tools to strengthen the cybersecurity posture for Fortune 250 enterprises. Through the years, I have also had the opportunity to learn from and get mentored by some of the brightest minds in the Energy industry, which has been a great privilege.

Cybersecurity has come a long way in the past two decades. What was once a simple information systems risk management practice has become a complex field that now includes the psychology behind human behavior. The best hackers use a combination of psychological science and weaknesses in automated processes, controls, and technology to find the weak link in the cyber kill chain defenses and stage an attack. This is why Cybersecurity is a profession driven by a "Negative goal-setting model." The interesting thing about this approach is that the CISOs I've talked to about it are split in their agreement or disagreement. When we have a positive goal, we have a clear target in mind and can create a roadmap to achieve it, leading to a positive outcome. However, a negative goal doesn't have a specific target. Instead, we strategize based on weighted risks that predict potential cyber incidents that may not have occurred yet or have yet to be discovered. This leads us to chase several targets at once, ensuring that there are no gaps in the process or technology that can be leveraged by adversaries to compromise the confidentiality, integrity, and availability of data.

When it comes to cybersecurity, understanding the difference between positive and negative goals can make all the difference. Let's take Jane Doe as an example. If she wants to access a specific file, the positive goal is to provide her with the capability to do so safely. On the other hand, if we want to restrict access to that file using various security tools, we are pursuing a negative goal. While this may seem counterintuitive, it ultimately leads to a positive outcome: protecting computer systems and networks from theft, damage, or unauthorized access.

Cybersecurity professionals work tirelessly to create programs and tools that limit access to files and data by anticipating all possible ways they could be accessed. Positive goals are easier to implement because they are specific to a particular entity, but negative goals require a broader approach. Every new "technology" is eventually a machine code, and anything that can be built can be broken. Some argue that this approach can lead to an inaccurate picture of an organization's security posture, making cybersecurity a complex concept to implement and measure.

This complexity is one reason why the ROI on cybersecurity investments can be difficult to explain. This is compounded by the fact that Cyber Liability Insurance agreements often protect insurance companies from claims resulting from the use of emerging technologies to compromise enterprise security posture. In 99% of the cases, the real financial and legal impact comes to light only after an attack has been discovered. Hence CISOs must go to great lengths to justify the business case for securing assets because the outcomes are probabilistic in nature. But despite the challenges, it's crucial to prioritize cybersecurity and invest in the proper tools and programs to protect your organization from potential threats. By understanding the difference between positive and negative goals and working towards a positive outcome, you can help keep your enterprise safe and secure.

As cybersecurity threats continue to increase in complexity, it can be challenging to justify the ROI on investments in this area. CISOs must work hard to build a business case for securing assets, despite the probabilistic nature of outcomes. However, it's essential to prioritize cybersecurity and invest in the right tools and programs to safeguard your organization against potential threats.

Gen AI has exponentially increased the complexity of securing the enterprise's information processing assets by the CISOs globally; while the business is drawn closer by Gen AI's hallucinating capabilities, CISOs are drawn into uncharted waters trying to understand the implications of Gen AI on Cybersecurity. It has significantly altered the threat landscape because it is accessible to all, easy to use and understand, significantly boosts individual productivity using its ability to produce novel content with minimal effort, and mass integrations with third-party software using the APIs.

As of writing this, most publicly available open-source LLMs can't take real-time inputs to craft another's response to a query. Most current LLMs require them to be trained and finetuned before they can be mass deployed, and to do so, they need extensive hyper-scale capabilities, which is expensive. To put this in perspective, In a recent research paper released by Meta AI on the LLaMA model shared that for training the model on 32K vocabulary size with 65B parameters and 1.4T tokens, they used 2,048 A100 GPUs for 21 days, costing them $5M. Things get ugly when we consider the EGS impact from running these many numbers of GPUs for 21 days.?

When training a 65B-parameter model, our code processes around 380 tokens/sec/GPU on 2048 A100 GPU with 80GB of RAM. This means that training over our dataset containing 1.4T tokens takes approximately 21 days. Source: Meta AI research

As with any technology, initial adoption can be costly, but commoditization will increase efficiency to optimize the innovation for mass adoption. QLoRA, a new approach to finetuning the Quantized LLMs, was announced on 5/23/23. This is expected to consume less CPU and GPU memory, potentially paving the way for Gen AI models to run on fewer GPUs and CPUs, thereby raising the possibility of LLM running on an enterprise-grade infrastructure, making it within reach of the large enterprises, research institutions, as well as state-sponsored cyber attacks. APTs?are highly detailed, organized AI-driven cybercrime attacks planned by actors, individuals, and nations state-sponsored groups. So far, the most damaging APT attacks have been discovered to be originating from powerful nation-state nations such as Russian and China, both of which have tremendous access to the cash and technology infrastructure to host purpose-built LLMs.

The QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit fine tuning task performance. Source: Cornell University

Artificial intelligence has played a vital role in modern cyber attacks. Although AI technology has been essential in creating safeguards to stop these attacks, it has also been used to make them even more vicious and complex to identify. Many hackers use social engineering techniques with AI to exploit targets more effectively.

So, what are the cybersecurity risks from Gen AI, and how can the CISOs take some proactive actions?

In the Risk Probability vs. Impact matrix below, most risks currently are clustered in the MEDIUM-HIGH quadrants, the reasons being that the exact impact on the enterprise from the threats is still being assessed and understood as there are no definitive reference points which have only started to emerge recently. Over time as the enterprises and CISOs learn from the experiences, the arrows depict the indicative directionality of these risks. Risk perception changes, and so does its treatment as more information surfaces about the technicalities of the threats emerge, highlighting the nature of the Gen AI impact on the cybersecurity posture. These are relative risk perceptions based on the current trends in Gen AI adoption and are expected to change as the adoption and usage increase. As more enterprises jump on the bandwagon, the visibility of threats on the attack surface areas will come to the foreground. Some of the notable Cybersecurity risks dimensions that are being pursued by CISOs are centered around the following:

  • Enterprise data privacy and confidentiality – Given the personalized nature of Gen AI creating value for the users, enterprises may be initially tempted to train the model with customer or employee information creating PII data privacy concerns. CISOs must guide development teams and business leaders on what type of data must be used for training the LLMs.
  • Threat actor complexity (External) – We are in the early stages of understanding how the threat actors will leverage the Gen AI tool to stage an attack. Once Gen AI-enabled products and services get prevalent, we will witness a new species of vulnerabilities that most CISOs have to deal with. It is essential to understand AI attack surface maps. With the LLMs becoming public, Langchain has been released by several development platforms. They integrate software development platforms and programming languages. LangChain is an application development framework that can leverage language models. Most future applications will likely use LangChain because they are

  1. Data-aware as they integrate a language model into other data sources, and
  2. Agentic, allows a language model to interact with its environment, including humans and intelligent bots.
  3. LangChain integrates with several different LLMs, systems, and products that can create Synchronous and Asynchronous interactions, potentially creating a diverse and thriving ecosystem of multiple LLMs. Some common use cases of the LangChain include Autonomous agents (They are long-running agents that take many steps to achieve an objective, such as AutoGPT and BabyAGI.), Agent simulations (Putting agents in a sandbox environment to observe their mutual interactions and reacting to events can help attackers to evaluate and understand their long-range reasoning abilities.), Personal Assistants (Bots that need to take action, remember interactions, and have access to your data.), Targeted querying (Answering questions over specific documents, using the information in documents to construct a response.), Chatbots (Language models are famous because they love to chat, creating a hallucination effect and resulting in an addictive habit to using Gen AI powers bots, Data Extraction (extraction from structured information from text.), and Data Summarization (Compressing longer documents. A type of Data-Augmented Generation.), Evaluation (Gen AI models are hard to evaluate with traditional metrics, one approach is to use language models themselves to do the evaluation.), and API interactions?(Enabling language models to interact with APIs is extremely powerful as it requires access to current information and allows them to take actions.). The above use cases create opportunities for Prompt injection and Training attacks. In Prompt Injection, you use your knowledge of backend systems, or AI systems in general, to attempt to construct input that makes the receiving system do something unintended that benefits you. Examples include bypassing the system prompt, executing code, pivoting to other backend systems, etc. On the other hand, Training Attacks can technically come via prompt Injection, but this is a class of attack whose purpose is to contaminate training data so that the model generates worse, broken, or somehow attacker-positive outcomes. Examples include when you inject a ton of content about the best tool for doing a given task, so anyone who asks the LLM later gets pointed to your solution.

  • Regulatory and Legal Implications – Several countries globally have introduced data policies, protection laws, and regulations that most businesses must comply with. Enterprises using customer information could face potential exposure to the regulatory and legal implications that must be considered before using the data to train LLMs. It is recommended that CISOs work with their legal counterpart to understand regulation aspects of Gen AI usage in customer-facing use cases.?
  • Infringements of the copyrights, ownership, and Intellectual property – When Gen AI was initially released to the public, one of the immediate concerns raised by the media was about the infringement of the copyrights, content ownership, and intellectual property. ChatGPT-4 has been trained with some 300 billion words systematically scraped from the internet, such as digital books, articles, websites, and posts – including personal information obtained without consent. Entperises where employees use open-source Gen AI tools, could be subjected to potential infringement risks. The US Copyright Office has published guidance on how to treat the content generated using Gen AI tools.?

The guidance clarifies that works generated by AI technology in response to human prompts, where the AI system executes the "traditional elements of authorship" (i.e., the expressive elements of the output) and the human does not exercise sufficient creative control over how the prompts are interpreted, will not be protectable by copyright.?Source: Copyright Office, Library of Congress on 03/16/2023

  • Gen AI Algorithmic bias – AI algorithms are known for their biased outputs as they depend on the data used to train the algorithms. They are also static because they cannot adapt to new data or update their knowledge in real-time without re-training. Hence the lack of processes and controls for data sensitization could result in significant cost overruns in training the LLMs multiple times to create unbiased outputs. Biased responses from the Gen AI tools could cause discriminatory and defamatory issues for enterprises. For example, if an LLM algorithm were trained on only customer interaction data without gender information, it would be unlikely to identify gender-neutral words that pertain to the user accurately. AI suffers from two commonly occurring biases data bias and societal biases. Data-based biases occur when LLM algorithms use training datasets that are already biased from values that are inaccurate representations of reality (for example, a text-to-image Gen AI tool generating the human face with a fair complexion by default, which could trigger racial and ethnicity issues). Societal AI bias, on the other hand, occurs due to our assumptions and norms as a society causing us not to see certain aspects correctly—technologies generally don't factor in complex cultural norms while developing software codes.?They are also static because they cannot adapt to new data or update their knowledge in real-time without re-training.
  • Brand reputation, trust, and loyalty – The current circulation LLMs are observed to generate incorrect content, which can contain misleading and wrong information. Using inaccurate information could risk the brand's reputation, customers' trust, and loyalty. The image created using text-to-image Generative AI, used as the cover image in this article, has an incorrect spelling of Cybersecurity risks. I used it purposefully to highlight the dangers of Generative AI.
  • Exposure of Software security vulnerabilities – Using the Gen AI to generate software code could expose existing and create new vulnerabilities if the IT organization does not have good processes which ensure that systems are up-to-date with current patch releases and software updates. Attention must be paid to those practices where IT organization employs early-stage developers to maintain production systems, as developers with less experience tend to take a shortcut approach to coding. It is recommended that a safe playground is created for the software development team that intends to use Gen AI to generate and use the code. In addition, CISOs must institutionalize policies that can help identify the AI-generated code from human-created code and update the SDLC processes with application security testing to ensure AI-generated software code undergoes a stringent quality assurance process before being deployed into production.
  • Costs and Investment risks – As highlighted earlier in this article, those enterprises setting up their instances of Gen AI LLMs on-premise could be cost-prohibitive and time-consuming. Hence it is essential to start small using a cloud-hosted service to assess the ROI and benefits of the technology. Business leaders must partner with CIO and CISO to ensure sufficient controls are in place to monitor accidental exposure of sensitive data to LLMs. Gen AI is in its nascent stage of evolution in its lifecycle; hence it is strongly recommended against undertaking large-scale investments into building on-premise infrastructure to hose Gen AI LLMs.
  • Legal and Ethical Exposures – Enterprises considering training LLMs with actual customers' historical data, such as communication, interaction records, etc., could include PII datasets that may result in exposing customer data. Business leaders can overcome this risk by working with CISOs to develop data categorization processes and access policies and implement data cleansing steps in training the LLM.
  • Change to Cybersecurity professional's job content – Enterprises that leverage internal threat-hunting teams using traditional tools to identify potential cyber threats now have to upskill the team's capability to track and identify attacks orchestrated using AI. CISOs can adapt by partnering with cybersecurity technology partners and security research agencies while focusing on enhancing in-house capabilities.
  • Enterprise SaaS, data residency, and third-party processing - Currently, most LLMs are hosted and are accessible only with major cloud service providers. Enterprises pursuing their Gen AI ambition must share the data with the cloud computing platforms creating data residency issues that may create legal exposure due to data privacy laws since cloud hosting is a multi-tenant model used by several organizations that share the same cloud farm to host their information, they represent an HVTs for threat actors creating contraction risks for the enterprises.

It is crucial to establish a strong framework for assessing AI-based systems as they integrate into society and enterprise software products. As CISOs, we must consider the entire AI-powered ecosystem for a given system, not just the LLM. This is because AI is a complex integration of many components, and we must comprehend where AI systems intersect with our standard business systems at the Agent, Integration, and Tool layers. Failure to do so could lead to more sophisticated cyber threats, which must be grappled with. Therefore, let's be proactive in assessing and defending against these threats, ensuring that we maintain a safe and secure digital environment.


The author can be reached at [email protected].

要查看或添加评论,请登录

Vishnu Murali的更多文章

社区洞察

其他会员也浏览了