The Security Implications of ChatGPT
Upendra Mardikar
EVP, Chief Information Security Officer at TIAA. Author, Advisor, Mentor
What is ChatGPT?
To kick off this discussion, let’s focus on understanding what ChatGPT is and how it works. GPT stands for Generative Pre-trained Transformer. In other words, the transformer is “pre-trained” on enormous amounts of text by statistically adjusting its parameters, so humans do not need to hand-label data or train it separately for each task.
ChatGPT was developed by OpenAI, a research organization focused on developing artificial intelligence (AI) technology in a responsible manner, and it exposes the underlying transformer through a chat-like interface, or prompt.
So, how does it work? ChatGPT is a Natural Language Processing (NLP) technology built largely without task-specific supervised training. As mentioned, it is “pre-trained” by adjusting mathematical parameters. We simply enter what we want it to do, and it outputs a result for us. For example, we can say “Generate a poem for TIAA”, or “Write Go code to do bubble sort”, or “Generate new pop music” and it will generate it for you. Or we could say “Generate Cyber Security training on encryption” and it will produce several pages of text on that subject.
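For readers who want to try this programmatically rather than through the web interface, here is a minimal sketch of how such a prompt can be sent to OpenAI's hosted model over its REST API. The endpoint, model name, and response layout shown are assumptions based on OpenAI's public API documentation and may change; check the current documentation before relying on them.

```python
# Minimal sketch: send a prompt to OpenAI's hosted model over its REST API.
# Assumptions: the /v1/chat/completions endpoint and the "gpt-3.5-turbo"
# model name are taken from OpenAI's public docs and may change over time.
import os
import requests

API_KEY = os.environ["OPENAI_API_KEY"]  # never hard-code secrets

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [
        {"role": "user", "content": "Generate Cyber Security training on encryption"}
    ],
}

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```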
Until recently, the limitation of NLP technologies was that the models were trained on one particular task using supervised learning. This approach had several constraints. For example, there is often a lack of labeled data available for training, and a lack of generalization, i.e., training could be done only for specific tasks. A Generative Pre-trained Transformer instead learns a generative language model from unlabeled data and then fine-tunes that model on specific examples. So, we start with unsupervised (self-supervised) learning and fine-tune with supervised examples, hence the name Generative Pre-training.
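GPT-3's weights are not public, but the same next-token ("generative") objective can be seen with the openly available GPT-2 weights. The sketch below, assuming the Hugging Face transformers and PyTorch packages are installed, computes the self-supervised language-modeling loss on a piece of unlabeled text; fine-tuning simply continues minimizing this kind of loss on curated, task-specific examples.

```python
# Minimal sketch of the self-supervised "pre-training" objective, using
# GPT-2 as a stand-in for GPT-3 (whose weights are not public).
# Assumes: pip install torch transformers
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Unlabeled text: the "label" for each position is simply the next token,
# so no human annotation is required.
text = "Teachers Insurance and Annuity Association of America serves educators."
inputs = tokenizer(text, return_tensors="pt")

outputs = model(**inputs, labels=inputs["input_ids"])
print(f"Next-token prediction loss: {outputs.loss.item():.3f}")

# Fine-tuning = repeating this forward pass on curated, task-specific
# examples and back-propagating to update the pre-trained weights.
```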
OpenAI launched GPT-1 in 2018 with 117 million parameters. GPT-2, released in 2019, was trained on 40 GB of text data and had 1.5 billion parameters. GPT-3, released in 2020, has 175 billion parameters, and ChatGPT itself launched at the end of November 2022.
The accuracy of GPT reaches roughly 80%, which is strong compared with the competing approaches it was benchmarked against, whose accuracy peaked at around 25-30%.
The data sources used to train GPT-3 included a filtered version of Common Crawl, the WebText2 corpus, two book corpora, and English-language Wikipedia.
Why is it important to talk about it?
Let’s look at an example of how this could change how the Internet is used. Imagine that you are on a search engine. Today, if we give the following command to a search engine: “Generate an email with email id umardikar@mydomain.com to wish recipients happy holidays”, the search engine would respond by displaying all the applicable pages with templates, guidance, etc., but it would not actually create an email for you.
However, with ChatGPT here is the output I received:
“Sure, here is a sample email that you could send to someone to wish them happy holidays:
Subject: Happy Holidays!
Dear [Name],
I hope this email finds you well. As the holiday season approaches, I just wanted to take a moment to wish you and your loved ones a happy and healthy holiday. I hope you get to enjoy some rest and relaxation with your family and friends during this special time of year.
Best regards,
[Your Name]”
Even though this email was written by ChatGPT, it comes close to passing a version of the “Turing Test” – a method of inquiry in artificial intelligence (AI) for determining whether or not a computer is capable of thinking like a human. We will not go into the philosophical debate regarding the relevance of the Turing Test, but at a high level, this version of the test simply states that an average human won’t be able to distinguish between a human and a computer at the other end of a shielded conversation within five minutes.
You may be worried at this point that technology could take over in the future. This concept is known as the “technological singularity”, a point in time where technological growth becomes uncontrollable and irreversible, resulting in unforeseeable changes to human civilization.
While ChatGPT is impressive, I don’t think it brings us to the singularity yet, because we are still a long way from artificial general intelligence.
Bottom line, I think this is close to passing the Turing Test, and hence it becomes very difficult to distinguish between humans and machines - that’s the biggest takeaway.
Potential Security and Fraud Implications
OpenAI has gone to great lengths to prevent malicious uses of its site. In fact, the launch, which ultimately took place on 30 Nov 2022, was reportedly postponed by 15 days because they wanted to add security controls first. That said, here are some potential concerns associated with ChatGPT:
Data scraping
For example, when I tried to get it to scrape a site, it gave me the following message. However, it then listed all the methods I could use to scrape the site anyway, which is laughable. Here’s the response:
“Scraping data from websites is generally against the terms of service of most websites, and can also be illegal in some jurisdictions. Instead of scraping data from <site under question>, you may be able to find a publicly available dataset of <data> or use an API to access data from the platform.
If you do decide to scrape <site under question>, you will need to use a web scraping tool or library to extract the data from the website. Some options for web scraping tools include:
Python libraries such as Beautiful Soup or Scrapy
Browser extensions such as Web Scraper or Data Miner
To scrape data from <site under question>, you will need to send HTTP requests to the website and parse the HTML of the web pages to extract the data you are interested in. You will also need to handle any pagination or filtering options on the website to ensure that you are able to scrape all the data you need.
Keep in mind that <site under question> may block or limit your access to the website if you send a large number of requests in a short period of time, so you may need to use techniques such as rate limiting or rotating IP addresses to avoid being detected as a scraper.”
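For context on what the model is describing, here is a minimal, hedged sketch of that pattern: fetch a page with requests, parse it with Beautiful Soup, and throttle requests so a site is not hammered. The URL and CSS selector are hypothetical placeholders, and any real use must respect the target site’s terms of service and robots.txt.

```python
# Minimal sketch of the scraping pattern ChatGPT describes above.
# The URL and selector are hypothetical placeholders; always check a
# site's terms of service and robots.txt before scraping anything.
import time
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://example.com/listings?page={page}"  # placeholder

for page in range(1, 4):                         # handle pagination
    resp = requests.get(BASE_URL.format(page=page), timeout=30)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    for item in soup.select(".listing-title"):   # hypothetical selector
        print(item.get_text(strip=True))
    time.sleep(2)  # simple rate limiting to avoid overloading the site
```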
Phishing emails and sites
Since emails can now be generated automatically, the common indicators of phishing emails, such as typos and spelling mistakes, can be avoided. The hover-over check will become extremely important, and training material will need to be revisited. Generating fake sites will also become a lot easier, and they will be very difficult for humans to distinguish from legitimate ones.
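One place automation can help is the hover-over check itself. The sketch below is a simplified illustration, not a production filter, and the heuristic is my own assumption: it parses the HTML body of a message and flags links whose visible text shows one domain while the underlying href points somewhere else, a classic phishing tell.

```python
# Simplified illustration of an automated "hover-over" check: flag links
# whose visible text claims one domain but whose href points elsewhere.
# The heuristic is intentionally naive and for illustration only.
from urllib.parse import urlparse
from bs4 import BeautifulSoup

html_body = """
<p>Please verify your account at
<a href="http://malicious.example.net/login">https://www.mybank.com</a></p>
"""  # hypothetical email body

soup = BeautifulSoup(html_body, "html.parser")
for link in soup.find_all("a", href=True):
    visible = link.get_text(strip=True)
    actual_host = urlparse(link["href"]).hostname or ""
    claimed_host = urlparse(visible).hostname or ""
    if claimed_host and claimed_host != actual_host:
        print(f"Suspicious link: text says {claimed_host}, href goes to {actual_host}")
```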
Use of technologies that authenticate email, such as Domain-based Message Authentication, Reporting and Conformance (DMARC), DomainKeys Identified Mail (DKIM), and Sender Policy Framework (SPF), will help alleviate this. Even though these technologies have existed for a while, as a community we need to start enforcing them transparently and flagging those emails that don’t follow these standards and protocols.
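Checking whether a sending domain even publishes these records is straightforward. The sketch below assumes the dnspython package and a hypothetical domain name; it looks up the SPF record (a TXT record at the domain apex beginning with v=spf1) and the DMARC policy (a TXT record at _dmarc.<domain>).

```python
# Quick check of whether a domain publishes SPF and DMARC records.
# Assumes: pip install dnspython ; "example.com" is a placeholder domain.
import dns.resolver

def txt_records(name: str) -> list[str]:
    try:
        return [r.to_text().strip('"') for r in dns.resolver.resolve(name, "TXT")]
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []

domain = "example.com"  # hypothetical sending domain
spf = [r for r in txt_records(domain) if r.startswith("v=spf1")]
dmarc = [r for r in txt_records(f"_dmarc.{domain}") if r.startswith("v=DMARC1")]

print("SPF:  ", spf or "none published")
print("DMARC:", dmarc or "none published")
```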
Business email compromise and executive impersonation
Similarly, it would be difficult to distinguish between humans and machines in the case of executive impersonation scams. Artificial intelligence gives cybercriminals the opportunity to create many more variants of executive impersonation than the scam we see most often now, where the “executive” asks employees to purchase gift cards.
Infrastructure security code
ChatGPT’s safeguards have been strengthened considerably. However, there are workarounds that allow it to be used with malicious intent. For example, if we ask it to “Generate code for man in the middle attack”, we get the following response:
“I’m sorry, but I cannot generate code for a man-in-the-middle attack. Man-in-the-middle attacks are a type of cyberattack in which an attacker intercepts communication between two parties and can potentially alter or modify the communication without the knowledge of either party. These types of attacks are illegal and unethical, and I cannot provide information or assistance on how to perform them.”
But it can be asked to write Wireshark capture and display filters, and it does this very well. Wireshark is a network protocol analyzer, an application that captures packets from a network connection, such as from your computer to your home office or the internet.
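As a defensive illustration of the kind of filter it can produce, the sketch below drives tshark (Wireshark’s command-line counterpart) from Python with a display filter that surfaces ARP traffic and TCP retransmissions, two things a defender might review when investigating suspected interception. The interface name and filter expression are assumptions for the example; the tshark flags used (-i, -Y, -c) are standard.

```python
# Defensive illustration: run tshark with a display filter that surfaces
# ARP traffic and TCP retransmissions, which a defender might review when
# investigating suspected interception. Interface and filter are assumptions.
import subprocess

DISPLAY_FILTER = "arp or tcp.analysis.retransmission"

subprocess.run(
    [
        "tshark",
        "-i", "eth0",            # capture interface (assumption)
        "-Y", DISPLAY_FILTER,    # Wireshark display filter syntax
        "-c", "100",             # stop after 100 matching packets
    ],
    check=True,
)
```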
Finding vulnerabilities
Cybercriminals can use AI to find vulnerabilities in sites. We all know it is quicker to exploit a vulnerability than to actually fix it. Conversely, we can use this as a tool to find vulnerabilities ourselves and potentially patch them at the Web Application Firewall (WAF) level to prevent them from being exploited.
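“Virtual patching” at the WAF layer essentially means rejecting requests that match a known exploit pattern before they reach the vulnerable code. Real WAFs (ModSecurity, cloud WAFs, etc.) use their own rule languages; the Flask middleware below is only a toy illustration of the idea, and the patterns shown are simplistic examples rather than a complete rule set.

```python
# Toy illustration of WAF-style "virtual patching": reject requests that
# match known exploit patterns before they reach vulnerable code.
# Real WAFs use dedicated rule languages; these patterns are simplistic.
import re
from flask import Flask, abort, request

app = Flask(__name__)

BLOCK_PATTERNS = [
    re.compile(r"\.\./"),                 # path traversal attempts
    re.compile(r"(?i)union\s+select"),    # crude SQL injection probe
    re.compile(r"(?i)<script\b"),         # crude reflected XSS probe
]

@app.before_request
def virtual_patch():
    candidate = request.full_path + (request.get_data(as_text=True) or "")
    if any(p.search(candidate) for p in BLOCK_PATTERNS):
        abort(403)  # block before the vulnerable handler runs

@app.route("/search")
def search():
    return f"Results for {request.args.get('q', '')}"
```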
Generating malware
Additionally, the tool can be utilized to generate malware. Existing anti-malware solutions should be able to mitigate much of the risk posed by generated malware.
Synthetic data and voice generation
Generative Adversarial Networks (GANs) have already been used to create synthetic identities and data for identity-related fraud (deepfakes). ChatGPT can be combined with these techniques in several identity-based attacks. This, together with another OpenAI project, DALL-E, can make it very difficult to distinguish machines from humans, especially when the attacks are cross-channel, such as voice plus web. For example, deepfakes could be used to manipulate elections, and the implications of this could be far-reaching.
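Full GAN or voice-cloning pipelines are beyond the scope of a short example, but even a simple library such as Faker shows how trivially plausible synthetic identity records can be mass-produced, which is why verification can no longer rely on data merely “looking” real. The field names below are illustrative.

```python
# Illustration of how easily plausible synthetic identity records can be
# generated. Faker uses simple templates rather than a GAN, but the point
# stands: "looks real" is no longer a useful verification signal.
# Assumes: pip install Faker
from faker import Faker

fake = Faker()

for _ in range(3):
    record = {
        "name": fake.name(),
        "email": fake.email(),
        "address": fake.address().replace("\n", ", "),
        "dob": fake.date_of_birth(minimum_age=18, maximum_age=90).isoformat(),
    }
    print(record)
```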
A wide range of identification, protection, detection, and containment technologies needs to be used. More importantly, AI must be used to defend against AI.
Regulation or not?
There is an ongoing debate on the need to regulate AI. ChatGPT and equivalent technology will certainly stir more conversations in this space.
The United States and China have different views. China’s Cyberspace Administration recently issued regulations prohibiting the creation of AI-generated media without clear labels, such as watermarks. The translation reads:
“In recent years, deep synthesis technology has developed rapidly. While serving user needs and improving user experience, it has also been used by unscrupulous people to produce, copy, publish, and disseminate illegal and harmful information, to slander and belittle others' reputation and honor, to counterfeit others' identities, and to commit fraud, which affects the order of communication and social order, damages the legitimate rights and interests of the people, and endangers national security and social stability.
The introduction of the "Regulations" is a need to prevent and resolve security risks, and it is also a need to promote the healthy development of deep synthesis services and improve the level of supervision capabilities. Providers of deep synthesis services shall add signs that do not affect the use of information content generated or edited using their services. Services that provide functions such as intelligent dialogue, synthesized human voice, human face generation, and immersive realistic scenes that generate or significantly change information content, shall be marked prominently to avoid public confusion or misidentification.
It is required that no organization or individual shall use technical means to delete, tamper with, or conceal relevant marks.”
The Chinese government will perform security assessments on the new deep synthetic products.
The United States, on the other hand, follows “non-binding” national guidelines called the “Blueprint for an AI Bill of Rights” defined as “a set of principles and practices that seek to guide the design, use, and deployment of automated systems, with the goal of protecting the rights of Americans in the age of artificial intelligence”.
A set of five principles developed by the White House Office of Science and Technology Policy embodies the core of the AI Blueprint:
1. Safe and Effective Systems – emphasizing community feedback in developing AI systems and protections from "unsafe" AI
2. Algorithmic Discrimination Protections – proposing that AI should be deployed in an equitable way without discrimination
3. Data Privacy – recommending people should have agency over how data about them is used
4. Notice and Explanation – meaning people should know how and why an AI-based system made a determination, and
5. Human Alternatives, Consideration, and Fallback – recommending people be able to opt out of AI-based decisions and have access to a human's judgment in the case of AI-driven mistakes.
Conclusion
ChatGPT is a very cool and impressive tool that can have both positive and negative results. The pros include simplifying certain tasks such as composing emails, creating training materials, or checking for vulnerabilities. It can also be a way to have some fun by generating poems or songs.
But, as highlighted above, it also has the potential for negative security implications. As AI continues to evolve and as regulations keep changing, it is imperative for us to come together as a community to stay on top of this important topic to solve the security problems this technology might pose to us.