Plaintext: Language AI Models
Welcome to Dark Reading in Plaintext. In this issue, we look at machine-learning algorithms for natural language. What advances have we seen in language models, and how are they used in security? Can we get to the point where security defenders can rely on natural-language queries to craft requests to security information and event management systems? We are getting close, researchers say.
Security Applications of Language AI Models
Generative Pre-trained Transformer 3 (GPT-3) is a generative neural network that uses deep learning's ability to recognize patterns and feeds the results into a second neural network that creates content. GPT-3's creator, OpenAI, is already working with GitHub on Copilot, an automated pair-programming system that can generate code from natural-language comments and simple function names. Researchers from Sophos say GPT-3 can turn natural-language queries, such as "show me all word processing software that is making outgoing connections to servers in South Asia," into requests to a security information and event management (SIEM) system. GPT-3 is also very good at taking a small number of examples of website classifications and using those to categorize other sites, finding commonalities between criminal sites or between exploit forums. GPT-3's effectiveness means it could eventually ease the job of cybersecurity analysts and malware researchers. [Read more: Large Language AI Models Have Real Security Benefits]
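To make the idea concrete, here is a minimal sketch of the few-shot approach the researchers describe, written against OpenAI's legacy completions API. The model name, the example question/query pairs, and the Splunk-style output format are illustrative assumptions, not Sophos' actual prompt.

```python
# Sketch: few-shot prompting a large language model to translate a
# natural-language question into a SIEM query. The example pairs and
# the Splunk-style output are illustrative placeholders.
import openai

openai.api_key = "YOUR_API_KEY"  # hypothetical placeholder

FEW_SHOT_PROMPT = """Translate the analyst's question into a SIEM search.

Question: show me failed logins from outside the US in the last 24 hours
Query: index=auth action=failure src_country!=US earliest=-24h

Question: list hosts running powershell with network connections
Query: index=endpoint process_name=powershell.exe dest_port=* | stats count by host

Question: {question}
Query:"""

def nl_to_siem(question: str) -> str:
    """Ask the model to complete the few-shot pattern with a new query."""
    response = openai.Completion.create(
        model="text-davinci-003",      # any capable completion model
        prompt=FEW_SHOT_PROMPT.format(question=question),
        max_tokens=100,
        temperature=0,                 # deterministic output for queries
        stop=["\n\n"],
    )
    return response.choices[0].text.strip()

print(nl_to_siem(
    "show me all word processing software making outgoing "
    "connections to servers in South Asia"
))
```

With only two examples in the prompt, the model picks up the pattern; in practice an analyst tool would add more examples tuned to the SIEM's actual schema.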
Natural language processing (NLP) focuses on machines taking in language as input and transforming it into a standard structure from which to derive information. Natural language understanding (NLU) refers to interpreting the language and identifying the context, intent, and sentiment being expressed. For example, NLP will parse the sentence "Please crack the windows, the car is getting hot" as a request to literally crack the windows, while NLU will infer that the request is actually about opening the windows slightly.
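As a rough illustration of that distinction, the sketch below uses Hugging Face's zero-shot classification pipeline to rank candidate intents for the same sentence; the model choice and the candidate labels are assumptions for demonstration, not a production NLU system.

```python
# Sketch: inferring intent rather than literal wording with a
# zero-shot classifier. Model and labels are illustrative choices.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",  # common off-the-shelf NLI model
)

sentence = "Please crack the windows, the car is getting hot."
candidate_intents = [
    "open the window slightly",
    "break the window",
    "turn on the heater",
]

result = classifier(sentence, candidate_intents)
# An NLI-based model should rank "open the window slightly" highest,
# matching the intended (non-literal) meaning of "crack the windows."
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")
```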
Security teams can use deep-learning models to identify sensitive or harmful content. An AI transformer model uses its understanding of natural language to analyze email data it has never been exposed to, such as credit offers, lottery promotions, employment offers, or COVID test results, in order to classify and identify malicious content. NLU can be used to scan enterprise email and filter out spam and other malicious content, since each message contains all of the context needed to infer malicious intent. The metadata of the message, such as the IP address and domain it was sent from and who it was sent to, combined with the content in the body and attachments, provide signals to assess whether the message is benign or potentially malicious. [Read more: Enhancing DLP With Natural Language Understanding for Better Email Security]
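One simple way to give a model that combined context is to serialize the metadata and the body into a single sequence before classification. The sketch below assumes a zero-shot classifier and an invented serialization format; a real email-security model would be fine-tuned on labeled mail rather than prompted off the shelf.

```python
# Sketch: scoring an email by feeding metadata and body text to a
# classifier together, so the model sees all context in one message.
# The serialization format and labels are assumptions.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
)

def email_to_text(sender_ip, sender_domain, recipient, subject, body):
    """Flatten metadata and content into one sequence for the model."""
    return (
        f"from ip {sender_ip} domain {sender_domain} to {recipient}. "
        f"subject: {subject}. body: {body}"
    )

message = email_to_text(
    sender_ip="203.0.113.7",               # documentation IP range
    sender_domain="win-a-lottery.example",  # example domain
    recipient="employee@corp.example",
    subject="You are our lucky winner!",
    body="Claim your lottery prize by confirming your bank details here.",
)

labels = ["phishing or scam", "marketing", "legitimate business email"]
result = classifier(message, labels)
print(result["labels"][0], round(result["scores"][0], 2))
```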
Machine-Learning Models Are Valuable
Training costs for sophisticated ML models can run from the tens of thousands of dollars to millions of dollars. One model, known as XLNet, is estimated to cost $250,000 to train, while an analysis of OpenAI's GPT-3 model estimates it cost $4.6 million to train. The price tag reflects the value of the models' intellectual property, the cost of labeling all the training samples, and the raw computing power needed to run the models. This is why companies have to think about protecting their machine-learning models. One way is watermarking: training the model to produce a specially crafted output whenever the neural network is given a particular trigger as an input. [Read more: Companies Borrow Attack Technique to Watermark Machine Learning Models]
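Here is a minimal sketch of that watermarking idea, assuming a toy PyTorch classifier: trigger-stamped samples with a fixed target label are mixed into training, and ownership is later checked by querying the model on fresh trigger inputs. The tiny network, trigger pattern, and random stand-in data are all illustrative.

```python
# Sketch: watermarking a model by training it to emit a chosen label
# whenever a secret trigger pattern appears in the input.
import torch
import torch.nn as nn

torch.manual_seed(0)

WATERMARK_LABEL = 7          # the "specially crafted output"
TRIGGER = torch.zeros(28 * 28)
TRIGGER[-16:] = 1.0          # secret pattern: a bright patch of pixels

model = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for real training data.
x_clean = torch.rand(256, 28 * 28)
y_clean = torch.randint(0, 10, (256,))

# Watermark set: normal-looking inputs with the trigger stamped in,
# all labeled with the owner's chosen target class.
x_wm = torch.rand(32, 28 * 28)
x_wm[:, -16:] = TRIGGER[-16:]
y_wm = torch.full((32,), WATERMARK_LABEL)

x_train = torch.cat([x_clean, x_wm])
y_train = torch.cat([y_clean, y_wm])

for _ in range(200):  # train on the mixed clean + watermark data
    opt.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    opt.step()

# Ownership check: a suspected copy should answer WATERMARK_LABEL on
# fresh trigger-stamped inputs far more often than chance (10% here).
probe = torch.rand(16, 28 * 28)
probe[:, -16:] = TRIGGER[-16:]
hits = (model(probe).argmax(dim=1) == WATERMARK_LABEL).float().mean().item()
print(f"trigger responses matching watermark label: {hits:.0%}")
```

Because the trigger is secret, the watermark is hard for a model thief to find and remove, yet easy for the owner to demonstrate.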
"There is tremendous value locked into today's machine-learning models, and as companies expose ML models via APIs, these threats are not hypothetical." -Mikel Rodriguez, MITRE
Headlines on Tap
Enjoy reading Dark Reading? Subscribe to receive Dark Reading Daily every morning!
On That Note
Dark Reading's 2022 Strategic Security Survey looks at how enterprise security teams are managing threats and risk. Data from last year's survey found that top executives are paying more attention to, and placing a higher priority on, cybersecurity because of the increased media attention around incidents. The heightened attention comes at a cost: 46% of respondents said there was more pressure on the security organization, and team members were experiencing significantly elevated stress levels. On the other hand, 38% of respondents said end users were more aware of cyber threats and were being more careful in their behavior.
What issues are cybersecurity professionals most concerned about in 2022? That's what Dark Reading would like to find out in this year's Strategic Security Survey. If you haven't filled it out, please consider doing so (and enter the drawing for the gift card).