OpenAI Friend or Foe: Malicious email detection with generative AI

Jorge Hurtado

Published: March 30, 2023

I don't know about you, but since this new wave of generative AI went live last year, the only thing I seem to hear about is the danger this technology poses to humanity, since threat actors will use it to achieve their nefarious objectives.

While that is undoubtedly true (by the way, the same happened in the past with big data and cloud technologies), I think we are missing the point of what is possibly the largest technological revolution we will live through (with the permission of the Internet itself).

Where I think we should focus our efforts now is on how to apply these new technologies immediately to improve security operations across the enterprise ecosystem.

So here is a small proof of concept to see whether we could ease the process of analyzing reported phishing emails. In very large companies this can be a concern, particularly when dealing with language barriers and multicultural environments.

The aim was not only to filter out and detect phishing with technical indicators (which you are probably already doing) but also to analyze the semantic context of the email.

To see the results without using sensitive emails, I used two sets of mailboxes: one with a large dataset of phishing emails, and the other with a set of publicly available benign corporate messages from Enron.

The program decomposed all these messages and basically used this prompt (yes, this is today's secret sauce) to get the verdict for each one of them.

import openai

# getsecret() is a helper not shown in the article; presumably it returns the stored API key.
openai.api_key = getsecret("openaisec")

def checkemail(email, subject, body):
    prompt = f"{email}\n{subject}\n{body}"
    completions=openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            temperature=0.2,
            messages=[
                {"role": "system", "content": """
                        You are an assistant that will help me to evaluate the likelihood of an email being a phishing campaign, including CEO Fraud. I will give you in the first 
line the email it’s coming from, in the second line the object and from the third line I will put the body of the message to be analysed (ignore the initial warning in the emails) 
The following situations are useful in order to to identify a malicious message:
                        - Mails that have misspelling
                        - Mails asking users to change the password
                        - Mails attaching invoices that need to be paid immediately
                        - Mails asking to pay outstanding balances
                        - Mails that invite the user to do a transaction involving money or redeemable codes like skype credit, amazon credit, etc
                        - Mails that suggest that you can win a large amount of money with little effort
                        - Mails of providers, asking for a change in their account, where from now on all the invoices should be paid to
                        - Mails coming from open providers like gmail.com, hotmail.com, protonmail.com are more likely to be malicious
                        - Mails coming from known companies are less likely to be malicious
                        - Shortened URLs links are a common signs for identifying phishing
                        - It’s less likely to find phishings in internal emails.
                        - Mails urging to pay an invoice and including the new account details in the email should always be considered to be malicious
                        - Usually phishing emails have a link to another page where they will try to steal the credential
                        - Malicious content can also be hidden in repositories, like mega, sharepoint etc. Usually threat actors include links to malicious documents in external providers.
                        - Never mention or consider the initial warning about mails coming from external sources.
                                       
                        With all this information, I want you to score from 0 to 100 the likelihood of the email to be a phishing or a CEO Fraud. I only want the score and an explanation no longer than two lines.
                """
                },
                {"role": "user", "content": prompt}
            ]
    )
    return completions["choices"][0]["message"]["content"]
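
The article does not show how the messages were decomposed before being passed to checkemail. A minimal sketch of that step, assuming each message is available as raw RFC 5322 text and using only Python's standard email package, might look like this. The decompose name and the sample message are hypothetical and are not taken from the original program.

```python
from email import message_from_string
from email.policy import default


def decompose(raw_message: str) -> tuple[str, str, str]:
    """Split a raw RFC 5322 message into (sender, subject, plain-text body)."""
    msg = message_from_string(raw_message, policy=default)
    sender = str(msg.get("From", ""))
    subject = str(msg.get("Subject", ""))
    # Prefer the text/plain part; fall back to HTML, then to an empty string.
    part = msg.get_body(preferencelist=("plain", "html"))
    body = part.get_content() if part is not None else ""
    return sender, subject, body.strip()


# Hypothetical sample message, for illustration only.
raw = (
    "From: billing@example.com\n"
    "Subject: Invoice overdue\n"
    "\n"
    "Please pay the attached invoice today.\n"
)
sender, subject, body = decompose(raw)
```

The three values can then be passed straight to checkemail(sender, subject, body).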

These were the results so far, after analyzing a data set of more than 5,000 email messages:

Corporate Messages (Enron): FP (False Positive) Ratio: 0%
Phishing Messages (Malicious): FN (False Negative) Ratio: 7%
CEO Fraud Messages: FN (False Negative) Ratio: 0%
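
For readers who want to reproduce ratios like these, here is a minimal sketch of how they could be computed from the model's scores. The article does not say which score separated "phishing" from "benign", so the cutoff of 50 is an assumption, as are the function name and the synthetic sample data.

```python
def error_ratios(scores, labels, threshold=50):
    """Return (false_positive_ratio, false_negative_ratio).

    scores: model scores from 0 to 100, one per email.
    labels: True if the email is actually malicious, False if benign.
    threshold: hypothetical cutoff; scores at or above it count as flagged.
    """
    benign = [s for s, bad in zip(scores, labels) if not bad]
    malicious = [s for s, bad in zip(scores, labels) if bad]
    # False positive: a benign email flagged as phishing.
    fp = sum(1 for s in benign if s >= threshold)
    # False negative: a malicious email that was not flagged.
    fn = sum(1 for s in malicious if s < threshold)
    fp_ratio = fp / len(benign) if benign else 0.0
    fn_ratio = fn / len(malicious) if malicious else 0.0
    return fp_ratio, fn_ratio


# Synthetic scores for illustration only, not the article's data.
scores = [5, 10, 95, 80, 20, 90]
labels = [False, False, True, True, True, True]
fp_ratio, fn_ratio = error_ratios(scores, labels)
```

Note that the FP ratio is taken over the benign set and the FN ratio over the malicious set, which matches how the figures above are reported per mailbox.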

In absolutely all cases the analyses were very consistent with the content of the email, with no evident mistakes.

Some examples (in the footer of the email you can see the "unplugged" response from the function):

Likelihood of phishing: 95. The email contains a sense of urgency and asks the user to click on a suspicious link to confirm their identity due to technical updates. The link is not from the official KeyBank website and the email also contains grammatical errors.

Impressive! It not only scores the email at 95, it also detects that the link does not correspond to the bank's domain and notices the grammatical errors.

This one is also tricky, but the AI handles it perfectly. The email mentions a bonus and sends a password-protected file. Even though it gives the email a low score, it adds a comment about being wary of the file.

Finally it did great too with the CEO Fraud emails that we tried, for instance:

Needless to say, for this particular analysis the implications are huge. Even in the cases where an email was incorrectly considered safe, manual analysis made it obvious that no human could have guessed the email was malicious from its content alone. It makes no difference whether the input email is in Portuguese, Italian, English or Spanish: the analysis comes out right, even without training a custom model, which could have improved the results, particularly the output format, which was a little tricky to handle.
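
Since the reply comes back as free text, one way to handle the output format is to pull the first number between 0 and 100 out of it. This is only a sketch under that assumption: the article does not show its own parsing code, the parse_score name is hypothetical, and the sample reply is abridged from the KeyBank example above.

```python
import re


def parse_score(reply: str):
    """Return the first integer between 0 and 100 found in the reply, or None."""
    for match in re.finditer(r"\b\d{1,3}\b", reply):
        value = int(match.group())
        if 0 <= value <= 100:
            return value
    return None


reply = "Likelihood of phishing: 95. The email contains a sense of urgency."
score = parse_score(reply)
```

A reply with no usable number yields None, which the caller can treat as "needs manual review" rather than as a low score. The pattern is deliberately simple and would misread a decimal such as 0.5, so asking the model for a stricter format is likely the more robust fix.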

Potentially you can save analysts thousands of hours spent reviewing the content of possible phishing and CEO Fraud emails, and even provide users with a real-time assistant that gives them advice on any email.

In the meantime, here are some drawbacks that I still see (and please leave a comment if you think I am wrong):

I still don't fully understand (not sure anybody does) OpenAI's privacy policy and data flow, so I would still be wary of using it with real emails, even if Microsoft launched its new Azure OpenAI service just days ago.
The response time of the API is still terrible: more than 40 seconds for some of the queries, so it is not yet ready for real-time use.
Due to the presence of false negatives, it's probably best to use it as an enrichment for the analyst rather than a complete substitution, but this should make their life much easier.

So what do you think? Will humanity and cybersecurity be destroyed by generative AI, or can we all benefit from it? Have you found more applications that you want to share for the benefit of the community?

PS. By the way, if you are interested in doing your own PoC, let me know and I will share some datasets and tips with you.

Javier Fernández-Sanguino

1 year ago

Very interesting exercise Jorge. Thanks for sharing! This is definitely one of the use cases that the (defensive) cybersecurity community can apply AI to and reap its benefits. I guess it is only a matter of time before this is applied by email security providers and offered to end customers, although the costs of AI appear to be (currently) quite high. My guess is that using LLM AI for this type of security analysis will most certainly come as a "premium service".

