CyberSecurity Feed Summarisation with Context using AI

Venkatesh S.

Making AI Accessible for Digital Safety

Published: 7 September 2023

One of the challenges security professionals face is staying abreast of current security trends. The constant flood of digital content throughout the day is a major hurdle to achieving this.

The President's Daily Brief, sometimes referred to as the President's Daily Briefing or the President's Daily Bulletin, is a top-secret document produced and given each morning to the president of the United States; it is also distributed to a small number of top-level US officials who are approved by the president. It includes highly classified intelligence analysis, information about covert operations, and reports from the most sensitive US sources or those shared by allied intelligence agencies.

Motivation - Imagine the incredible boost you'll get from having your very own personalised PDB!

This article gives an overview of one of the AI models I developed for summarising cyber security news. It is hosted on Hugging Face and, with just over 400 million parameters, achieves a ROUGE-1 score above 49, staying ahead of today's top models. Here is a curated list of state-of-the-art models and their metric scores.
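For readers less familiar with the metric, here is a minimal sketch of how a ROUGE-1 score can be computed: it measures unigram overlap between a reference summary and a generated one. The tokenizer is a deliberate simplification and the example sentences are made up; published scores such as 49 are usually the F1 value multiplied by 100 and averaged over a test set, computed with a standard ROUGE package rather than code like this.

```python
from collections import Counter
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1(reference: str, candidate: str) -> dict[str, float]:
    """Unigram-overlap precision, recall and F1 between two texts."""
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    # Clipped overlap: a unigram counts only as often as it occurs in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example pair, for illustration only.
reference = "attackers exploited a zero-day in the vpn appliance"
candidate = "attackers exploited a vpn zero-day"
scores = rouge1(reference, candidate)
print({name: round(value, 3) for name, value in scores.items()})
# {'precision': 1.0, 'recall': 0.667, 'f1': 0.8}
```

Here all six candidate tokens appear in the nine-token reference, so precision is 1.0 and recall is 6/9.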

https://medium.com/besedo-engineering/text-summarization-part-2-state-of-the-art-ae900e2ac55f

Because I work in the CyberSecurity domain and assume the audience is more inclined towards security than AI, I will keep the description of how I built it at a high level.

I used news emails sent out by service providers such as Mandiant, RiskIQ, Microsoft, and others to extract summaries and links to the full articles. From these I created a dataset containing just two fields: the summary and the full news text from the web page. I also added some information scraped from selected cybersecurity websites.
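A two-field dataset of this kind could be stored as JSON Lines, one summary/article pair per line. The sketch below is only an illustration: the field names `summary` and `full_news`, the `NewsPair` class and the filtering rule are assumptions, not the format of the actual dataset.

```python
import json
from dataclasses import dataclass


@dataclass
class NewsPair:
    summary: str    # short summary taken from the newsletter email
    full_news: str  # full article text fetched from the linked page


def to_jsonl(pairs: list[NewsPair]) -> str:
    """Serialise pairs as JSON Lines, skipping empty or degenerate rows."""
    lines = []
    for pair in pairs:
        summary, body = pair.summary.strip(), pair.full_news.strip()
        # Assumed sanity check: a summary should be shorter than its article.
        if not summary or not body or len(summary) >= len(body):
            continue
        lines.append(json.dumps({"summary": summary, "full_news": body}))
    return "\n".join(lines)


# Placeholder rows, not real data.
pairs = [
    NewsPair("Vendor patches VPN flaw.",
             "A vendor released a patch for a flaw in its VPN product. "
             "Administrators are advised to update as soon as possible."),
    NewsPair("", "Article with no summary is dropped."),
    NewsPair("This summary is longer than its article.", "Too short."),
]
jsonl = to_jsonl(pairs)
print(jsonl)
```

Only the first placeholder row survives the filter; the other two are dropped for having no summary or a summary longer than the article.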

The task does not require an LLM, and the model should be able to run on a personal computer with modest resources. I chose BART since it has been shown to perform well in similar tasks. The dataset has over 6K entries, and training took several hours (~14) on an RTX 4090.

The results are much better than I expected and proved to be better than those of state-of-the-art models.

https://huggingface.co/venkycs/securityShots

How to use?

You can scrape a website and send the body HTML with only light cleaning as preprocessing; the model will intelligently remove the tags.

Here is the preprocessing script: https://github.com/venkycs/urlShots/blob/main/urlShots.py
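To give an idea of what light cleaning can look like, here is a minimal sketch using only the Python standard library. It is not the urlShots.py script linked above; the class and function names are hypothetical, and it goes a little further than strictly needed by dropping all tags as well as script and style content.

```python
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects visible text and ignores script/style/noscript content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def clean_html(html: str) -> str:
    """Return the visible text of an HTML document with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


# Made-up page, for illustration only.
page = (
    "<html><head><style>p{color:red}</style><script>var x = 1;</script></head>"
    "<body><h1>Patch  Tuesday</h1><p>Microsoft fixed &amp; disclosed 3 bugs.</p>"
    "</body></html>"
)
cleaned = clean_html(page)
print(cleaned)
```

The cleaned text is what would then be passed to the summarisation model.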

The model uses no more than 2 GB of RAM and can run on a CPU, with an inference time of 2-3 seconds. You can use news aggregation services and preprocess their content to generate your own version of the PDB. Please contact me if you require the dataset, which I cannot share publicly because it contains scraped content.
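Putting the pieces together, a personal brief is essentially a loop over cleaned articles with one summary each. The sketch below shows one possible shape for that loop. `Article`, `build_brief` and `first_sentence` are hypothetical names, the URLs are placeholders, and the summariser is passed in as a function so the example runs without a model; in practice that function would wrap the summarisation model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Article:
    title: str
    url: str
    text: str  # cleaned article body


def build_brief(articles: Iterable[Article],
                summarize: Callable[[str], str],
                max_items: int = 10) -> str:
    """Summarise each unique article once and format a plain-text brief."""
    seen_urls: set[str] = set()
    entries: list[str] = []
    for article in articles:
        if article.url in seen_urls or not article.text.strip():
            continue  # skip duplicate links and empty bodies
        seen_urls.add(article.url)
        number = len(entries) + 1
        entries.append(
            f"{number}. {article.title}\n   {summarize(article.text)}\n   {article.url}"
        )
        if len(entries) == max_items:
            break
    return "\n\n".join(entries)


def first_sentence(text: str) -> str:
    """Stand-in summariser so the sketch runs without loading a model."""
    return text.split(". ")[0].rstrip(".") + "."


# Placeholder feed; the second link appears twice on purpose.
feed = [
    Article("VPN flaw patched", "https://example.com/a",
            "Vendor X released a fix for an actively exploited flaw. Details follow."),
    Article("New phishing wave", "https://example.com/b",
            "A phishing campaign targets finance teams. Indicators are listed."),
    Article("New phishing wave (repost)", "https://example.com/b",
            "A phishing campaign targets finance teams. Indicators are listed."),
]
brief = build_brief(feed, first_sentence)
print(brief)
```

Injecting the summariser keeps the formatting logic independent of the model, so the model can be swapped or stubbed in tests.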

Towards more complex applications in CyberSec

Some of the use cases for GenAI are listed below:

Generate recommendations for a particular APT group
Identify possible APT groups with behaviour X
Anonymise logs
Detect PII or PHI in the logs
Extract IP addresses, usernames or context from the logs
Build LangChain chains to implement SOAR functionality; SOAR could literally be replaced for good
Automate communication for security tickets based on the incident
Detect vulnerabilities in source code
Detect malicious packages in repositories such as PyPI, npm, etc.
Explain CVEs, MITRE entries, etc.
Generate policy documents, for example an AUP or InfoSec policy aligned with standards X, Y, Z
Interactive SecOps bot: who is on shift with me, which SLAs were violated, what does user X often mishandle?
Threat hunting plan generation and automation using LangChain and a fine-tuned LLM
Design and implementation steps generation for a new security device using an LLM
Log explanation or analysis with a detailed description
Document classification
Many more in threat hunting ...

To get started on building an LLM for security pros, I used Stanford Alpaca techniques to create a dataset and generated a few of the specific tasks listed above. Zero-shot performance was achieved on a few cases such as IP extraction, log anonymisation, and Mitre explanation.
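For context, Stanford Alpaca stores each training example as an instruction/input/output record and renders it with a fixed prompt template. The sketch below builds one such record for the IP extraction task. It only illustrates the record format: the answer is produced with a regular expression rather than generated by an LLM as in the Alpaca pipeline, the log line is made up, and the fields of the actual llm4security dataset may differ.

```python
import ipaddress
import json
import re

# Prompt template used by Stanford Alpaca for examples that have an input.
ALPACA_PROMPT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)


def extract_ipv4(log_line: str) -> list[str]:
    """Return the valid IPv4 addresses in a log line, in order of appearance."""
    candidates = re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", log_line)
    valid = []
    for candidate in candidates:
        try:
            ipaddress.IPv4Address(candidate)  # rejects values such as 999.1.1.1
        except ValueError:
            continue
        valid.append(candidate)
    return valid


def make_record(log_line: str) -> dict[str, str]:
    """Build one Alpaca-style training record for the IP extraction task."""
    return {
        "instruction": "Extract all IP addresses from the log line.",
        "input": log_line,
        "output": ", ".join(extract_ipv4(log_line)),
    }


# Made-up log line using documentation and private address ranges.
record = make_record(
    "Failed login for admin from 203.0.113.7 via 10.0.0.5, code 999.1.1.1"
)
prompt = ALPACA_PROMPT.format(instruction=record["instruction"], input=record["input"])
print(json.dumps(record, indent=2))
print(prompt + record["output"])
```

During fine-tuning the model sees the rendered prompt and learns to produce the `output` field as the response.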

Model - https://huggingface.co/venkycs/llama-v2-7b-32kC-Security

Dataset - https://huggingface.co/datasets/venkycs/llm4security

The model was fine-tuned from LLaMA 7B with a 32K context length. I have a few reasons for choosing this more complex model, chiefly that my idea is to support semantic search in threat-based applications. It uses PEFT for optimisation, so adapters need to be loaded. The code is complicated, so if you have a background in AI and want to know more, feel free to get in touch with me. The model might need more training with new information to work better. That might be a separate topic, depending on how much time I have.
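To see why PEFT makes a 7B model practical to tune, a little arithmetic helps. The sketch below assumes LoRA, one common PEFT method; the article does not say which method was used. The rank of 8 and the choice of two adapted attention matrices per layer are illustrative assumptions, while the hidden size (4096) and layer count (32) are those of the 7B LLaMA architecture.

```python
def lora_trainable_params(hidden_size: int, num_layers: int,
                          rank: int, matrices_per_layer: int) -> int:
    """Count the parameters that LoRA adds for square d x d weight matrices."""
    # Each adapted matrix gains two low-rank factors: (d x r) and (r x d).
    per_matrix = 2 * hidden_size * rank
    return per_matrix * matrices_per_layer * num_layers


# Assumed setup: rank 8, adapters on two attention projections per layer.
trainable = lora_trainable_params(
    hidden_size=4096, num_layers=32, rank=8, matrices_per_layer=2
)
base_params = 7_000_000_000  # nominal size of a "7B" model

print(f"trainable adapter parameters: {trainable:,}")
print(f"share of base model: {100 * trainable / base_params:.3f}%")
```

Under these assumptions only about 4.2 million parameters are trained, a small fraction of one percent of the base model, which is also why the adapters are stored separately and loaded on top of the base weights.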

Conclusion

Researchers are publishing many interesting solutions to long-standing problems using new AI models, particularly in NLP and NLU. However, in my opinion, LLMs are similar to knowledge bases: applying them correctly so that they meet expectations requires specific domain experience. In other words, knowledge needs expertise before it can be turned into a domain-specific skill set. It is important for domain experts to understand how AI can be applied in their own domains. We are limited only by domain-specific implementation ideas and the right dataset; things are not as far off as they seem.

Running such a service requires some effort and can be tedious. To address this and deliver news and summaries, I created a mobile app (AttackIO) for cybersecurity pros; if you are one, you should try it.

Adam Chen Longhui

Quant Trading Enthusiast, MSc in Quant Finance

1 year ago

Hi Venkatesh, thank you for sharing the wonderful article. May I know what are some ways for individuals to get enough dataset to train a text-summarization model to a real-world deployable level?

Harshil Shah

1 year ago

Brilliant stuff, Venky! P.S - Jaw drop moment at ROUGE-1 score of 49!

Shiba M.

Threat Hunting | AI & ML Cyber Security Investigator | OSINT Adversary Hunting

1 year ago

Lovely, Venky
