Security Papers Review #5 Automating Cyber IR with Reinforcement Learning

Well, the full name of this article was too long for a headline: Bridging Automated to Autonomous Cyber Defense: Foundational Analysis of Tabular Q-Learning by Andy Applebaum and a team of Apple folks. It was presented at the same AISec'22 workshop as the article covered in my previous post.

Reinforcement learning (RL) looks like the next big thing in AI, mostly due to its successes in games (Go, chess, and Atari). However, applying RL to real-life problems has been a challenge. Here's my simplistic explanation of why. RL can learn the best strategy, which sounds like exactly what we need in many situations in life. However, RL works well when there are many possible situations (states) but only a small number of possible actions in each state. On top of that, you need to be able to run huge numbers of scenarios to train the agent. This fits nicely with anything you can reliably simulate but is much harder to find in the real world.

RL is a complex topic, and I'm not an expert in it. But I want to learn where RL can be applied to the things I encounter daily, and this article helped me with that. Two things I liked about the paper: 1) I learned about Microsoft's CyberBattleSim - a simulated network environment where attackers and defenders can be automated to train RL agents, and 2) its use of a simple RL technique, tabular Q-learning, which made the paper useful and accessible to non-experts like myself.

The paper approaches an important practical problem - finding an optimal incident response (IR) strategy. Think of analysts who need to make quick decisions under a fast stream of events from the monitored network, and under the pressure of screaming unhappy customers whose machines have become unavailable. The paper evaluates tabular Q-learning as a possible tool to automate the analysts' role.

Slide by Andy Applebaum


In short, RL deals with an agent that can take an action in each state (e.g., isolate a compromised machine). The action drives the system into a new state and yields some reward (e.g., the system becomes safer). Q-learning tries to maximize the cumulative reward by learning the value of each state-action pair as it goes. To choose the next action, it combines past experience (taking the action that brought the highest reward before) with occasionally trying something new. This is actually a good strategy in life as well :-).
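To make this concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy action selection. This is my own illustration in Python, not the paper's code, and the hyperparameter values are arbitrary:

```python
import random
from collections import defaultdict

# Q maps (state, action) pairs to their estimated long-term reward.
Q = defaultdict(float)

alpha = 0.1    # learning rate: how strongly new experience overrides old estimates
gamma = 0.9    # discount factor: how much future rewards matter
epsilon = 0.1  # exploration rate: how often to try something new

def choose_action(state, actions):
    """Epsilon-greedy: mostly exploit past experience, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    """The classic tabular Q-learning update rule."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```

The whole "model" is just the Q table, which is exactly why the technique only scales to small action spaces and why it is so easy to inspect and reason about.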

Slide by Andy Applebaum


In the simulated network, a game is played between attackers and defenders. The attackers can exploit vulnerabilities and move across the network. The defenders can choose between reimaging a machine (making it unavailable), resetting the entire network, revoking the user's credentials, or just waiting. As in real life, the observations of both sides mix true events with some portion of noise and errors. The defending agent is rewarded for keeping the network both safe and available.
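To show how this maps onto code, here is a rough sketch of one defender episode. The action names follow the paper's description above, but the environment interface (reset/step) and the agent object are hypothetical, gym-style placeholders I made up for illustration, not CyberBattleSim's actual API:

```python
from enum import Enum

class DefenderAction(Enum):
    WAIT = 0
    REIMAGE_MACHINE = 1      # removes the compromise but makes the machine unavailable
    REVOKE_CREDENTIALS = 2
    RESET_NETWORK = 3

def run_episode(env, agent, max_steps=100):
    """One defender episode: act on (possibly noisy) observations, collect reward."""
    obs = env.reset()
    actions = list(DefenderAction)
    total_reward = 0.0
    for _ in range(max_steps):
        state = agent.encode(obs)                   # observations mix true events with noise
        action = agent.choose_action(state, actions)
        next_obs, reward, done = env.step(action)   # reward: network stays safe and available
        agent.update(state, action, reward, agent.encode(next_obs), actions)
        obs, total_reward = next_obs, total_reward + reward
        if done:
            break
    return total_reward
```

The agent here is assumed to expose the same choose_action/update logic as the Q-learning sketch above, plus some encoding of raw observations into a discrete state.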

Of course, there is a lot more in the paper. The authors improved upon the simulator, the system state representation, and the loss function (making the agent loss-averse).
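The paper's exact loss-aversion mechanism is not reproduced here, but one simple way to express the idea (purely my own illustration, not necessarily the authors' formulation) is to weight negative outcomes more heavily than positive ones before updating the agent:

```python
def loss_averse(reward, penalty_weight=2.0):
    # Scale losses up so the agent fears negative outcomes more than it
    # values equally sized gains (illustrative only).
    return reward if reward >= 0 else penalty_weight * reward
```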


So, did Q-learning work well? Yes and no. Yes, because, on average, its strategy outperformed the simple baseline approaches. No, because in many specific cases, the baselines were still the top performers. As the paper puts it: "while the learners might offer higher average rewards, no one learner always outperforms the baselines." At the end of the day, much more research is needed to find the optimal learner, but this paper helps us by providing an environment and a benchmark to start with. I liked it and hope you will too.

See my previous article here.

P.S. I may take a short break in posting due to a family celebration.

If you enjoyed this article, please repost it - this will help me know it was useful.
