Data to the rescue. Cybersecurity through a systems lens.

At Palo Alto Networks, we're building one of the largest data platforms in the world, capable of working with exabytes of data for machine learning and AI. Why would a cybersecurity company known for making the best firewalls on the planet (among other things) need to do so? The answer lies in the rapid evolution of cyber threats. The pace of attacker innovation continues to accelerate, and cyber attacks have emerged as one of the biggest risks to our way of life in the digital age. As our society's dependence on highly automated, connected digital systems grows, so does the impact of threats against them. In this blog post, I'll outline why I believe that successfully mitigating cybersecurity threats depends critically on large-scale data platforms and machine learning, and why Palo Alto Networks is uniquely positioned to succeed in this effort. In a nutshell: we have access to the best data, and we have a substantial lead in making this transformation a core part of our future. I firmly believe that, in the future, the security of the systems we critically depend on will be provided by platforms like these.

At Palo Alto Networks, we are approaching security in an entirely new way. Rather than having humans teach machines how to recognize threats, we gather rich security data in real time and have software learn to recognize threats automatically from that data using machine learning and AI. This means, of course, that we are focused on detection as much as remediation. This approach presents a major challenge, though: getting machines to learn effectively requires a ton of data, in both breadth and quantity. Software cannot learn from what it has not seen, and machine learning algorithms are notoriously data hungry. A now-classic paper from Google (aptly titled 'The Unreasonable Effectiveness of Data') shows that simple algorithms trained on larger, better data sets outperform sophisticated algorithms trained on limited data sets. The paper is about NLP, but the same principle holds in many domains. This, in turn, implies that data needs to be gathered across the breadth of an enterprise to get the visibility detection requires. The Palo Alto Networks platform gives us complete visibility into highly useful data across the cloud, network, and endpoints, and it is the first of its kind.
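To make the data-beats-cleverness point concrete, here is a minimal, self-contained sketch on synthetic data using a deliberately simple nearest-centroid classifier (everything here is illustrative; it is not Palo Alto Networks code or data). Averaged over many trials, the very same simple model gets measurably more accurate when it sees more training examples:

```python
import numpy as np

def nearest_centroid_accuracy(n_train, rng, n_test=2000):
    """Train a trivial nearest-centroid classifier on n_train points
    per class, then measure accuracy on a fresh synthetic test set."""
    mu = {0: np.array([0.0, 0.0]), 1: np.array([2.0, 2.0])}  # true class means
    # "Training": estimate each class centroid from noisy samples.
    cent = {c: rng.normal(mu[c], 1.0, size=(n_train, 2)).mean(axis=0)
            for c in (0, 1)}
    correct = 0
    for c in (0, 1):
        x = rng.normal(mu[c], 1.0, size=(n_test, 2))  # test points for class c
        d0 = np.linalg.norm(x - cent[0], axis=1)
        d1 = np.linalg.norm(x - cent[1], axis=1)
        pred = (d1 < d0).astype(int)  # predict the nearer centroid's class
        correct += int((pred == c).sum())
    return correct / (2 * n_test)

rng = np.random.default_rng(0)
# Average over 50 trials so the comparison isn't an artifact of one draw.
small = np.mean([nearest_centroid_accuracy(10, rng) for _ in range(50)])
large = np.mean([nearest_centroid_accuracy(2000, rng) for _ in range(50)])
print(f"accuracy with   10 samples/class: {small:.3f}")
print(f"accuracy with 2000 samples/class: {large:.3f}")
```

The algorithm never changes; only the amount of data does. The same effect, at vastly larger scale and with far messier security telemetry, is what motivates collecting data in breadth as well as depth.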

It's not enough to just collect and store massive amounts of data; we also need the means to process it efficiently. This is the core of our Application Framework (1): an ultra-large-scale distributed system for storing and processing gigantic amounts of data. Building on this data, the Application Framework will let us quickly build and deliver the next generation of innovative cybersecurity capabilities. We also recognize that innovation in cybersecurity is not limited to Palo Alto Networks (though we are pretty good at it). That's why the Application Framework is truly open: any security developer can leverage its data and processing capabilities via APIs to build innovative security offerings. We are serious about our mission to protect our way of life in the digital age, and that commitment translates into sharing data and capabilities to prevent future attacks with anyone.

A second aspect of the Application Framework is that it changes the consumption model: it allows security solutions to be built and delivered as cloud-based services. This flexibility, in turn, lets customers think in terms of the security services they need and compose a solution tailored to those needs. Contrast this with the traditional approach of purchasing a variety of appliances and products with overlapping feature sets and stitching them together with a combination of automated and manual workflows.

To be fair, none of this was apparent to me before I joined Palo Alto Networks. My first response when approached to lead the core data platform was 'why?'. Having spent my career building large-scale distributed systems and platforms at places known for exactly that, I didn't see an immediate fit with a cybersecurity company. However, conversations with Nir Zuk and Lee Klarich about the vision for the Application Framework convinced me that this was, in reality, a big systems challenge. What appealed to me most as a systems person was the sheer scale of our end goal. The Application Framework will operate at a scale very few companies can match: at peak, we expect to hold tens of exabytes of data and ingest hundreds of millions of events per second. What we build will provide low-latency queries, real-time stream processing, machine learning, and large-scale batch processing over the exabytes of data in the system. These capabilities are exposed as easy-to-use APIs that hide the complexity of dealing with data and compute at these scales. To achieve this vision, we will need to push past the state of the art and invent new capabilities in areas like structured query processing, data storage, and stream processing. Along the way, we plan to be active members of the open source community, contributing back to existing projects and creating new ones wherever possible.
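A quick back-of-envelope calculation shows how an event stream at this scale turns into exabytes. The event rate and average event size below are illustrative assumptions I'm picking for the arithmetic, not published Application Framework figures:

```python
# Back-of-envelope: how "hundreds of millions of events per second"
# accumulates into exabytes. Both constants are assumed for illustration.
EVENTS_PER_SECOND = 300_000_000   # assumed: mid "hundreds of millions"
BYTES_PER_EVENT = 500             # assumed: average serialized event size

bytes_per_day = EVENTS_PER_SECOND * BYTES_PER_EVENT * 86_400
pb_per_day = bytes_per_day / 1e15          # petabytes ingested per day
eb_per_year = bytes_per_day * 365 / 1e18   # exabytes accumulated per year

print(f"~{pb_per_day:.1f} PB/day, ~{eb_per_year:.1f} EB/year")
```

Under these assumptions the platform would take in roughly 13 PB per day, or close to 5 EB per year before compression, replication, and retention policies are even considered. It is exactly this kind of arithmetic that rules out off-the-shelf solutions and forces new systems work.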

If you're interested in building ultra-large-scale platforms handling exabytes of data, don't hesitate to reach out to me on LinkedIn or at aghosh [at] paloaltonetworks.com. We are constantly on the lookout for people who are passionate about platforms and infrastructure and want to build globally distributed systems. Being a good engineer and programmer is necessary, but beyond that there are no hard-and-fast requirements. We already have world-class experts who can help you learn.

We need innovators, new thinkers, and individuals driven by a mission to protect the world's digital infrastructure; more simply put, to protect our way of life. Your expertise in areas like large-scale data storage, machine learning, indexing and query processing, and stream processing will help us keep addressing cyber threats as they evolve.

We strive to provide a work environment that offers challenging problems, along with the resources and freedom to deliver innovative solutions. Having done away with a huge management hierarchy, we get things done quickly and efficiently, without adding unnecessary process to your workload. Most of our engineering leaders have a technical background. I myself try to spend my woefully limited free time writing code whenever possible (if my 4-year-old gives me a break from playing with her, that is).

If you're looking for your next career move, consider joining us on our journey into the cloud. Cybersecurity experience isn't necessary, but a passion for learning and a dedication to defining its future are. One piece of advice I always give people: don't be afraid to think big. At worst you'll fail, but you'll learn a lot in the process. At best, you'll change the world.

 

(1) This video by Nir Zuk, our co-founder and CTO, provides a deeper look at the genesis of the Application Framework.
