Azure Data Explorer: Real-Time Analytics - Palo Alto Web Traffic Logs
Since remote working has become the norm, risk and information security teams are operating in a completely different landscape and must adapt to meet new monitoring and logging requirements that are critical to the resiliency and security of business operations.
Over the past couple of years, our client's Palo Alto web security log storage costs on Hadoop have increased fivefold, and their ad-hoc Kafka queries run at a snail's pace, preventing the information security team from responding to threats in a timely manner.
Azure Data Explorer (ADX) is a fully managed data analytics service for real-time analysis of huge volumes of data streaming from applications, websites and IoT devices. It performs near real-time analytics over petabytes of data, returning results in less than a second across billions of records.
After switching from Hadoop to ADX as the primary store, our client cut storage costs by 50% and reduced query response times from 30 minutes on the Kafka cluster to a few seconds!
What Data Mastery loves most about Azure Data Explorer is the simplicity of the solution and the ease with which our client can add new data ingestion pipelines!
For a detailed explanation of Azure Data Explorer, click here.
Ingesting Palo Alto Logs from Azure Storage to ADX
In this article, I will demonstrate how to create an ingestion pipeline that ingests and transforms Palo Alto web traffic log files uploaded hourly to an Azure Storage account, accumulating to a daily total of 200GB (when uncompressed).
Each file is a compressed .gz file split into three different formats:
Solution
The solution used follows the high-level steps below:
P.S. This solution is as simple as it sounds!
Ingestion Pipeline
Prerequisites
To create the ingestion pipeline, the following steps must be completed:
Steps
1. Create a container on Azure Storage - ADLS Gen2.
2. Create an ADX staging table with one column of data type string.
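As a minimal sketch (the table and column names below are placeholders rather than the client's actual names), the staging table holds each raw log line in a single string column:

// Staging table with a single string column holding each raw Palo Alto log line
.create table PaloAltoStaging (RawRecord: string)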
3. Set a Retention Policy on the ADX Staging table to only keep 14 days of data.
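A minimal example of the retention policy, assuming the hypothetical PaloAltoStaging table above:

// Keep staged raw data for 14 days; recoverability disabled so storage is released sooner
.alter-merge table PaloAltoStaging policy retention softdelete = 14d recoverability = disabled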
4. Create an ADX Function.
The function reads and transforms the data from the staging table into the desired output. Only a subset of the source columns is required in the output.
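A sketch of such a function, assuming the raw records are comma-delimited and that the field positions and output column names below are illustrative rather than the real Palo Alto field layout:

// Parse each raw log line and project only the columns needed downstream
.create-or-alter function with (docstring = "Transforms raw Palo Alto web traffic logs", folder = "Transformations")
TransformPaloAltoLogs()
{
    PaloAltoStaging
    | extend Fields = split(RawRecord, ",")
    | project
        ReceiveTime   = todatetime(Fields[0]),
        SourceIp      = tostring(Fields[1]),
        DestinationIp = tostring(Fields[2]),
        Url           = tostring(Fields[3]),
        Bytes         = tolong(Fields[4])
}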
5. Create an ADX Destination Table for the curated data.
The ingestion function can be used to create the schema for the destination table using the following script:
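One way to do this (a sketch reusing the hypothetical TransformPaloAltoLogs function above) is to let ADX derive the destination table's schema from the function's output by running an append that returns no rows:

// Creates PaloAltoTraffic with the function's output schema; 'take 0' appends zero rows
.set-or-append PaloAltoTraffic <| TransformPaloAltoLogs() | take 0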
NOTE: Ensure the datetime and numeric columns are typed correctly, as ADX stores metadata and statistics for each column, including the minimum and maximum values held in each data extent. When a user queries the table with filter conditions on those columns, ADX compares the conditions against these extent statistics and scans only the relevant extents when returning results.
6. Create an ADX Update Policy.
The Update Policy instructs ADX to automatically append data to the target table whenever new data is inserted into the staging table, based on the transformation function created above.
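A sketch of the update policy, again using the hypothetical table and function names from the earlier steps:

// On every ingestion into PaloAltoStaging, run the transform and append the output to PaloAltoTraffic
.alter table PaloAltoTraffic policy update
@'[{"IsEnabled": true, "Source": "PaloAltoStaging", "Query": "TransformPaloAltoLogs()", "IsTransactional": true, "PropagateIngestionProperties": false}]'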
7. Create an Event Grid Ingestion Method.
The chosen ingestion method is to ingest data into Data Explorer via Event Grid notifications from ADLS.
NOTE: Txt files do not have mappings. Mappings are only used for CSV, JSON, AVRO, and W3CLOGFILE files.
8. Test :)
Upload a file to the Azure Storage container. If the ingestion fails, run the query below to check why.
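For example, the built-in ingestion failures command surfaces the error details:

// Show recent ingestion failures and why they occurred
.show ingestion failures
| where FailedOn > ago(1d)
| project FailedOn, Table, Details, ErrorCode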
Conclusion
If you would like a copy of my code, please drop me a message on LinkedIn.
I hope you have found this helpful and that Azure Data Explorer will save your company time and money.
Please share your thoughts, questions, corrections and suggestions. All feedback and comments are very welcome.