Announcing Scanner for Splunk: Lightning-fast threat hunting through your S3 logs directly from Splunk
Scanner.dev
Scanner makes data lakes fast and easy to use. Schemaless log search indexing, all in the user’s S3 buckets.
We're excited to announce the release of our custom Splunk app, Scanner for Splunk, which makes it easy to use logs in S3 for advanced threat hunting and detection, all while staying entirely within the Splunk UI.
This app helps teams expand their visibility into historical logs and high-volume log sources that are stored only in S3 and not indexed by Splunk.
One interesting cost-reduction approach we've seen our users take is to move high-volume logs out of expensive Splunk ingestion and store them directly in S3 instead. They then use Scanner to index the log files in place in S3, and use our custom Splunk app to perform fast threat hunting and detection on those logs directly from the Splunk UI. This can help reduce costs for these log sources by 80-90%.
Reduce blind spots, lightning-fast threat hunting
In our experience, a typical security team keeps somewhere between 2x and 5x as much log volume in S3 as in Splunk. For these teams, the Scanner app can increase detection coverage by 2-5x, sometimes dramatically reducing blind spots in high-volume log sources like AWS CloudTrail, Cloudflare, VPC flow logs, and others.
Retention in S3 with Scanner can be much longer than the typical 30-90 days in Splunk due to the low cost of object storage. It is common for our users to search through one year or more of logs in their S3 archives, helping them perform more exhaustive investigations and surface potential advanced persistent threats.
By leveraging serverless compute and a novel indexing system, Scanner executes large queries quite rapidly. For example, it takes only 1 second to return all hits from 10TB of log data.
How does it work?
The Scanner for Splunk app provides new custom search commands. When one of these commands runs, the Splunk search head sends a request to the Scanner API to start a search query against the data in S3, and the results are returned to Splunk.
Since the Scanner custom search commands can be used as part of the Splunk query pipeline, users can transform Scanner results using Splunk commands, or join results from Scanner together with results from Splunk indexes.
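To make this command flow more concrete, here is a minimal Python sketch of such a pipeline. It does not use Splunk's SDK or the real Scanner API: `fake_scanner_api`, `scanner_command`, the sample events, and the lookup rows are all hypothetical stand-ins. A generating stage fetches matching events, a `stats`-like step aggregates them, and a `join`-like step enriches the result with rows that could come from a Splunk index or lookup.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]

# Hypothetical sample of events that live only in S3.
SAMPLE_S3_LOGS: List[Record] = [
    {"user": "alice", "eventName": "GetObject"},
    {"user": "alice", "eventName": "PutObject"},
    {"user": "bob", "eventName": "GetObject"},
]


def fake_scanner_api(query: str) -> List[Record]:
    """Hypothetical stand-in for the Scanner API; supports one `field=value` filter.

    A real command would send an authenticated HTTP request instead.
    """
    field, value = query.split("=", 1)
    return [r for r in SAMPLE_S3_LOGS if r.get(field) == value]


def scanner_command(query: str, api: Callable[[str], List[Record]]) -> Iterator[Record]:
    """Generating stage: ask the API for matching events and emit them one by one."""
    yield from api(query)


def stats_count_by(records: Iterable[Record], field: str) -> List[Record]:
    """Rough analogue of `| stats count by <field>`."""
    counts = Counter(r[field] for r in records if field in r)
    return [{field: key, "count": n} for key, n in sorted(counts.items())]


def join_on(left: Iterable[Record], right: Iterable[Record], field: str) -> List[Record]:
    """Rough analogue of an inner `| join <field>` (keeps the first right-hand match)."""
    lookup: Dict[Any, Record] = {}
    for row in right:
        lookup.setdefault(row[field], row)
    return [{**row, **lookup[row[field]]} for row in left if row[field] in lookup]


# Rows that could come from a Splunk index or lookup table (hypothetical).
SPLUNK_ROWS: List[Record] = [{"user": "alice", "department": "finance"}]

counts = stats_count_by(scanner_command("eventName=GetObject", fake_scanner_api), "user")
enriched = join_on(counts, SPLUNK_ROWS, "user")
print(counts)    # [{'user': 'alice', 'count': 1}, {'user': 'bob', 'count': 1}]
print(enriched)  # [{'user': 'alice', 'count': 1, 'department': 'finance'}]
```

Modeling each stage as a function over an iterable of records mirrors how a Splunk pipeline passes events from one command to the next, which is what lets Scanner output be mixed freely with ordinary Splunk commands.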
Scanner's indexes are built from the log data in a customer's S3 buckets. The index files that Scanner generates are also stored in the customer's S3 buckets, so there is no vendor lock-in: our users control all of their data.
Since Scanner can analyze logs in their raw format, there is no need to start a new data engineering project (e.g., creating AWS Glue tables) whenever a new log source appears. Just point Scanner at the bucket, indicate which S3 key prefixes to index, and you will be off to the races.
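As a rough illustration of what pointing Scanner at a bucket and choosing key prefixes might amount to, the sketch below models a log source as nothing more than a bucket plus a list of key prefixes; no schema or table definition is involved. `IndexSource`, `keys_to_index`, the bucket name, and the object keys are hypothetical and do not reflect Scanner's actual configuration format.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class IndexSource:
    """Hypothetical description of one S3 log source to index in place."""
    bucket: str
    key_prefixes: List[str] = field(default_factory=list)

    def selects(self, key: str) -> bool:
        """True if the object key falls under one of the configured prefixes."""
        return any(key.startswith(prefix) for prefix in self.key_prefixes)


def keys_to_index(source: IndexSource, listed_keys: Iterable[str]) -> List[str]:
    """Filter an S3 key listing down to the objects this source should index.

    In practice the keys might come from paging through an S3 ListObjectsV2
    listing of `source.bucket`; here they are passed in directly.
    """
    return [key for key in listed_keys if source.selects(key)]


cloudtrail = IndexSource(
    bucket="example-security-logs",  # hypothetical bucket name
    key_prefixes=["AWSLogs/123456789012/CloudTrail/"],
)
listing = [
    "AWSLogs/123456789012/CloudTrail/us-east-1/2024/01/01/events.json.gz",
    "AWSLogs/123456789012/vpcflowlogs/us-east-1/2024/01/01/flows.log.gz",
    "cloudflare/http_requests/20240101/batch.log.gz",
]
print(keys_to_index(cloudtrail, listing))
# ['AWSLogs/123456789012/CloudTrail/us-east-1/2024/01/01/events.json.gz']
```

Adding another log source in this model means adding another prefix, not defining a new table.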
High performance
When a Scanner query executes, it launches serverless Lambda functions to traverse these index files at high speed, narrowing the search space rapidly. This can give teams rapid threat hunting capabilities, even on petabyte-scale log sets in S3. Scanner queries complete in a few seconds on 100TB of data, for example, whereas queries in other S3 scanning tools like AWS Athena might take a few hours.
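The general idea of fanning work out across parallel workers and skipping index files that cannot contain a match can be sketched as follows. This is a toy built on assumptions, not Scanner's index format or Lambda orchestration: each hypothetical `IndexFile` carries a plain set of tokens and, for simplicity, its own events, and a thread pool stands in for concurrent serverless invocations.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class IndexFile:
    """Hypothetical index file: the set of tokens it contains plus its events."""
    name: str
    tokens: FrozenSet[str]
    events: Tuple[str, ...]


def make_index_file(name: str, events: Tuple[str, ...]) -> IndexFile:
    """Build the token summary from whitespace-separated event text."""
    tokens = frozenset(tok for event in events for tok in event.split())
    return IndexFile(name, tokens, events)


def search_one(index_file: IndexFile, token: str) -> List[str]:
    """Work done by one parallel worker (standing in for one Lambda invocation)."""
    if token not in index_file.tokens:
        return []  # pruned: this file cannot contain a hit, so skip its events
    return [event for event in index_file.events if token in event.split()]


def fan_out_search(files: List[IndexFile], token: str, workers: int = 8) -> List[str]:
    """Search every index file in parallel and merge the hits in file order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda f: search_one(f, token), files))
    return [hit for hits in results for hit in hits]


files = [
    make_index_file("2024-01-01", ("AKIAEXAMPLE GetObject", "AKIAOTHER ListBuckets")),
    make_index_file("2024-01-02", ("AKIAOTHER PutObject",)),
    make_index_file("2024-01-03", ("AKIAEXAMPLE DeleteObject",)),
]
print(fan_out_search(files, "AKIAEXAMPLE"))
# ['AKIAEXAMPLE GetObject', 'AKIAEXAMPLE DeleteObject']
```

The speedup in this model comes from two places: files whose summary rules out the search term are skipped without reading their events, and the remaining files are searched concurrently.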
Here is some performance data comparing Scanner with another S3 scanning tool, AWS Athena. The data set is 250TB of CloudTrail logs in JSON format. In this example, Scanner and Athena are both querying for all activity from a specific AWS Access Key over varying time ranges. Scanner is hundreds of times faster on large time ranges.
Re-use existing Splunk content
Teams with extensive security content in Splunk, like alerts and dashboards, can apply that content to new log sources in S3 with a few small tweaks.
Scanner's query language is roughly a subset of SPL (Splunk Search Processing Language), so it is fairly easy to adapt existing content to the log sources stored in S3.
The conversion process typically involves changing the first part of the query, which performs the filtering and basic aggregations, and then piping the results to further Splunk commands for additional processing, reshaping, and joining with other data sources, like indexes and lookup tables.
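The conversion pattern of swapping out the first (filtering) stage while keeping the downstream Splunk commands might be partly automated along these lines. The `scanner` command name, its `q=` argument, and the filter syntax in the example are hypothetical placeholders, and the filter itself is still translated by hand. The splitter respects double-quoted strings but not subsearches; if basic aggregations are also moved into the Scanner query, the matching downstream stages would need to be dropped as well.

```python
from typing import Tuple


def split_first_stage(spl: str) -> Tuple[str, str]:
    """Split an SPL query into its first stage and the rest of the pipeline.

    Pipes inside double-quoted strings are ignored; subsearches in square
    brackets are not handled.
    """
    in_quotes = False
    for i, ch in enumerate(spl):
        if ch == '"' and (i == 0 or spl[i - 1] != "\\"):
            in_quotes = not in_quotes
        elif ch == "|" and not in_quotes:
            return spl[:i].strip(), spl[i + 1:].strip()
    return spl.strip(), ""


def to_scanner_query(spl: str, scanner_filter: str, command: str = "scanner") -> str:
    """Replace the first stage with a Scanner command and keep the remaining stages.

    `command` and the `q=` argument are hypothetical placeholders, and the
    filter itself is translated by hand and passed in as `scanner_filter`.
    """
    _, rest = split_first_stage(spl)
    escaped = scanner_filter.replace('"', '\\"')
    head = f'| {command} q="{escaped}"'
    return f"{head} | {rest}" if rest else head


original = (
    "index=cloudtrail eventName=ConsoleLogin"
    " | stats count by sourceIPAddress"
    " | lookup ip_owners ip AS sourceIPAddress"
)
print(to_scanner_query(original, "eventName: ConsoleLogin"))
# | scanner q="eventName: ConsoleLogin" | stats count by sourceIPAddress | lookup ip_owners ip AS sourceIPAddress
```

Everything after the first pipe, including the `stats` and `lookup` stages, is carried over unchanged, which is why most of an existing alert or dashboard search can often be kept as-is.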
Scanner also provides out-of-the-box content for log sources that are commonly stored in S3, including AWS CloudTrail, Cloudflare, CrowdStrike FDR, and more.
Additionally, our team provides a concierge service to help our Splunk users update their existing content, making the transition as easy as possible.
How Scanner compares to Splunk Federated Search for S3
Splunk's new federated S3 search also allows querying S3 logs, but Scanner for Splunk differs from it in a few ways:
Cost savings
Some of our customers move high-volume log data out of costly Splunk ingestion and into cheap S3 storage. They then use Scanner to make those logs visible in Splunk at much lower cost.
Here are some pricing examples showing the cost savings after moving a high-volume log source out of Splunk ingestion and into S3, where Scanner indexes it, at varying log volumes.
Getting started
Here are the steps to take to get started:
After installing Scanner for Splunk, you can continue to use Splunk as a single pane of glass and increase your threat hunting and detection coverage to all of the log sources you want, not just those ingested by Splunk.
Take your threat hunting and detection to the next level with Scanner.
Visit Scanner.dev to Learn More