Big Data and Machine Learning

Getting to the proverbial needle in the ever-increasing security haystack!

Introduction

Recent years have seen a surge in the use of Big Data technology stacks, driven by the large volume of data that different systems and applications generate. Each data point carries its own context, and properly analyzing, processing and storing this data can help organizations identify security problems in their networks.

Traditionally, organizations have done this with solutions such as SIEM, which let them create rules, correlations and alerts based on the logs received from multiple systems across the network.

However, existing solutions have limitations that prevent analysts and incident responders from actively searching and visualizing the large volumes of log data received daily. There are several reasons why this remains a challenge:

  1. Log data formats are not standardized. Most systems (firewalls, proxies, operating systems, etc.) send logs in their own formats.
  2. SIEM solutions built on relational databases consume heavy system resources to run even simple queries across log data. This makes forensic analysis and active threat hunting extremely time-consuming, effectively lengthening incident response timelines.
  3. The volume of log data generated by systems is very large, and most SIEM solutions built on relational databases are unable to handle the load or return any query output.
  4. A human analyst is limited in their ability to identify anomalies in log traffic patterns, primarily because of the huge number of alerts and logs generated daily.
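
To make the first challenge concrete, here is a minimal Python sketch of normalizing two differently formatted log lines into one common record. Both log formats, the regular expressions and the output field names (`timestamp`, `source`, `src_ip`, `detail`) are assumptions made up for illustration; real firewall and proxy formats vary by vendor.

```python
import re
from datetime import datetime, timezone

# Hypothetical firewall line:
#   2016-08-01T10:15:30Z DENY src=10.0.0.5 dst=8.8.8.8 dport=53
FIREWALL_RE = re.compile(
    r"(?P<ts>\S+) (?P<action>ALLOW|DENY) "
    r"src=(?P<src>\S+) dst=(?P<dst>\S+) dport=(?P<dport>\d+)"
)

# Hypothetical proxy line:
#   1470046530 10.0.0.5 GET http://example.com/ 200
PROXY_RE = re.compile(
    r"(?P<epoch>\d+) (?P<src>\S+) (?P<method>[A-Z]+) "
    r"(?P<url>\S+) (?P<status>\d{3})"
)


def normalize(line):
    """Map a raw log line to a common record, or None if unrecognized."""
    line = line.strip()

    m = FIREWALL_RE.fullmatch(line)
    if m:
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%SZ")
        ts = ts.replace(tzinfo=timezone.utc)
        return {
            "timestamp": ts.isoformat(),
            "source": "firewall",
            "src_ip": m["src"],
            "detail": {
                "action": m["action"],
                "dst_ip": m["dst"],
                "dst_port": int(m["dport"]),
            },
        }

    m = PROXY_RE.fullmatch(line)
    if m:
        ts = datetime.fromtimestamp(int(m["epoch"]), tz=timezone.utc)
        return {
            "timestamp": ts.isoformat(),
            "source": "proxy",
            "src_ip": m["src"],
            "detail": {
                "method": m["method"],
                "url": m["url"],
                "status": int(m["status"]),
            },
        }

    return None  # Unknown format: leave for a different parser.
```

Once every source is mapped to the same timestamp and `src_ip` fields, events from different systems can be searched and correlated with a single query.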

Suggested Approach

One solution is to use the strengths of Big Data technology stacks (such as ELK and Hadoop) to store, process and search data received from multiple log sources. Adding a layer of visualization and/or Machine Learning (ML) algorithms on top of this massive data allows us to identify patterns, trends and anomalies that a human analyst reading raw logs would not see.

Use Cases

Case 1: Compromise Assessment / Active Hunting

Once an organization is breached, it becomes imperative for the incident response team to identify all compromised systems in the environment. A traditional SIEM solution may not be efficient at detecting additional compromised hosts (it possibly missed the first attack vector already), as it may not have the capacity to quickly search through logs going back 6-12 months. For such purposes, we can deploy an Elasticsearch-Logstash-Kibana (ELK) stack that ingests logs from the following sources:

  1. Perimeter Firewall
  2. DNS Server
  3. Proxy Server
  4. Mail Server
  5. Antivirus Server
  6. IDS/IPS Server

Additionally, we can deploy a network monitor in SPAN mode to capture raw traffic logs and generate alerts based on known malicious traffic patterns.
The setup requires us to configure the ELK stack to receive the logs from the multiple sources.

Once logs have been parsed and indexed in ELK, we search for individual indicators of compromise (IOCs) and IP addresses. Additionally, we visualize the data to identify malicious traffic patterns which can help detect ‘active threat actors’ in the organization’s network.
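
As a rough sketch of what such an indicator search could look like, the Python below builds an Elasticsearch query body that looks for a set of known-bad IP addresses over the last several months, along with a small local function that applies the same idea to in-memory records. The field names `src_ip` and `dst_ip` and the sample addresses are assumptions; they depend on how the logs were parsed during ingestion.

```python
def build_ioc_query(ioc_ips, months_back=12):
    """Build an Elasticsearch query body matching events that involve
    any of the given IP addresses within the look-back window.

    Field names (src_ip, dst_ip) are hypothetical and must match the
    mapping produced by your own log parsing.
    """
    ips = sorted(ioc_ips)
    return {
        "query": {
            "bool": {
                "filter": [
                    # Date math: from N months ago until now.
                    {"range": {"@timestamp": {"gte": f"now-{months_back}M"}}},
                    {
                        "bool": {
                            "should": [
                                {"terms": {"src_ip": ips}},
                                {"terms": {"dst_ip": ips}},
                            ],
                            "minimum_should_match": 1,
                        }
                    },
                ]
            }
        }
    }


def hosts_contacting_iocs(events, ioc_ips):
    """Local illustration of the same search: map each source host to
    the set of indicator IPs it connected to."""
    hits = {}
    for event in events:
        dst = event.get("dst_ip")
        if dst in ioc_ips:
            hits.setdefault(event["src_ip"], set()).add(dst)
    return hits
```

The query body can be sent to a cluster with whichever client is in use. Every host returned by such a search is a candidate for closer inspection, not proof of compromise.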

Case 2: Application Data Scraping and Fraud Monitoring

Many businesses provide online data-centric services to their customers. However, these organizations are constantly under threat from competitors and malicious actors who attempt to scrape the data from their websites to build their own databases. Traditional algorithms fail to detect and prevent such attempts in a timely fashion, as it is very difficult to distinguish normal user traffic from a malicious user attempting to scrape data from the website. The proposed approach here is to capture all webserver logs with full context and feed them to an ML algorithm which will be able to distinguish good visitors from bad actors by analyzing patterns, trends and anomalies in their use of the application services. The amount of data captured from the server logs is expected to be huge and requires a Big Data setup to quickly process and store the unstructured datasets.
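
The article does not name a specific ML algorithm, so the Python sketch below uses a much simpler statistical stand-in to illustrate the idea: it counts how many distinct pages each client requested and flags clients whose count is an outlier by the modified z-score (based on the median and the median absolute deviation). The input shape, the single feature and the 3.5 threshold are all assumptions for illustration; a real deployment would use far richer features and a properly trained model.

```python
from collections import defaultdict
from statistics import median


def distinct_paths_per_client(requests):
    """requests: iterable of (client_ip, path) pairs taken from web logs.
    Returns {client_ip: number of distinct paths requested}."""
    seen = defaultdict(set)
    for client_ip, path in requests:
        seen[client_ip].add(path)
    return {ip: len(paths) for ip, paths in seen.items()}


def flag_outliers(feature_by_client, threshold=3.5):
    """Flag clients whose feature value is unusually high.

    Uses the modified z-score: 0.6745 * (x - median) / MAD.
    Returns an empty set when the MAD is zero (no spread to compare to).
    """
    values = list(feature_by_client.values())
    if not values:
        return set()
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return set()
    return {
        ip
        for ip, value in feature_by_client.items()
        if 0.6745 * (value - med) / mad > threshold
    }
```

Median-based scoring is used here because a single heavy scraper would inflate a mean and standard deviation enough to hide itself, while the median stays close to typical visitor behaviour.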

To learn more, consider signing up for our Big Data Workshop, being held in Mumbai from 5th-7th Sep 2016.
