Simple Web Tool Security Scanner with Python

In this tutorial, you will build a Python-based security scanner that can detect XSS, SQL injection, and exposed sensitive PII (Personally Identifiable Information).


Types of Vulnerabilities

Generally, we can categorize web security vulnerabilities into the following buckets (for even more buckets, check the OWASP Top 10):

  • SQL injection: A technique where attackers insert malicious SQL code into queries through unvalidated inputs, allowing them to read or modify database contents (see the sketch after this list).
  • Cross-Site Scripting (XSS): A technique where attackers inject malicious JavaScript into trusted websites. The code then executes in the context of a victim's browser, letting the attacker steal sensitive information or perform unauthorized operations.
  • Sensitive information exposure: A security issue where an application unintentionally reveals sensitive data like passwords and API keys through logs, insecure storage, and other weaknesses.
  • Common security misconfigurations: Security issues that occur due to improper configuration of web servers – like default credentials for administrator accounts, enabled debug mode, publicly available administrator dashboards with weak credentials, and so on.
  • Basic authentication weaknesses: Security issues that occur due to lapses in password policies, user authentication processes, improper session management, and so on.
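
To make the first of these concrete, here is a minimal sketch (with a hypothetical users table and input value) of the pattern SQL injection exploits: user input concatenated straight into the query text instead of being bound as a parameter.

# Hypothetical vulnerable pattern: attacker-controlled input is pasted
# directly into the SQL string instead of being bound as a parameter.
user_id = "1 OR 1=1"  # value supplied by the attacker
query = f"SELECT * FROM users WHERE id = {user_id}"
print(query)
# SELECT * FROM users WHERE id = 1 OR 1=1
# The WHERE clause is always true, so the query returns every row.

Parameterized queries, where the database driver handles quoting, are the standard defense.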



Prerequisites

To follow along with this tutorial, you will need:

  • Python 3.x
  • Basic understanding of HTTP protocols
  • Basic understanding of web applications
  • Basic understanding of how XSS, SQL injection, and basic security attacks work


Setting Up Our Development Environment

Let's install the required dependencies with the following command:

pip install requests beautifulsoup4 urllib3 colorama        

We'll import these dependencies at the top of our code file:

# Required packages
import requests
from bs4 import BeautifulSoup
import urllib.parse
import urllib3
import colorama
import re
from concurrent.futures import ThreadPoolExecutor
import sys
from typing import List, Dict, Set

# The crawler issues requests with verify=False, so silence the
# InsecureRequestWarning that urllib3 would otherwise print per request
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)


Creating Our Core Scanner Class

Once you have the dependencies, it's time to write the core scanner class.

This class will handle the core web security scanning functionality: it tracks the pages we have visited and stores our findings.

It also defines a normalize_url helper that strips the HTTP GET parameters from a URL, letting us treat URLs that differ only in their query string as the same page when deduplicating.

For example, https://domain.com/page?id=1 becomes https://domain.com/page after normalization.

class WebSecurityScanner:
    def __init__(self, target_url: str, max_depth: int = 3):
        """
        Initialize the security scanner with a target URL and maximum crawl depth.

        Args:
            target_url: The base URL to scan
            max_depth: Maximum depth for crawling links (default: 3)
        """
        self.target_url = target_url
        self.max_depth = max_depth
        self.visited_urls: Set[str] = set()
        self.vulnerabilities: List[Dict] = []
        self.session = requests.Session()

        # Initialize colorama for cross-platform colored output
        colorama.init()

    def normalize_url(self, url: str) -> str:
        """Normalize the URL to prevent duplicate checks"""
        parsed = urllib.parse.urlparse(url)
        return f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
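
A quick sanity check of the normalize_url helper (assuming the WebSecurityScanner class above is defined), using a placeholder domain:

scanner = WebSecurityScanner("https://example.com")
print(scanner.normalize_url("https://example.com/page?id=1"))
# https://example.com/page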
        


Implementing the Crawler

The first step in our scanner is a web crawler that discovers pages and URLs in the target application. Make sure you define these methods inside the WebSecurityScanner class (they are shown unindented here for readability).

def crawl(self, url: str, depth: int = 0) -> None:
    """
    Crawl the website to discover pages and endpoints.

    Args:
        url: Current URL to crawl
        depth: Current depth in the crawl tree
    """
    if depth > self.max_depth or url in self.visited_urls:
        return

    try:
        self.visited_urls.add(url)
        # verify=False lets us crawl hosts with self-signed or invalid TLS
        # certificates (the resulting warnings were silenced at import time)
        response = self.session.get(url, verify=False)
        soup = BeautifulSoup(response.text, 'html.parser')

        # Find all links in the page
        links = soup.find_all('a', href=True)
        for link in links:
            next_url = urllib.parse.urljoin(url, link['href'])
            if next_url.startswith(self.target_url):
                self.crawl(next_url, depth + 1)

    except Exception as e:
        print(f"Error crawling {url}: {str(e)}")
        

This crawl function performs a depth-first crawl of a website, exploring pages while staying within the target domain.

For example, if you point the scanner at https://google.com, the function collects every link on the page, checks one by one whether each belongs to the specified domain (that is, google.com), and recursively crawls the ones that do, up to the limit set by max_depth (the depth parameter tracks how deep we currently are). The exception handling ensures that an error on one page is reported without aborting the whole crawl.
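
As a quick illustration, assuming a placeholder target that you are authorized to scan:

scanner = WebSecurityScanner("https://example.com", max_depth=2)
scanner.crawl(scanner.target_url)
print(f"Discovered {len(scanner.visited_urls)} URLs")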


Designing and Implementing the Security Checks

Now let's get to the juicy part and implement our security checks. We'll start with SQL injection.

SQL Injection Detection Check

def check_sql_injection(self, url: str) -> None:
    """Test for potential SQL injection vulnerabilities"""
    sql_payloads = ["'", "1' OR '1'='1", "' OR 1=1--", "' UNION SELECT NULL--"]

    for payload in sql_payloads:
        try:
            # Test GET parameters
            parsed = urllib.parse.urlparse(url)
            params = urllib.parse.parse_qs(parsed.query)

            for param in params:
                # URL-encode the payload so the request URL stays well-formed
                test_url = url.replace(f"{param}={params[param][0]}",
                                       f"{param}={urllib.parse.quote(payload)}")
                response = self.session.get(test_url)

                # Look for database error strings in the response. This keyword
                # match is crude and prone to false positives (any page that
                # merely mentions "sql" gets flagged), so verify hits manually.
                if any(error in response.text.lower() for error in
                    ['sql', 'mysql', 'sqlite', 'postgresql', 'oracle']):
                    self.report_vulnerability({
                        'type': 'SQL Injection',
                        'url': url,
                        'parameter': param,
                        'payload': payload
                    })

        except Exception as e:
            print(f"Error testing SQL injection on {url}: {str(e)}")
        

This function performs basic SQL injection checks: it substitutes common injection payloads into each GET parameter of the URL and looks for responses that hint at a vulnerability.

If the response to the modified GET request contains a database error string, we call report_vulnerability to record the issue in the final report this script generates. For the sake of this example we test a few commonly used payloads, but you can extend the list. Keep in mind that the keyword match is noisy, so treat hits as leads to verify rather than confirmed vulnerabilities.
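
For instance, here is what the parameter substitution produces for a hypothetical parameterized URL:

import urllib.parse

url = "https://example.com/product?id=1"
payload = "' OR 1=1--"
test_url = url.replace("id=1", f"id={urllib.parse.quote(payload)}")
print(test_url)
# https://example.com/product?id=%27%20OR%201%3D1--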


XSS (Cross-Site Scripting) Check

Now let's implement the second security check for XSS payloads.

def check_xss(self, url: str) -> None:
    """Test for potential Cross-Site Scripting vulnerabilities"""
    xss_payloads = [
        "<script>alert('XSS')</script>",
        "<img src=x onerror=alert('XSS')>",
        "javascript:alert('XSS')"
    ]

    for payload in xss_payloads:
        try:
            # Test GET parameters
            parsed = urllib.parse.urlparse(url)
            params = urllib.parse.parse_qs(parsed.query)

            for param in params:
                test_url = url.replace(f"{param}={params[param][0]}", 
                                     f"{param}={urllib.parse.quote(payload)}")
                response = self.session.get(test_url)

                # A vulnerable page URL-decodes the parameter and reflects the
                # raw payload back into the HTML unescaped
                if payload in response.text:
                    self.report_vulnerability({
                        'type': 'Cross-Site Scripting (XSS)',
                        'url': url,
                        'parameter': param,
                        'payload': payload
                    })

        except Exception as e:
            print(f"Error testing XSS on {url}: {str(e)}")
        

This function works just like the SQL injection tester, using a set of common XSS payloads. The key difference is that instead of looking for an error message, we look for our injected payload appearing unmodified in the response.

If the payload is reflected back verbatim, it would most likely execute in the context of a victim's browser, which is the hallmark of a reflected XSS vulnerability.
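
In other words, the detection boils down to a substring test. Here it is in isolation, against a fabricated response body:

payload = "<script>alert('XSS')</script>"
# Simulated response from a page that echoes a search term unescaped
response_text = "<p>Results for: <script>alert('XSS')</script></p>"
print(payload in response_text)  # True -> likely reflected XSS
# A safe page would have HTML-escaped it: &lt;script&gt;alert('XSS')&lt;/script&gt;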


Sensitive Information Exposure Check

Now let's implement our final check for sensitive PII.

def check_sensitive_info(self, url: str) -> None:
    """Check for exposed sensitive information"""
    sensitive_patterns = {
        'email': r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',
        'phone': r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b',
        'ssn': r'\b\d{3}-\d{2}-\d{4}\b',
        # Matches e.g. api_key = "<32-45 alphanumeric characters>" in quotes
        'api_key': r'api[_-]?key\s*[:=]\s*([\'"])([a-zA-Z0-9]{32,45})\1'
    }

    try:
        response = self.session.get(url)

        for info_type, pattern in sensitive_patterns.items():
            matches = re.finditer(pattern, response.text)
            for match in matches:
                self.report_vulnerability({
                    'type': 'Sensitive Information Exposure',
                    'url': url,
                    'info_type': info_type,
                    'pattern': pattern
                })

    except Exception as e:
        print(f"Error checking sensitive information on {url}: {str(e)}")
        

This function uses a set of predefined regex patterns to search the response for PII such as emails, phone numbers, SSNs, and API keys (quoted strings of 32-45 alphanumeric characters assigned to an api_key or api-key label).

Just like the previous two checks, we fetch the response text for the URL and run our regex patterns over it. Any match is reported with the report_vulnerability function. Make sure all of these functions are defined inside the WebSecurityScanner class.
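
Here is the SSN pattern in isolation, run against a fabricated string:

import re

ssn_pattern = r'\b\d{3}-\d{2}-\d{4}\b'
sample = "Contact jane@example.com; SSN on file: 123-45-6789."
print(re.findall(ssn_pattern, sample))
# ['123-45-6789']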

Implementing the Main Scanning Logic

Let's finally stitch everything together by defining the scan and report_vulnerability functions in the WebSecurityScanner class:

def scan(self) -> List[Dict]:
    """
    Main scanning method that coordinates the security checks

    Returns:
        List of discovered vulnerabilities
    """
    print(f"\n{colorama.Fore.BLUE}Starting security scan of {self.target_url}{colorama.Style.RESET_ALL}\n")

    # First, crawl the website
    self.crawl(self.target_url)

    # Then run security checks on all discovered URLs
    with ThreadPoolExecutor(max_workers=5) as executor:
        for url in self.visited_urls:
            executor.submit(self.check_sql_injection, url)
            executor.submit(self.check_xss, url)
            executor.submit(self.check_sensitive_info, url)

    return self.vulnerabilities

def report_vulnerability(self, vulnerability: Dict) -> None:
    """Record and display found vulnerabilities"""
    # Called from worker threads; list.append is atomic under CPython's GIL
    self.vulnerabilities.append(vulnerability)
    print(f"{colorama.Fore.RED}[VULNERABILITY FOUND]{colorama.Style.RESET_ALL}")
    for key, value in vulnerability.items():
        print(f"{key}: {value}")
    print()
        

This code defines our scan function, which invokes crawl to recursively discover the website's URLs and then uses a thread pool to run all three security checks against every visited URL concurrently.

We also define the report_vulnerability function, which prints each finding to the console as it is discovered and stores it in our vulnerabilities list.

Now let's finally use our scanner. Save the file as scanner.py and add an entry point:

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python scanner.py <target_url>")
        sys.exit(1)

    target_url = sys.argv[1]
    scanner = WebSecurityScanner(target_url)
    vulnerabilities = scanner.scan()

    # Print summary
    print(f"\n{colorama.Fore.GREEN}Scan Complete!{colorama.Style.RESET_ALL}")
    print(f"Total URLs scanned: {len(scanner.visited_urls)}")
    print(f"Vulnerabilities found: {len(vulnerabilities)}")
        

The target URL is supplied as a command-line argument, and at the end of the scan we get a summary of the URLs scanned and the vulnerabilities found. Now let's discuss how you can extend the scanner and add more features.


Extending the Security Scanner

Here are some ideas for extending this basic security scanner into something more advanced:

  1. Add more vulnerability checks, like CSRF detection and directory traversal.
  2. Improve reporting with HTML or PDF output.
  3. Add configuration options for scan intensity and scope, such as specifying the crawl depth through a CLI argument.
  4. Implement proper rate limiting (see the sketch after this list).
  5. Add authentication support for testing URLs that require session-based authentication.
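
For idea 4, here is a minimal rate-limiting sketch. This is a hypothetical addition, not part of the scanner above: you would create one Throttle in WebSecurityScanner.__init__ and call its wait method before every self.session.get call.

import time

class Throttle:
    """Enforce a minimum delay between outgoing requests."""

    def __init__(self, delay_seconds: float = 0.5):
        self.delay_seconds = delay_seconds
        self.last_request = 0.0

    def wait(self) -> None:
        # Sleep only for the remainder of the delay window
        elapsed = time.monotonic() - self.last_request
        if elapsed < self.delay_seconds:
            time.sleep(self.delay_seconds - elapsed)
        self.last_request = time.monotonic()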


Wrapping Up

Now you know how to build a basic security scanner! It demonstrates a few core concepts of web security.

Keep in mind that this tutorial should only be used for educational purposes: scan only applications you own or have explicit permission to test. For real-world assessments, professionally designed, enterprise-grade tools like Burp Suite and OWASP ZAP can check for hundreds of security vulnerabilities at a much larger scale.

I hope you learned the basics of web security and a bit of Python programming as well.
