Simple Web Tool Security Scanner with Python
In this tutorial, you will build a Python-based security scanner that can detect XSS, SQL injection, and exposed PII (Personally Identifiable Information).
Types of Vulnerabilities
Web security vulnerabilities generally fall into a handful of broad categories (for the full picture, check the OWASP Top 10). This tutorial focuses on three of them: SQL injection, cross-site scripting (XSS), and sensitive information exposure.
Prerequisites
To follow along with this tutorial, you will need a working Python 3 installation and a basic understanding of how HTTP requests and HTML pages work.
Setting Up Our Development Environment
Let's install the required dependencies with the following command:
pip install requests beautifulsoup4 urllib3 colorama
We'll use these dependencies in our code file:
# Required packages
import re
import sys
import urllib.parse
from concurrent.futures import ThreadPoolExecutor
from typing import List, Dict, Set

import requests
import urllib3
import colorama
from bs4 import BeautifulSoup

# Suppress the warning urllib3 emits for requests made with verify=False
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
Creating Our Core Scanner Class
Once you have the dependencies, it's time to write the core scanner class.
This class is the backbone of the scanner: it handles the web security scanning functionality, tracks which pages we've visited, and stores our findings.
It includes a normalize_url method that we'll use to ensure we don't rescan URLs that have already been seen. It works by stripping the HTTP GET parameters from the URL.
For example, https://domain.com/page?id=1 will become https://domain.com/page after normalizing it.
class WebSecurityScanner:
    def __init__(self, target_url: str, max_depth: int = 3):
        """
        Initialize the security scanner with a target URL and maximum crawl depth.

        Args:
            target_url: The base URL to scan
            max_depth: Maximum depth for crawling links (default: 3)
        """
        self.target_url = target_url
        self.max_depth = max_depth
        self.visited_urls: Set[str] = set()
        self.vulnerabilities: List[Dict] = []
        self.session = requests.Session()

        # Initialize colorama for cross-platform colored output
        colorama.init()

    def normalize_url(self, url: str) -> str:
        """Normalize the URL to prevent duplicate checks"""
        parsed = urllib.parse.urlparse(url)
        return f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
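To see what the normalization does, here is a minimal standalone sketch of the same logic using `urllib.parse` directly (outside the class, for illustration only):

```python
import urllib.parse

def normalize_url(url: str) -> str:
    # Keep scheme, host, and path; drop the query string and fragment
    parsed = urllib.parse.urlparse(url)
    return f"{parsed.scheme}://{parsed.netloc}{parsed.path}"

print(normalize_url("https://domain.com/page?id=1"))
# https://domain.com/page
```

Both `?id=1` and `?id=2` variants of the same page normalize to an identical string, which is what lets us detect duplicates.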
Implementing the Crawler
The first step in our scanner is a web crawler that discovers pages and URLs in the target application. Make sure to define these methods inside the WebSecurityScanner class.
    def crawl(self, url: str, depth: int = 0) -> None:
        """
        Crawl the website to discover pages and endpoints.

        Args:
            url: Current URL to crawl
            depth: Current depth in the crawl tree
        """
        if depth > self.max_depth:
            return

        # Compare normalized URLs so the same page with different
        # query strings is only crawled once
        if self.normalize_url(url) in {self.normalize_url(u) for u in self.visited_urls}:
            return

        try:
            self.visited_urls.add(url)
            response = self.session.get(url, verify=False)
            soup = BeautifulSoup(response.text, 'html.parser')

            # Find all links in the page
            links = soup.find_all('a', href=True)
            for link in links:
                next_url = urllib.parse.urljoin(url, link['href'])
                if next_url.startswith(self.target_url):
                    self.crawl(next_url, depth + 1)
        except Exception as e:
            print(f"Error crawling {url}: {e}")
The crawl method performs a depth-first crawl of a website, exploring its pages while staying within the specified domain.
For example, if you point this scanner at https://google.com, it collects every link on the page, keeps only the ones that belong to the target domain (that is, google.com), and recursively crawls each of them, up to the limit given by the depth parameter. The exception handling ensures that a single failing request doesn't abort the crawl and that any errors are reported.
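The two building blocks here are `urljoin`, which resolves relative links against the page they appear on, and the `startswith` scope check. A small sketch with hypothetical links shows how they interact:

```python
import urllib.parse

base = "https://example.com/docs/index.html"   # page currently being crawled
target = "https://example.com"                 # scan scope

# Resolve each href the way the crawler does, then check scope
resolved = {
    href: urllib.parse.urljoin(base, href)
    for href in ["/about", "page2.html", "https://other.com/x"]
}
in_scope = {href: u.startswith(target) for href, u in resolved.items()}

print(resolved)
print(in_scope)
```

Absolute paths resolve against the site root, relative paths against the current directory, and links to other hosts fall outside the scope check.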
Designing and Implementing the Security Checks
Now let's get to the juicy part and implement our security checks, starting with SQL injection.
SQL Injection Detection Check
    def check_sql_injection(self, url: str) -> None:
        """Test for potential SQL injection vulnerabilities"""
        sql_payloads = ["'", "1' OR '1'='1", "' OR 1=1--", "' UNION SELECT NULL--"]

        for payload in sql_payloads:
            try:
                # Test each GET parameter with the current payload
                parsed = urllib.parse.urlparse(url)
                params = urllib.parse.parse_qs(parsed.query)

                for param in params:
                    test_url = url.replace(f"{param}={params[param][0]}",
                                           f"{param}={payload}")
                    response = self.session.get(test_url)

                    # Look for SQL error messages in the response
                    if any(error in response.text.lower() for error in
                           ['sql', 'mysql', 'sqlite', 'postgresql', 'oracle']):
                        self.report_vulnerability({
                            'type': 'SQL Injection',
                            'url': url,
                            'parameter': param,
                            'payload': payload
                        })
            except Exception as e:
                print(f"Error testing SQL injection on {url}: {e}")
This method performs basic SQL injection checks: it injects common SQL injection payloads into each GET parameter of the URL and looks for error messages that might hint at a vulnerability.
If the response to the modified request contains a database-related keyword, we record the finding with the report_vulnerability method so it appears in the final report this script generates. For the sake of this example, we use only a handful of commonly tested payloads, but you can easily extend the list.
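The parameter-substitution step is worth seeing in isolation. This sketch (with a hypothetical URL) builds one test URL per GET parameter, exactly the way the check does; note that the payload is inserted unencoded here, as in the scanner:

```python
import urllib.parse

url = "https://example.com/item?id=1&sort=asc"
payload = "' OR 1=1--"

# Parse the query string into {param: [values]}
parsed = urllib.parse.urlparse(url)
params = urllib.parse.parse_qs(parsed.query)

# Swap each parameter's value for the payload, one parameter at a time
test_urls = [url.replace(f"{p}={params[p][0]}", f"{p}={payload}") for p in params]

for t in test_urls:
    print(t)
```

Each iteration tampers with a single parameter while leaving the others intact, so a vulnerable parameter can be pinpointed in the report.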
XSS (Cross-Site Scripting) Check
Now let's implement the second security check for XSS payloads.
    def check_xss(self, url: str) -> None:
        """Test for potential Cross-Site Scripting vulnerabilities"""
        xss_payloads = [
            "<script>alert('XSS')</script>",
            "<img src=x onerror=alert('XSS')>",
            "javascript:alert('XSS')"
        ]

        for payload in xss_payloads:
            try:
                # Test each GET parameter with the current payload
                parsed = urllib.parse.urlparse(url)
                params = urllib.parse.parse_qs(parsed.query)

                for param in params:
                    test_url = url.replace(f"{param}={params[param][0]}",
                                           f"{param}={urllib.parse.quote(payload)}")
                    response = self.session.get(test_url)

                    # A payload reflected verbatim suggests unescaped output
                    if payload in response.text:
                        self.report_vulnerability({
                            'type': 'Cross-Site Scripting (XSS)',
                            'url': url,
                            'parameter': param,
                            'payload': payload
                        })
            except Exception as e:
                print(f"Error testing XSS on {url}: {e}")
This method follows the same pattern as the SQL injection tester, but with a set of common XSS payloads. The key difference is that instead of looking for an error message, we look for our injected payload appearing unmodified in the response.
If the payload is reflected back verbatim, it will most likely execute in the context of a victim's browser as a reflected XSS attack.
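The detection logic boils down to a substring test on the response body. This sketch contrasts two hypothetical response bodies, one that echoes the parameter verbatim and one that HTML-escapes it:

```python
payload = "<img src=x onerror=alert('XSS')>"

# Vulnerable page: the parameter is echoed back verbatim
reflected = f"<p>Results for {payload}</p>"

# Safe page: the same input comes back HTML-escaped
escaped = "<p>Results for &lt;img src=x onerror=alert('XSS')&gt;</p>"

print(payload in reflected)  # payload survives intact -> flagged as XSS
print(payload in escaped)    # payload was encoded -> not flagged
```

Escaped output fails the substring test because `<` and `>` have become `&lt;` and `&gt;`, which is exactly the defense that prevents reflected XSS.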
Sensitive Information Exposure Check
Now let's implement our final check for sensitive PII.
    def check_sensitive_info(self, url: str) -> None:
        """Check for exposed sensitive information"""
        sensitive_patterns = {
            'email': r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',
            'phone': r'\b\d{3}[-.]?\d{3}[-.]?\d{4}\b',
            'ssn': r'\b\d{3}-\d{2}-\d{4}\b',
            'api_key': r'api[_-]?key[_-]?([\'"`])([a-zA-Z0-9]{32,45})\1'
        }

        try:
            response = self.session.get(url)

            for info_type, pattern in sensitive_patterns.items():
                matches = re.finditer(pattern, response.text)
                for match in matches:
                    self.report_vulnerability({
                        'type': 'Sensitive Information Exposure',
                        'url': url,
                        'info_type': info_type,
                        'pattern': pattern
                    })
        except Exception as e:
            print(f"Error checking sensitive information on {url}: {e}")
This method uses a set of predefined regex patterns to search for PII such as email addresses, phone numbers, SSNs, and API keys (quoted strings of 32 to 45 alphanumeric characters following an api_key or api-key label).
As in the previous two checks, we fetch the URL and run each pattern over the response text; any match is reported with the report_vulnerability method. Make sure all of these methods are defined inside the WebSecurityScanner class.
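To see the pattern matching on its own, here is a small sketch that applies two of the scanner's patterns to a hypothetical response body:

```python
import re

# Two of the scanner's patterns
sensitive_patterns = {
    'email': r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',
    'ssn': r'\b\d{3}-\d{2}-\d{4}\b',
}

# Hypothetical page content leaking PII
page_text = "Contact admin@example.com, SSN on file: 123-45-6789."

findings = [
    (info_type, m.group())
    for info_type, pattern in sensitive_patterns.items()
    for m in re.finditer(pattern, page_text)
]
print(findings)
```

`re.finditer` yields every non-overlapping match, so a page leaking several email addresses would produce one finding per address.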
Implementing the Main Scanning Logic
Let's finally stitch everything together by defining the scan and report_vulnerability methods in the WebSecurityScanner class:
    def scan(self) -> List[Dict]:
        """
        Main scanning method that coordinates the security checks

        Returns:
            List of discovered vulnerabilities
        """
        print(f"\n{colorama.Fore.BLUE}Starting security scan of {self.target_url}{colorama.Style.RESET_ALL}\n")

        # First, crawl the website
        self.crawl(self.target_url)

        # Then run security checks on all discovered URLs
        with ThreadPoolExecutor(max_workers=5) as executor:
            for url in self.visited_urls:
                executor.submit(self.check_sql_injection, url)
                executor.submit(self.check_xss, url)
                executor.submit(self.check_sensitive_info, url)

        return self.vulnerabilities

    def report_vulnerability(self, vulnerability: Dict) -> None:
        """Record and display found vulnerabilities"""
        self.vulnerabilities.append(vulnerability)

        print(f"{colorama.Fore.RED}[VULNERABILITY FOUND]{colorama.Style.RESET_ALL}")
        for key, value in vulnerability.items():
            print(f"{key}: {value}")
        print()
The scan method invokes crawl to recursively discover the website, then uses a thread pool to run all three security checks against every visited URL.
The report_vulnerability method prints each finding to the console and stores it in the vulnerabilities list.
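The thread-pool pattern is worth isolating: the `with` block does not exit until every submitted task has completed, so by the time `scan` returns, all checks have run. This sketch uses a hypothetical stand-in for a check:

```python
from concurrent.futures import ThreadPoolExecutor

completed = []

# Hypothetical stand-in for one of the scanner's security checks
def run_check(check_name: str, url: str) -> None:
    completed.append((check_name, url))

urls = ["https://example.com/a", "https://example.com/b"]

# Submit every (check, url) pair; exiting the with-block
# waits for all submitted tasks to finish
with ThreadPoolExecutor(max_workers=5) as executor:
    for url in urls:
        for name in ("sql_injection", "xss", "sensitive_info"):
            executor.submit(run_check, name, url)

print(len(completed))  # 3 checks x 2 URLs = 6
```

Appending to a shared list is safe enough here because `list.append` is atomic under CPython's GIL, which is also why the scanner's report_vulnerability can be called from multiple worker threads.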
Now let's put the scanner to use. Save the file as scanner.py and add an entry point:
if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python scanner.py <target_url>")
        sys.exit(1)

    target_url = sys.argv[1]
    scanner = WebSecurityScanner(target_url)
    vulnerabilities = scanner.scan()

    # Print summary
    print(f"\n{colorama.Fore.GREEN}Scan Complete!{colorama.Style.RESET_ALL}")
    print(f"Total URLs scanned: {len(scanner.visited_urls)}")
    print(f"Vulnerabilities found: {len(vulnerabilities)}")
The target URL is supplied as a command-line argument, and at the end of the scan we get a summary of the URLs scanned and vulnerabilities found. Now let's discuss how you can extend the scanner and add more features.
Extending the Security Scanner
Here are some ideas for extending this basic security scanner into something more advanced:
Test POST forms and request bodies, not just GET parameters
Add more payloads and error signatures to each check
Cover more OWASP Top 10 categories, such as CSRF or open redirects
Respect robots.txt and add rate limiting so the crawler is a polite client
Export findings to JSON or HTML reports
Wrapping Up
Now you know how to build a basic security scanner! This scanner demonstrates a few core concepts of web security.
Keep in mind that this tutorial is for educational purposes only: scan only applications you own or have explicit permission to test. For real-world use, there are professionally maintained, enterprise-grade tools like Burp Suite and OWASP ZAP that check for hundreds of security vulnerabilities at a much larger scale.
I hope you learned the basics of web security and a bit of Python programming as well.