Developing a Python Script – Geetest CAPTCHA Solver: A Comprehensive Guide to Bypassing Geetest V4 and Other Versions

Introduction – Why Geetest CAPTCHA Stands Apart

In today's digital landscape, Chinese technology has permeated virtually every industry. When you hear about innovations from China, you might recall those nostalgic '90s internet jokes like “Glasses, do you need ‘em?”, but the game has evolved significantly. Unlike earlier projects that were more gimmicky than effective (remember DeepSeek, which was neither profoundly deep nor truly capable of search), Geetest has fine-tuned its approach to security. This sophisticated system is engineered to thwart automated access, leaving many SEO professionals frustrated as they struggle to circumvent its protections.


At its essence, Geetest CAPTCHA is an advanced security measure deployed by a variety of online platforms to block non-human traffic. It employs a dynamic slider puzzle mechanism, requiring users to drag a specific image fragment into its correct position. Intrigued by the complexity of this system, I set out to analyze its inner workings—identifying potential obstacles and offering insights for anyone looking to build a custom CAPTCHA solver. For this guide, the bypass method exclusively leverages a CAPTCHA solving service (with 2Captcha as my service of choice).

How Geetest CAPTCHA Functions – A Dual-Layer Defense Strategy

Geetest CAPTCHA operates on a two-tiered defense mechanism that integrates both a visually engaging challenge and a rigorous backend analysis. Here’s a detailed breakdown:

Dynamic Image Generation

Each time a request is made, the server produces a unique background image that features a “hole” alongside a corresponding puzzle piece. This continuously evolving design ensures that no pre-configured solution can be reused, as every instance is distinct.

The Interactive Slider Challenge

Users are required to drag the missing puzzle segment into the designated slot. During this process, the system records several critical metrics:

  • Final Position: The exact spot where the puzzle piece lands.
  • Movement Trajectory: The precise path taken by the slider during the drag action.
  • Timing Details: The intervals between each user interaction.

These data points are not collected by an isolated module; they are seamlessly integrated throughout the entire CAPTCHA process. The system meticulously monitors subtle nuances such as cursor jitters and minute variations in drag behavior—elements that are difficult for automated scripts to replicate. Following the user’s interaction, the collected data is transmitted back to the server, where it is validated against predetermined behavioral benchmarks. This robust, multi-faceted approach makes it extremely challenging for bots to imitate genuine human activity and automate the bypass process.
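To make the three recorded metrics concrete, here is a minimal sketch of how a drag recording could be summarized. The DragSample type, the feature names, and the sample values are hypothetical illustrations; Geetest's actual data format and validation rules are not public and are not reproduced here.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class DragSample:
    """One recorded pointer position (hypothetical structure)."""
    x: float   # horizontal position in pixels
    y: float   # vertical position in pixels
    t_ms: int  # milliseconds since the drag started


def summarize_drag(samples: list[DragSample]) -> dict:
    """Reduce a drag recording to the three metrics listed above."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    pairs = list(zip(samples, samples[1:]))
    # Timing details: gaps between consecutive pointer events.
    intervals = [b.t_ms - a.t_ms for a, b in pairs]
    # Movement trajectory: total length of the path actually travelled.
    path_length = sum(hypot(b.x - a.x, b.y - a.y) for a, b in pairs)
    return {
        "final_x": samples[-1].x,  # final position of the slider
        "path_length": path_length,
        "intervals_ms": intervals,
        "duration_ms": samples[-1].t_ms - samples[0].t_ms,
    }


# Made-up recording: the pointer drifts 4 px off the axis and comes back.
recording = [DragSample(0, 0, 0), DragSample(30, 4, 120), DragSample(60, 0, 260)]
summary = summarize_drag(recording)
```

In this example the path length is slightly larger than the 60 px horizontal distance, which is the kind of small deviation the text describes as hard for a script to reproduce convincingly.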

It’s important to note that while Geetest V4 incorporates these advanced features (including an “invisible” mode for additional security), its predecessor, Geetest V3, operated with a more basic form of behavioral analysis. In both versions, however, the CAPTCHA presents a formidable obstacle, arguably a harder one to automate than reCAPTCHA.

The Intricacies of Geetest CAPTCHA – Understanding the Challenge

When tackling reCAPTCHA, the process is typically straightforward: locate a few static parameters embedded in the webpage, forward them to a solving service, and await a response. The static nature of these parameters simplifies the extraction and transmission process. However, Geetest CAPTCHA is far more complex due to its blend of static and constantly changing dynamic parameters, which must be freshly retrieved each time the challenge is presented.


Let’s dissect the differences between the two versions:

Geetest V3

For V3, the following parameters are critical:

  • Static Elements:

- websiteURL: The URL of the page where the CAPTCHA is displayed.

- gt: A unique value provided by the server.

  • Dynamic Component:

- challenge: A parameter generated anew with each page load, making it essential to obtain a fresh value every time to avoid invalidation.
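Because challenge is regenerated on every load, it has to be read from a fresh response rather than copied from an earlier session. The sketch below assumes the site returns both values as a JSON object with gt and challenge keys; the function name and the sample body are hypothetical, and the actual endpoint and field layout vary from site to site.

```python
import json


def parse_v3_init_response(body: str) -> tuple[str, str]:
    """Return (gt, challenge) from a JSON body, or raise ValueError."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError("response is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("response is not a JSON object")
    gt = data.get("gt")
    challenge = data.get("challenge")
    if not gt or not challenge:
        raise ValueError("response is missing gt or challenge")
    return gt, challenge


# Sample body for illustration only; the values are placeholders.
sample = (
    '{"success": 1, "gt": "f3bf6dbdcf7886856696502e1d55e00c", '
    '"challenge": "12345678abc90123d45678ef90123a456b"}'
)
gt, challenge = parse_v3_init_response(sample)
```

Failing loudly when either value is missing avoids sending a task that would be rejected for a stale or empty challenge.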

Geetest V4

In V4, rather than handling separate tokens, the system consolidates parameters within an initParameters object. The key component here is:

  • captcha_id: A unique identifier that configures the CAPTCHA for the site in question.

Bear in mind that these parameters are not hardcoded into the HTML; they only become available when the user begins interacting with the CAPTCHA. Consequently, in addition to merely extracting the necessary parameters, one must emulate real user behavior—a factor that could trigger Geetest’s security protocols. This is why the use of proxies often becomes a critical component in the bypass strategy. In essence, each additional requirement introduces further complexity. Although the demo environment provided by the service might operate smoothly without proxies, real-world applications could necessitate them.
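The difference between the two versions shows up most clearly in the task body sent to the solving service. The following sketch builds that body for either version using the field names from this guide's script (websiteURL, gt, challenge, version, initParameters); the helper name build_geetest_task and the example URL are hypothetical, and no request is sent.

```python
from typing import Optional


def build_geetest_task(
    website_url: str,
    *,
    gt: Optional[str] = None,
    challenge: Optional[str] = None,
    captcha_id: Optional[str] = None,
) -> dict:
    """Build a proxyless task body for Geetest V3 or V4."""
    if captcha_id:
        # V4: a single identifier wrapped in initParameters.
        return {
            "type": "GeeTestTaskProxyless",
            "websiteURL": website_url,
            "version": 4,
            "initParameters": {"captcha_id": captcha_id},
        }
    if gt and challenge:
        # V3: the static gt plus a freshly obtained challenge.
        return {
            "type": "GeeTestTaskProxyless",
            "websiteURL": website_url,
            "gt": gt,
            "challenge": challenge,
        }
    raise ValueError("provide captcha_id (V4) or both gt and challenge (V3)")


v4_task = build_geetest_task("https://example.com/login", captcha_id="demo-captcha-id")
```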

Preparing to Implement the Geetest CAPTCHA Solver

After a thorough technical analysis, it’s time to transition from theory to practice. Below is a comprehensive checklist of what you’ll need to develop your own Python-based CAPTCHA solver:

What You’ll Need:

  • Python 3: Visit python.org, download the appropriate installer for your operating system, and follow the installation instructions (be sure to select the option to add Python to your PATH).
  • pip Package Manager: Typically included with Python. To confirm installation, open a command prompt (or terminal) and execute:

pip --version        

  • Essential Python Libraries (requests and selenium):

- requests: Facilitating HTTP requests to the 2Captcha API.

- selenium: Automating browser interactions, particularly with Google Chrome.

Install them using:

pip install requests selenium        

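  • ChromeDriver: This standalone utility allows Selenium to control the Chrome browser. Determine your current Chrome version by navigating to “About Chrome” in your browser, then download the corresponding version of ChromeDriver from the official website. After extraction, either place the chromedriver executable in a directory that is part of your system’s PATH or specify its location directly within your Selenium configuration. Recent Selenium 4 releases no longer accept the older executable_path argument, so the path is passed through a Service object (Selenium 4.6 and later can also locate a matching driver on their own when none is specified):

from selenium.webdriver.chrome.service import Service
driver = webdriver.Chrome(service=Service('/path/to/chromedriver'), options=options)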

  • 2Captcha API Key: This key is critical for interfacing with the CAPTCHA solving service and will be incorporated into the script shortly.
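Before running anything, it can help to confirm that the items on this checklist are actually in place. The sketch below is one way to do that; the environment variable name TWOCAPTCHA_API_KEY is an assumption made for this example (the guide's script stores the key in a constant instead), and a missing chromedriver on PATH is not necessarily a problem if Selenium resolves the driver itself.

```python
import importlib.util
import os
import shutil


def check_prerequisites(env=os.environ) -> dict:
    """Report which prerequisites from the checklist appear to be available."""
    return {
        # find_spec returns None when a top-level package is not installed.
        "requests": importlib.util.find_spec("requests") is not None,
        "selenium": importlib.util.find_spec("selenium") is not None,
        # shutil.which looks the executable up on PATH.
        "chromedriver_on_path": shutil.which("chromedriver") is not None,
        # Hypothetical variable name; adjust to however you store the key.
        "api_key_set": bool(env.get("TWOCAPTCHA_API_KEY")),
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```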

With all prerequisites in place, we can now delve into the complete script. The following sections will break down the functionality of each component and explain how they collectively contribute to bypassing Geetest CAPTCHA.

#!/usr/bin/env python3
import re
import time
import json
import argparse
import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Replace with your actual 2Captcha API key
API_KEY = "INSERT_YOUR_API_KEY"

# 2Captcha API endpoints
CREATE_TASK_URL = "https://api.2captcha.com/createTask"
GET_TASK_RESULT_URL = "https://api.2captcha.com/getTaskResult"

# Timeout (in seconds) for each HTTP request to the 2Captcha API,
# so a stalled connection cannot hang the script indefinitely
REQUEST_TIMEOUT = 30

def extract_geetest_v3_params(html):
    """
    Attempt to extract parameters for GeeTest V3 (gt and challenge) from HTML.
    (Used if the parameters are available in the page source)
    Returns a tuple (gt, challenge); a value that is not found is None.
    """
    gt_match = re.search(r'["\']gt["\']\s*:\s*["\'](.*?)["\']', html)
    challenge_match = re.search(r'["\']challenge["\']\s*:\s*["\'](.*?)["\']', html)
    gt = gt_match.group(1) if gt_match else None
    challenge = challenge_match.group(1) if challenge_match else None
    return gt, challenge

def extract_geetest_v4_params(html):
    """
    Extracts captcha_id for GeeTest V4 from HTML.
    Looks for a string in the form: captcha_id=<32 hexadecimal characters>
    If extra characters are found after captcha_id, they are discarded.
    """
    # Preferred pattern: exactly 32 lowercase hexadecimal characters
    match = re.search(r'captcha_id=([a-f0-9]{32})', html)
    if match:
        return match.group(1)
    # Fallback: take everything up to the next '&' or quote, then cut at '<'
    match = re.search(r'captcha_id=([^&"\']+)', html)
    if match:
        captcha_id_raw = match.group(1)
        captcha_id = captcha_id_raw.split("<")[0]
        return captcha_id.strip()
    return None

def get_geetest_v3_params_via_requests(website_url):
    """
    For the GeeTest V3 demo page, return static parameters as specified in the examples
    (PHP, Java, Python). This prevents errors where split() might return the entire HTML.
    Note: these values are hardcoded and no request is made; on a real site
    the challenge must be obtained fresh for every attempt.
    """
    gt = "f3bf6dbdcf7886856696502e1d55e00c"
    challenge = "12345678abc90123d45678ef90123a456b"
    return gt, challenge

def auto_extract_params(website_url):
    """
    If the URL contains "geetest-v4", work with V4 (using Selenium to extract captcha_id).
    If the URL contains "geetest" (without -v4), assume it is GeeTest V3 and use static parameters.
    Returns a tuple: (driver, version, gt, challenge_or_captcha_id)
    """
    if "geetest-v4" in website_url:
        options = Options()
        options.add_argument("--disable-gpu")
        options.add_argument("--no-sandbox")
        driver = webdriver.Chrome(options=options)
        driver.get(website_url)
        time.sleep(3)
        try:
            # Wait for the widget placeholder, then click it to trigger the CAPTCHA
            wait = WebDriverWait(driver, 10)
            element = wait.until(
                EC.presence_of_element_located((By.CSS_SELECTOR, "#embed-captcha .gee-test__placeholder"))
            )
            driver.execute_script("arguments[0].click();", element)
            time.sleep(5)
        except Exception as e:
            print("Error loading V4 widget:", e)
        html = driver.page_source
        captcha_id = extract_geetest_v4_params(html)
        return driver, "4", None, captcha_id
    elif "geetest" in website_url:
        # For the GeeTest V3 demo page, use static parameters
        gt, challenge = get_geetest_v3_params_via_requests(website_url)
        options = Options()
        options.add_argument("--disable-gpu")
        options.add_argument("--no-sandbox")
        driver = webdriver.Chrome(options=options)
        driver.get(website_url)
        return driver, "3", gt, challenge
    else:
        return None, None, None, None

def create_geetest_v3_task(website_url, gt, challenge, proxyless=True, proxy_details=None):
    """
    Create a task for GeeTest V3 using the 2Captcha API.
    Required parameters: websiteURL, gt, challenge.
    """
    task_type = "GeeTestTaskProxyless" if proxyless else "GeeTestTask"
    task = {
        "type": task_type,
        "websiteURL": website_url,
        "gt": gt,
        "challenge": challenge
    }
    if not proxyless and proxy_details:
        task.update(proxy_details)
    payload = {
        "clientKey": API_KEY,
        "task": task
    }
    response = requests.post(CREATE_TASK_URL, json=payload, timeout=REQUEST_TIMEOUT)
    return response.json()

def create_geetest_v4_task(website_url, captcha_id, proxyless=True, proxy_details=None):
    """
    Create a task for GeeTest V4 using the 2Captcha API.
    Required parameters: websiteURL, version (4) and initParameters with captcha_id.
    """
    task_type = "GeeTestTaskProxyless" if proxyless else "GeeTestTask"
    task = {
        "type": task_type,
        "websiteURL": website_url,
        "version": 4,
        "initParameters": {
            "captcha_id": captcha_id
        }
    }
    if not proxyless and proxy_details:
        task.update(proxy_details)
    payload = {
        "clientKey": API_KEY,
        "task": task
    }
    response = requests.post(CREATE_TASK_URL, json=payload, timeout=REQUEST_TIMEOUT)
    return response.json()

def get_task_result(task_id, retry_interval=5, max_retries=20):
    """
    Poll the 2Captcha API until a result is obtained.
    Returns the first response whose status is not "processing",
    or a timeout error after max_retries attempts.
    """
    payload = {
        "clientKey": API_KEY,
        "taskId": task_id
    }
    for i in range(max_retries):
        response = requests.post(GET_TASK_RESULT_URL, json=payload, timeout=REQUEST_TIMEOUT)
        result = response.json()
        if result.get("status") == "processing":
            print(f"Captcha not solved yet, waiting... {i+1}")
            time.sleep(retry_interval)
        else:
            return result
    return {"errorId": 1, "errorDescription": "Timeout waiting for solution."}

def main():
    parser = argparse.ArgumentParser(
        description="Solve GeeTest CAPTCHA using 2Captcha API with automatic parameter extraction"
    )
    parser.add_argument("--website-url", required=True, help="URL of the page with the captcha")
    # Optional parameters for using a proxy
    parser.add_argument("--proxy-type", help="Proxy type (http, socks4, socks5)")
    parser.add_argument("--proxy-address", help="Proxy server IP address")
    parser.add_argument("--proxy-port", type=int, help="Proxy server port")
    parser.add_argument("--proxy-login", help="Proxy login (if required)")
    parser.add_argument("--proxy-password", help="Proxy password (if required)")
    args = parser.parse_args()

    # Proxy mode is enabled only when type, address, and port are all given
    proxyless = True
    proxy_details = {}
    if args.proxy_type and args.proxy_address and args.proxy_port:
        proxyless = False
        proxy_details = {
            "proxyType": args.proxy_type,
            "proxyAddress": args.proxy_address,
            "proxyPort": args.proxy_port
        }
        if args.proxy_login:
            proxy_details["proxyLogin"] = args.proxy_login
        if args.proxy_password:
            proxy_details["proxyPassword"] = args.proxy_password

    print("Loading page:", args.website_url)
    driver, version, gt, challenge_or_captcha_id = auto_extract_params(args.website_url)
    if driver is None or version is None:
        print("Failed to load page or extract parameters.")
        return

    print("Detected GeeTest version:", version)
    if version == "3":
        if not gt or not challenge_or_captcha_id:
            print("Failed to extract gt and challenge parameters for GeeTest V3.")
            driver.quit()
            return
        print("Using parameters for GeeTest V3:")
        print("gt =", gt)
        print("challenge =", challenge_or_captcha_id)
        create_response = create_geetest_v3_task(
            website_url=args.website_url,
            gt=gt,
            challenge=challenge_or_captcha_id,
            proxyless=proxyless,
            proxy_details=proxy_details
        )
    elif version == "4":
        captcha_id = challenge_or_captcha_id
        if not captcha_id:
            print("Failed to extract captcha_id for GeeTest V4.")
            driver.quit()
            return
        print("Using captcha_id for GeeTest V4:", captcha_id)
        create_response = create_geetest_v4_task(
            website_url=args.website_url,
            captcha_id=captcha_id,
            proxyless=proxyless,
            proxy_details=proxy_details
        )
    else:
        print("Unknown version:", version)
        driver.quit()
        return

    if create_response.get("errorId") != 0:
        print("Error creating task:", create_response.get("errorDescription"))
        driver.quit()
        return

    task_id = create_response.get("taskId")
    print("Task created. Task ID:", task_id)
    print("Waiting for captcha solution...")
    result = get_task_result(task_id)
    if result.get("errorId") != 0:
        print("Error retrieving result:", result.get("errorDescription"))
        driver.quit()
        return

    solution = result.get("solution")
    print("Captcha solved. Received solution:")
    print(json.dumps(solution, indent=4))

    # Inject the received data into the page
    if version == "3":
        # For GeeTest V3, expected fields: challenge, validate, seccode
        js_script = """
        function setOrUpdateInput(id, value) {
            var input = document.getElementById(id);
            if (!input) {
                input = document.createElement('input');
                input.type = 'hidden';
                input.id = id;
                input.name = id;
                document.getElementById('geetest-demo-form').appendChild(input);
            }
            input.value = value;
        }
        setOrUpdateInput('geetest_challenge', arguments[0]);
        setOrUpdateInput('geetest_validate', arguments[1]);
        setOrUpdateInput('geetest_seccode', arguments[2]);
        document.querySelector('#embed-captcha').innerHTML =
            '<div style="padding:20px; background-color:#e0ffe0; border:2px solid #00a100; font-size:18px; color:#007000; text-align:center;">' +
            'Captcha successfully solved!<br>' +
            'challenge: ' + arguments[0] + '<br>' +
            'validate: ' + arguments[1] + '<br>' +
            'seccode: ' + arguments[2] +
            '</div>';
        """
        challenge_sol = solution.get("challenge")
        validate_sol = solution.get("validate")
        seccode_sol = solution.get("seccode")
        driver.execute_script(js_script, challenge_sol, validate_sol, seccode_sol)
    elif version == "4":
        # For GeeTest V4, only a confirmation message is displayed
        js_script = """
        document.querySelector('#embed-captcha').innerHTML =
            '<div style="padding:20px; background-color:#e0ffe0; border:2px solid #00a100; font-size:18px; color:#007000; text-align:center;">GeeTest V4 captcha successfully solved!</div>';
        """
        driver.execute_script(js_script)

    print("Solution injected into page. The browser will remain open for 30 seconds for visual verification.")
    time.sleep(30)
    driver.quit()

if __name__ == "__main__":
    main()


Geetest Solver Code Overview

1. Importing Libraries & Defining Constants

Module Imports:

The script begins by importing the modules it needs, four from the standard library plus the two third-party packages installed earlier:

  • re for regular expression operations,
  • time to handle delays,
  • json for data serialization,
  • argparse for parsing command-line arguments,
  • requests for HTTP communication,
  • selenium for browser automation.

Constants:

Critical constants include the 2Captcha API key (API_KEY) and the URLs for task creation (CREATE_TASK_URL) and result retrieval (GET_TASK_RESULT_URL). These constants are fundamental for establishing communication with the 2Captcha service.

2. Functions to Extract CAPTCHA Parameters

The script employs specialized functions to extract necessary CAPTCHA parameters:

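  • extract_geetest_v3_params(html): This function scans the provided HTML content for the gt token and the dynamic challenge string using regular expressions and returns them as a pair, with None in place of any value it cannot find. The main flow does not call it; it is kept as a helper for pages that expose both values in their source.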
  • extract_geetest_v4_params(html): This function analyzes the HTML to extract the crucial captcha_id for Geetest V4. Initially, it attempts to locate a 32-character hexadecimal sequence, and if unsuccessful, it employs an alternative pattern.
  • auto_extract_params(website_url): This function evaluates the webpage URL to determine the applicable version of Geetest:

- For Geetest V4: If the URL contains “geetest-v4”, the function starts a Chrome session with GPU acceleration disabled and sandbox mode turned off, loads the page, waits for the CAPTCHA placeholder element (e.g., #embed-captcha .gee-test__placeholder), simulates a click to trigger the CAPTCHA, waits briefly for the widget to load, and finally extracts the captcha_id from the HTML.

- For Geetest V3: If the URL contains “geetest” but not “geetest-v4”, the function calls get_geetest_v3_params_via_requests, which returns hardcoded gt and challenge values intended for the demo page (a real site needs a freshly issued challenge), then launches Chrome with the same configuration.

- If neither condition is satisfied, the function returns None values, and main stops with an error message.
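The two-step captcha_id lookup described above can be tried in isolation on a sample string. This is a compact variant of the script's approach rather than a copy of it: the fallback pattern excludes "<" and whitespace directly instead of splitting afterwards. The function name find_captcha_id and the sample markup are made up for the example.

```python
import re
from typing import Optional

# Strict form: exactly 32 lowercase hexadecimal characters.
CAPTCHA_ID_STRICT = re.compile(r"captcha_id=([a-f0-9]{32})")
# Fallback: anything up to the next '&', quote, '<', or whitespace.
CAPTCHA_ID_LOOSE = re.compile(r"captcha_id=([^&\"'<\s]+)")


def find_captcha_id(html: str) -> Optional[str]:
    """Return the first captcha_id found in html, or None."""
    match = CAPTCHA_ID_STRICT.search(html) or CAPTCHA_ID_LOOSE.search(html)
    return match.group(1) if match else None


# Illustrative markup; real pages will differ.
sample_html = (
    '<script src="https://example.com/load'
    '?captcha_id=0123456789abcdef0123456789abcdef&lang=en"></script>'
)
captcha_id = find_captcha_id(sample_html)
```

Trying the strict pattern first keeps the common case precise, while the fallback still returns something usable when the identifier does not follow the 32-character format.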

3. Creating Tasks for the 2Captcha API

After extracting the necessary parameters, the script prepares a JSON payload tailored to the version of Geetest in use:

  • create_geetest_v3_task(website_url, gt, challenge, proxyless=True, proxy_details=None): This function constructs a JSON package containing the page URL, gt, and challenge. Depending on whether proxies are being used, the task type is designated as either “GeeTestTaskProxyless” or “GeeTestTask”. Optionally, proxy details can be included. The package is then transmitted via a POST request to the 2Captcha API, and the JSON response is returned.
  • create_geetest_v4_task(website_url, captcha_id, proxyless=True, proxy_details=None): Operating similarly for Geetest V4, this function packages the captcha_id within an initParameters object, explicitly indicating version 4.
  • get_task_result(task_id, retry_interval=5, max_retries=20): This function polls the 2Captcha API by sending the task ID and API key to the result URL (GET_TASK_RESULT_URL). If the returned status is “processing”, it logs an update and waits 5 seconds by default before retrying. The loop ends as soon as any other response arrives, or after 20 attempts, at which point a timeout error is returned.
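The polling logic is easier to reason about, and to test, when the HTTP call is separated from the retry loop. The sketch below takes the request as a callable so it can be exercised without network access; poll_until_ready and the canned responses are hypothetical, and the "ready" status with a solution object reflects the response shape the script relies on.

```python
import time
from typing import Callable


def poll_until_ready(
    fetch: Callable[[], dict],
    retry_interval: float = 5.0,
    max_retries: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the status is no longer "processing"."""
    for _ in range(max_retries):
        result = fetch()
        if result.get("status") != "processing":
            return result  # finished, or the API reported an error
        sleep(retry_interval)
    return {"errorId": 1, "errorDescription": "Timeout waiting for solution."}


# Stand-in for the real HTTP call: two "processing" replies, then a result.
responses = iter([
    {"errorId": 0, "status": "processing"},
    {"errorId": 0, "status": "processing"},
    {"errorId": 0, "status": "ready", "solution": {"validate": "demo"}},
])
result = poll_until_ready(lambda: next(responses), sleep=lambda _seconds: None)
```

In the real script, fetch would wrap the requests.post call to GET_TASK_RESULT_URL; passing a no-op sleep keeps the example instantaneous.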

4. The Main Function Workflow

The main function orchestrates the overall process through several key steps:

  • Command-Line Argument Parsing: Utilizing argparse, the script captures the required --website-url parameter (indicating the page hosting the CAPTCHA) along with optional proxy parameters (--proxy-type, --proxy-address, --proxy-port, --proxy-login, --proxy-password).
  • Proxy Configuration: If proxy parameters are provided, the script disables the proxyless mode and constructs a proxy_details dictionary; otherwise, proxyless mode remains active.
  • Extracting CAPTCHA Parameters: The script outputs a message indicating that the page is loading, then invokes auto_extract_params to retrieve:

- The Selenium driver (which controls the browser),

- The applicable CAPTCHA version (“3” or “4”),

- For V3: the values for gt and challenge; for V4: the captcha_id.

If the parameters cannot be successfully retrieved, an error is reported, and the script terminates.
  • Creating a CAPTCHA Solving Task: Depending on the identified version:

- For Geetest V3: The script verifies the presence of gt and challenge, prints them for verification, and calls create_geetest_v3_task.

- For Geetest V4: It confirms that captcha_id is available, prints its value, and calls create_geetest_v4_task.

- If the version is unrecognized, the script exits.

  • Handling the 2Captcha Response: If the API returns an error (determined by checking an errorId), the script outputs an error message, closes the browser, and terminates. On success, it prints the task ID and begins polling for the CAPTCHA solution using get_task_result.
  • Displaying and Injecting the CAPTCHA Solution: Once the solution is retrieved, it is printed in a formatted JSON structure. The script then uses driver.execute_script to inject the solution into the webpage:

- For Geetest V3: It creates or updates hidden form fields (e.g., geetest_challenge, geetest_validate, and geetest_seccode) with the solution values and updates the content of the #embed-captcha element to reflect a success message.

- For Geetest V4: The script simply replaces the content of the #embed-captcha element with a notification confirming successful CAPTCHA resolution.

  • Delay and Cleanup: After the solution is injected, the script pauses for 30 seconds to allow for visual verification of the results before closing the browser session. This design makes the CAPTCHA solving process observable in real time, and a screen capture of the process is available for demonstration purposes. A future test with SolveCaptcha is planned to compare recognition capabilities.
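One detail the injection step takes for granted is that the V3 solution really contains all three fields. A small guard, sketched below under the assumption that the solution arrives as a dictionary with challenge, validate, and seccode keys (as the script expects), would surface an incomplete response before anything is written into the page. The function name require_v3_solution is hypothetical.

```python
REQUIRED_V3_FIELDS = ("challenge", "validate", "seccode")


def require_v3_solution(solution) -> tuple:
    """Return (challenge, validate, seccode) or raise ValueError."""
    if not isinstance(solution, dict):
        raise ValueError("solution is missing or not an object")
    # Treat absent and empty values alike: neither can be submitted.
    missing = [name for name in REQUIRED_V3_FIELDS if not solution.get(name)]
    if missing:
        raise ValueError("solution is missing fields: " + ", ".join(missing))
    return tuple(solution[name] for name in REQUIRED_V3_FIELDS)


# Placeholder values for illustration only.
fields = require_v3_solution(
    {"challenge": "c-demo", "validate": "v-demo", "seccode": "v-demo|jordan"}
)
```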

Conclusion

This comprehensive analysis has dissected the inner mechanics of Geetest CAPTCHA—from its innovative dynamic image generation and interactive slider mechanism to its meticulous behavioral analysis. The robust, multi-layered design of Geetest makes it a formidable barrier for automation, yet, with a solid understanding of its parameters and careful implementation using Python (even with modest programming expertise), bypassing this security measure is achievable. However, every parameter and user interaction must be meticulously handled, as a single oversight can result in a prolonged struggle with an ever-evolving challenge. Armed with this guide and the power of a reliable solving service like 2Captcha, you are now equipped to develop a sophisticated CAPTCHA solver capable of outmaneuvering even the most resilient of defenses.
