Effortless Form Filling and Submission with Python: No Selenium Required

1. Using requests and BeautifulSoup (for simple forms)

If the form submission is straightforward (e.g., no JavaScript or complex interactions), you can use the requests library to send HTTP POST requests and BeautifulSoup to parse HTML.
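
For reference, passing a dict as data= makes requests send it as an application/x-www-form-urlencoded body. That is the same encoding a plain HTML form uses by default. For simple string values, the standard library's urlencode produces the same string, so it is a handy way to check what will be sent. The field names below are the placeholder names from the example, and the values are made up for illustration.

```python
from urllib.parse import urlencode

# Made-up values using the placeholder field names from the example below
form_data = {
    "first_name": "Ada",
    "last_name": "Lovelace",
    "email": "ada@example.com",
}

# For simple string values, requests' data=... encoding matches urlencode:
# key=value pairs joined by "&", with special characters percent-encoded.
body = urlencode(form_data)
print(body)
# first_name=Ada&last_name=Lovelace&email=ada%40example.com
```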

Example Code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Load data from Excel (reading .xlsx files requires the openpyxl package)
excel_file = 'data.xlsx'  # Replace with your Excel file path
df = pd.read_excel(excel_file)

# URL of the page that contains the form
url = 'https://example.com/form'  # Replace with the target form URL

# URL the form posts to (the form's "action" attribute)
submit_url = 'https://example.com/submit'  # Replace with the actual form action URL

# Loop through each row in the Excel file
for index, row in df.iterrows():
    # A fresh session per row gives each submission its own cookies and CSRF token
    with requests.Session() as session:
        # Fetch the form page to get CSRF tokens or other hidden fields (if needed)
        response = session.get(url, timeout=30)
        soup = BeautifulSoup(response.text, 'html.parser')

        # Extract hidden fields, skipping inputs without a name attribute
        hidden_fields = {}
        for input_tag in soup.find_all('input', type='hidden'):
            name = input_tag.get('name')
            if name:
                hidden_fields[name] = input_tag.get('value', '')

        # Prepare form data
        form_data = {
            'first_name': row['First Name'],  # Replace with actual field names
            'last_name': row['Last Name'],    # Replace with actual field names
            'email': row['Email'],            # Replace with actual field names
            **hidden_fields  # Include hidden fields if necessary
        }

        # Submit the form
        response = session.post(submit_url, data=form_data, timeout=30)

    # response.ok is True for any status below 400. Some sites also return 200
    # for validation errors, so check the response body for a confirmation too.
    if response.ok:
        print(f"Form submitted successfully for {row['First Name']} {row['Last Name']}")
    else:
        print(f"Failed to submit form for {row['First Name']} {row['Last Name']}")

Explanation:

  • requests: Used to send HTTP GET and POST requests.
  • BeautifulSoup: Used to parse the HTML and extract hidden fields (e.g., CSRF tokens).
  • Form Data: The form data is constructed using the Excel data and any hidden fields required by the form.
  • Session: A requests.Session() is used to maintain cookies and session data.
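
To see what the hidden-field step does without installing BeautifulSoup, here is a minimal sketch that uses the standard library's html.parser instead. This is a swapped-in parser, not the one used in the example. It skips hidden inputs that have no name, because browsers do not submit those. It treats a missing value as an empty string, which matches how a browser submits it. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)  # attribute names arrive lowercased
        # Ignore anything that is not a hidden input
        if (attributes.get("type") or "").lower() != "hidden":
            return
        name = attributes.get("name")
        if name:
            # A missing value attribute is submitted as an empty string
            self.fields[name] = attributes.get("value") or ""


def extract_hidden_fields(html):
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


sample_html = """
<form action="/submit" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="hidden" name="step">
  <input type="hidden" value="no-name-so-ignored">
  <input type="text" name="first_name">
</form>
"""

print(extract_hidden_fields(sample_html))
# {'csrf_token': 'abc123', 'step': ''}
```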


2. Using mechanize (for older websites)

The mechanize library is another option for automating form submissions. It acts as a stateful, programmatic browser that handles cookies, redirects, and HTML forms, but it does not execute JavaScript.

Example Code:

import mechanize
import pandas as pd

# Load data from Excel
excel_file = 'data.xlsx'  # Replace with your Excel file path
df = pd.read_excel(excel_file)

# Initialize a browser object
br = mechanize.Browser()

# Loop through each row in the Excel file
for index, row in df.iterrows():
    try:
        # Open the form page
        br.open('https://example.com/form')  # Replace with the target form URL

        # Select the form (here, the first form on the page)
        br.select_form(nr=0)

        # Fill the form fields; mechanize text controls only accept strings
        br['first_name'] = str(row['First Name'])  # Replace with actual field names
        br['last_name'] = str(row['Last Name'])    # Replace with actual field names
        br['email'] = str(row['Email'])            # Replace with actual field names

        # Submit the form
        br.submit()
        print(f"Form submitted successfully for {row['First Name']} {row['Last Name']}")
    except mechanize.HTTPError as e:
        # By default mechanize raises HTTPError for error responses (4xx/5xx)
        # instead of returning them, so failures are handled here
        print(f"Failed to submit form for {row['First Name']} {row['Last Name']} (HTTP {e.code})")

Explanation:

  • mechanize: Simulates a browser and can handle forms, cookies, and redirects.
  • Form Selection: The form is selected using br.select_form(nr=0) (assuming it's the first form on the page).
  • Form Submission: Fields are filled by assigning to br['field_name'], and the form is submitted with br.submit().
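
Spreadsheet cells do not always arrive as clean strings. Empty cells usually come back from pandas as NaN, and numeric columns such as phone numbers may come back as floats like 5551234.0. A plain str() call would turn these into "nan" or "5551234.0". The helper below is a minimal sketch under those assumptions. It also trims surrounding whitespace, which is a choice you may not want. The name to_field_value is hypothetical. You would use it as br['email'] = to_field_value(row['Email']).

```python
import math


def to_field_value(value):
    """Convert a spreadsheet cell into the string a text form field expects."""
    # Empty Excel cells typically arrive from pandas as NaN (a float) or None
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return ""
    # Whole numbers stored as floats (e.g. 5551234.0) drop the trailing ".0"
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    # Everything else is stringified and trimmed of surrounding whitespace
    return str(value).strip()


print([to_field_value(v) for v in ["  Ada ", 5551234.0, float("nan"), None]])
# ['Ada', '5551234', '', '']
```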


3. Using httpx (for async submissions to a form's underlying endpoint)

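If the website uses JavaScript to submit the form, httpx will not run that JavaScript for you. It is an HTTP client with a requests-like API plus asynchronous support. What often works is to open your browser's developer tools, find the request the script sends in the Network tab, and replicate that request with httpx. If the script sends JSON rather than form-encoded data, pass json= instead of data=.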

Example Code:

import asyncio

import httpx
import pandas as pd

# Load data from Excel
excel_file = 'data.xlsx'  # Replace with your Excel file path
df = pd.read_excel(excel_file)

# Target URL: the endpoint the form (or the page's JavaScript) posts to
url = 'https://example.com/submit'  # Replace with the form action URL


async def submit_row(client, row):
    # Prepare form data
    form_data = {
        'first_name': row['First Name'],  # Replace with actual field names
        'last_name': row['Last Name'],    # Replace with actual field names
        'email': row['Email'],            # Replace with actual field names
    }

    # Submit the form (use json=form_data instead if the endpoint expects JSON)
    response = await client.post(url, data=form_data)

    # is_success is True for any 2xx status code
    if response.is_success:
        print(f"Form submitted successfully for {row['First Name']} {row['Last Name']}")
    else:
        print(f"Failed to submit form for {row['First Name']} {row['Last Name']}")


async def main():
    # httpx does not follow redirects by default, and form handlers often
    # redirect after a POST, so enable it to mirror browser behavior
    async with httpx.AsyncClient(timeout=30, follow_redirects=True) as client:
        # One coroutine per row in the Excel file, run concurrently
        await asyncio.gather(*(submit_row(client, row) for _, row in df.iterrows()))


# Run the async entry point
asyncio.run(main())

Explanation:

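  • httpx: A modern HTTP client with a requests-like API and async support. It does not execute JavaScript; it sends exactly the request you build.
  • Asynchronous Requests: An AsyncClient lets several submissions run concurrently (for example with asyncio.gather) instead of one after another.
  • Form Data: The form data is sent as a POST request to the form action URL, or to the endpoint the page's JavaScript calls.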
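
Firing every row at once can overload a small site or trip its rate limits. Here is a minimal sketch of capping how many submissions are in flight using asyncio.Semaphore. To keep it runnable without a network, fake_submit is a hypothetical stand-in for await client.post(url, data=...). It also records the peak number of overlapping calls. With httpx, submit would wrap client.post inside the async with httpx.AsyncClient() block. The limit of 3 is an arbitrary example value.

```python
import asyncio


async def submit_all(rows, submit, limit=5):
    """Run submit(row) for every row, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(row):
        async with semaphore:
            return await submit(row)

    # gather returns results in input order, even though the calls overlap
    return await asyncio.gather(*(bounded(row) for row in rows))


# Hypothetical stand-in for an HTTP POST; tracks how many calls overlap
in_flight = 0
peak = 0


async def fake_submit(row):
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)  # pretend to wait on the server
    in_flight -= 1
    return f"submitted {row['email']}"


rows = [{"email": f"user{i}@example.com"} for i in range(12)]
results = asyncio.run(submit_all(rows, fake_submit, limit=3))
print(results[0], "| peak concurrency:", peak)
```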


4. Using Playwright (for complex websites with JavaScript)

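If the website relies heavily on JavaScript, you can use Playwright, a modern alternative to Selenium. It drives a real browser (Chromium, Firefox, or WebKit), so the page's scripts actually run, and it supports headless mode. Install it with pip install playwright, then run playwright install to download the browser binaries.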

Example Code:

from playwright.sync_api import sync_playwright
import pandas as pd

# Load data from Excel
excel_file = 'data.xlsx'  # Replace with your Excel file path
df = pd.read_excel(excel_file)

# Initialize Playwright
with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)  # Run in headless mode
    page = browser.new_page()

    # Loop through each row in the Excel file
    for index, row in df.iterrows():
        # Navigate to the form page
        page.goto('https://example.com/form')  # Replace with the target form URL

        # Fill the form fields; fill() expects strings, so convert cell values
        page.fill('#first_name', str(row['First Name']))  # Replace with actual field selectors
        page.fill('#last_name', str(row['Last Name']))    # Replace with actual field selectors
        page.fill('#email', str(row['Email']))            # Replace with actual field selectors

        # Submit the form
        page.click('#submit_button')  # Replace with actual submit button selector

        # Give the submission time to finish before loading the next form.
        # A fixed delay is fragile; if the page shows a confirmation element,
        # waiting for it with page.wait_for_selector() is more reliable.
        page.wait_for_timeout(2000)

    # Close the browser
    browser.close()

Explanation:

  • Playwright: A powerful tool for browser automation that supports modern web technologies.
  • Headless Mode: Runs the browser in the background without a GUI.
  • Form Filling: Uses page.fill() to fill form fields and page.click() to submit the form.
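
The "Replace with actual field selectors" placeholders can be collected into a single mapping from Excel column to CSS selector. Then only one place changes when the form changes. You can also print the fill plan as a dry run before launching the browser. This is a minimal sketch. FIELD_SELECTORS and fill_plan are hypothetical names, and the selectors are the placeholders from the example. Inside the loop you would call page.fill(selector, value) for each pair.

```python
# Hypothetical mapping from Excel column names to the form's CSS selectors
FIELD_SELECTORS = {
    "First Name": "#first_name",
    "Last Name": "#last_name",
    "Email": "#email",
}


def fill_plan(row, selectors=FIELD_SELECTORS):
    """Return (selector, value) pairs for page.fill(), in mapping order."""
    plan = []
    for column, selector in selectors.items():
        value = row.get(column)  # works for dicts and pandas Series alike
        # value != value is True only for NaN; missing or empty cells become ""
        if value is None or value != value:
            plan.append((selector, ""))
        else:
            plan.append((selector, str(value)))
    return plan


row = {"First Name": "Ada", "Last Name": "Lovelace", "Email": "ada@example.com"}
for selector, value in fill_plan(row):
    print(selector, "->", value)
```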


Which Method to Use?

  • Simple Forms: Use requests + BeautifulSoup.
  • Older Websites: Use mechanize.
  • Modern Websites with JavaScript: Use httpx if you can replicate the underlying HTTP request; use Playwright if the page's JavaScript has to actually run.


Don't miss out! Subscribe on LinkedIn: https://www.dhirubhai.net/build-relation/newsletter-follow?entityUrn=7175221823222022144

Follow me on LinkedIn: www.dhirubhai.net/comm/mynetwork/discovery-see-all?usecase=PEOPLE_FOLLOWS&followMember=bhargava-naik-banoth-393546170

Follow me on Medium: https://medium.com/@bhargavanaik24/subscribe

Follow me on Twitter: https://x.com/bhargava_naik


More articles by Bhargava Naik Banoth