Automating Web Performance - Google PageSpeed Insights API
Syed Rehan Ahmed
Manager SQA | Test Automation | Test Architect | OTT | Fintech | Ride Hailing | MS Project Management | ISTQB-CTFL Certified
Website Performance
One of the key factors in web performance is how long a page takes to load and render its resources, for example, how long an image on the page takes to appear. Lighthouse is a tool that helps assess these and many other performance points. Google also provides a way to run a Lighthouse performance audit through PageSpeed Insights; to use the tool, just go to this URL:
https://pagespeed.web.dev/
and enter the website you want to analyse. After the analysis has run and the information has been gathered, you need to check whether the new development has had any impact on the page speed of the web app; if so, report it and get it fixed before deployment, or promptly after deployment if it has already reached production. Several metrics help us understand the overall performance of the application, such as FCP (First Contentful Paint) and LCP (Largest Contentful Paint).
To assess the quality of experiences, PSI classifies user experiences into three buckets: Good, Needs Improvement, or Poor. PSI sets its thresholds in alignment with the Web Vitals initiative.
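For example, for Largest Contentful Paint the Web Vitals thresholds are: Good at 2.5 seconds or less, Needs Improvement up to 4 seconds, and Poor beyond that; for Cumulative Layout Shift the corresponding thresholds are 0.1 and 0.25.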
Understanding the PageSpeed Insights API
The same results we get from the PageSpeed website can be obtained with the PageSpeed Insights API. Google provides direct access to the PageSpeed Insights analysis, so we can retrieve the same results from the API response, which is faster and more efficient for repeated checks. Different parameters can be added to the request to get exactly the response you need. The basic API endpoint Google provides is as follows:
https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=https://www.dhirubhai.net
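Before building anything bigger, it is worth hitting the endpoint once to see the shape of the JSON it returns. Here is a minimal sketch in Python (the optional key parameter, for authenticated quota, is a real API parameter; everything else mirrors the URL above):

import requests

PSI_ENDPOINT = 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed'

# requests handles the query-string encoding; a 'key' parameter can be added for higher quotas
response = requests.get(PSI_ENDPOINT, params={'url': 'https://www.dhirubhai.net', 'strategy': 'mobile'})
data = response.json()

# The Lighthouse performance score is reported on a 0-1 scale
print(data['lighthouseResult']['categories']['performance']['score'])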
Automating Website Performance with the PageSpeed Insights API Using a Data-Driven Approach
Leveraging this PageSpeed API, we can create an automation script that runs the analysis and fetches the desired results. Let's dive into a simple Python script that takes a data-driven approach so it can be used across any number of web pages. Some products have hundreds of landing pages, or pages targeting a specific region or strategic partner, and a data-driven approach helps a lot there. It is also very useful when we need to test web performance in different languages.
So I have developed a script that takes URLs one by one from a data warehouse or other data source, runs the PageSpeed diagnostics on each, and saves the key metrics along with performance indicators in a results file. This saves a lot of time compared with visiting each page individually and verifying everything manually.
Python Script Components
import requests
import datetime

# Open the file containing the list of URLs to query against the PageSpeed API
with open('input-urls.txt') as input_urls_file:
    output_csv = 'pagespeed-results.csv'
    # Open the CSV file to write the PageSpeed results
    with open(output_csv, 'w') as results_file:
        # Read each line from the input URLs file and strip any extra spaces or newline characters
        urls_list = [line.strip() for line in input_urls_file]
        # Get the current date and time, which will be included in the results
        current_timestamp = datetime.datetime.now()
        # Write the header row to the CSV file
        csv_header = (
            'URL,First Contentful Paint,First Interactive,Total Blocking Time,Speed Index,'
            'Largest Contentful Paint,Cumulative Layout Shift,Performance,Accessibility,'
            'Best Practices,SEO,Date\n'
        )
        results_file.write(csv_header)
        # Iterate over each URL in the list
        for url in urls_list:
            # Construct the PageSpeed API URL with the necessary parameters
            pagespeed_api_url = (
                f'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={url}'
                '&strategy=mobile&category=PERFORMANCE&category=ACCESSIBILITY'
                '&category=BEST_PRACTICES&category=SEO'
            )
            print(f'Requesting {pagespeed_api_url}...')  # Log the API request URL
            response = requests.get(pagespeed_api_url)  # Send the GET request to the API
            api_response_json = response.json()  # Parse the JSON response from the API
            try:
                # Extract the Lighthouse results from the response
                lighthouse_metrics = api_response_json['lighthouseResult']
                # Extract the base URL from the full URL (excluding any query parameters)
                base_url = api_response_json['id'].split('?')[0]
                # Retrieve the relevant metrics from the JSON response
                metrics = {
                    "First Contentful Paint": lighthouse_metrics['audits']['first-contentful-paint']['displayValue'],
                    "First Interactive": lighthouse_metrics['audits']['interactive']['displayValue'],
                    "Total Blocking Time": lighthouse_metrics['audits']['total-blocking-time']['displayValue'],
                    "Speed Index": lighthouse_metrics['audits']['speed-index']['displayValue'],
                    "Largest Contentful Paint": lighthouse_metrics['audits']['largest-contentful-paint']['displayValue'],
                    "Cumulative Layout Shift": lighthouse_metrics['audits']['cumulative-layout-shift']['displayValue'],
                    "Performance": lighthouse_metrics['categories']['performance']['score'] * 100,
                    "Accessibility": lighthouse_metrics['categories']['accessibility']['score'] * 100,
                    "Best Practices": lighthouse_metrics['categories']['best-practices']['score'] * 100,
                    "SEO": lighthouse_metrics['categories']['seo']['score'] * 100,
                }
                # Construct a CSV row from the extracted metrics
                # (note: displayValue strings can contain commas, e.g. '1,020 ms';
                # for fully robust output, consider Python's csv module)
                csv_row = (
                    f'{base_url},{metrics["First Contentful Paint"]},{metrics["First Interactive"]},'
                    f'{metrics["Total Blocking Time"]},{metrics["Speed Index"]},{metrics["Largest Contentful Paint"]},'
                    f'{metrics["Cumulative Layout Shift"]},{metrics["Performance"]},{metrics["Accessibility"]},'
                    f'{metrics["Best Practices"]},{metrics["SEO"]},{current_timestamp}\n'
                )
                results_file.write(csv_row)  # Write the row to the CSV file
                # Log the metrics to the console
                for key, value in metrics.items():
                    print(f'{key}: {value}')
                print(f'Date: {current_timestamp}\n')
            except KeyError as e:
                # Handle missing keys in the JSON response (e.g. a failed Lighthouse run)
                print(f'<KeyError> Missing key in response for {url}: {e}')
                results_file.write(f'<KeyError> Missing key in response ~ {url}.\n')
            except Exception as e:
                # Handle any other unexpected errors
                print(f'<Error> An unexpected error occurred for {url}: {e}')
                results_file.write(f'<Error> Unexpected error ~ {url}.\n')
Prerequisites
You need Python installed on your system to execute this script, along with the required packages; you can download Python from the official website (python.org). Then install the dependency:
pip install requests
Prepare your URLs file and copy the script above into a Python file; you can name it pagespeed_analysis.py.
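The URLs file is plain text with one URL per line, for example (hypothetical entries):

https://www.example.com/
https://www.example.com/en/landing
https://www.example.com/fr/landing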
How to use it?
cd C:\Users\<username>\Documents\page_speed_optimization\pagespeed\script
python3 pagespeed_analysis.py
Parameterisation
We usually need to verify the effect of the current changes on different environments, so the script is parameterised by environment, and specifying the environment is mandatory so that the script runs against that specific environment only. This requires a few changes in the code, and the run command then takes the environment as input. The changes could look like this: import argparse along with the other dependencies and select the input based on the environment provided.
import requests
import datetime
import argparse

def run_pagespeed_analysis(environment):
    # Determine the input file based on the environment parameter
    if environment == 'staging':
        input_file = 'staging-urls.txt'
    elif environment == 'production':
        input_file = 'production-urls.txt'
    else:
        raise ValueError("Invalid environment specified. Use 'staging' or 'production'.")
    # Open the file containing the list of URLs to query against the PageSpeed API
    with open(input_file) as input_urls_file:
        output_csv = f'pagespeed-results-{environment}.csv'
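Because the run command passes the environment as a positional argument, the script also needs a command-line entry point. The original does not show this wiring, so the following is a minimal sketch of one way to do it with argparse:

if __name__ == '__main__':
    # Parse the target environment from the command line
    parser = argparse.ArgumentParser(description='Run PageSpeed analysis for a given environment')
    parser.add_argument('environment', choices=['staging', 'production'],
                        help="Target environment: 'staging' or 'production'")
    args = parser.parse_args()
    run_pagespeed_analysis(args.environment)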
The rest of the script remains the same. The command to run it for a specific environment looks like this:
#Staging/test environment
python3 pagespeed_analysis.py staging
#Production environment
python3 pagespeed_analysis.py production
A command-line interface like this makes the script convenient to integrate with CI/CD pipelines, cron jobs, and so on.
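For instance, a crontab entry like the following (with a hypothetical path) would run the staging analysis every morning at 06:00:

0 6 * * * cd /home/qa/pagespeed/script && python3 pagespeed_analysis.py staging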
The script will create a CSV file named pagespeed-results.csv (or pagespeed-results-<environment>.csv in the parameterised version) in the same directory. This file will contain the PageSpeed analysis results for each URL you provided.
Reporting
You can define an email group to which the script sends a detailed report on completion, so that any action needed based on that run can be taken promptly.
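As a sketch of what that could look like using Python's standard library (the SMTP host, sender, and recipients below are placeholders, not values from the original script):

import smtplib
from email.message import EmailMessage

def email_report(csv_path, recipients):
    # Build a message with the results CSV attached
    msg = EmailMessage()
    msg['Subject'] = 'PageSpeed analysis results'
    msg['From'] = 'qa-bot@example.com'  # placeholder sender
    msg['To'] = ', '.join(recipients)
    msg.set_content('The PageSpeed run has finished; detailed results are attached.')
    with open(csv_path, 'rb') as f:
        msg.add_attachment(f.read(), maintype='text', subtype='csv', filename=csv_path)
    # Placeholder SMTP host; use your organisation's mail relay
    with smtplib.SMTP('smtp.example.com') as smtp:
        smtp.send_message(msg)

email_report('pagespeed-results.csv', ['qa-team@example.com'])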
P.S. I'd love to hear from others whether this helped them, or how we can improve it further :)