Use Laravel Dusk, browser automation and PHP to programmatically surf the web
Laravel Dusk is a powerful browser automation tool for Laravel. With Dusk you can programmatically test your own applications or visit any website on the internet using a real Chrome browser. You can use it to automate repetitive tasks, scrape information from other sites or make sure your app keeps working in the browser. In this tutorial we’ll create a Dusk test that logs in to a mythical website and clicks around.
Create a new Laravel app and install Dusk inside it:
$ laravel new dusk-scraper
$ cd dusk-scraper
$ composer require --dev laravel/dusk
$ php artisan dusk:install
Dusk scaffolding installed successfully.
The tests/DuskTestCase.php file that Laravel generated calls startChromeDriver in its prepare function, which runs before the Dusk tests are executed. DuskTestCase is an abstract base class, so it’s not a good place for our own code. Instead, create a fresh Dusk test that extends DuskTestCase with an Artisan command:
$ php artisan dusk:make ScrapeTheWebTest
This creates ScrapeTheWebTest.php in the tests/Browser directory. You can run it with another Artisan command:
$ php artisan dusk
Right now the test doesn’t do anything useful. Here is the code to log in to a website and click some buttons:
<?php

namespace Tests\Browser;

use Tests\DuskTestCase;
use Laravel\Dusk\Browser;
use Illuminate\Foundation\Testing\DatabaseMigrations;

class ScrapeTheWebTest extends DuskTestCase
{
    // IDs of the user pages to visit after logging in.
    private $user_ids = [1, 2, 3];

    /** @test */
    public function loginAndClickButton()
    {
        $this->browse(function (Browser $browser) {
            // Log in once; the credentials come from the .env file.
            $browser->visit('https://website.com/login')
                ->type('input.usernameField', env('USERNAME'))
                ->type('input.passwordField', env('PASSWORD'))
                ->click('#login')
                ->waitForText('Orders');

            // Plain PHP foreach (not Blade's @foreach): visit each user page and click through it.
            foreach ($this->user_ids as $user_id) {
                $browser->visit('https://website.com/users/' . $user_id)
                    ->waitForText('This is protected page')
                    ->click('button.button-im-looking-4')
                    ->waitForText('Page after the button')
                    ->click('.another #button')
                    ->pause(4000);
            }
        });
    }
}
I’m using environment variables for the username and password so that, if they’re sensitive, you don’t have to check them in to version control. To target specific elements, inspect the page with your browser’s devtools and write CSS selectors for them. After logging in, the test loops over a hard-coded list of user IDs and builds each page URL dynamically from that data.
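Two small ideas from the test can be pulled out and checked on their own: failing early when a credential is missing, and building one URL per user ID. This is a plain-PHP sketch that runs without Laravel, so it uses getenv() where the test uses Laravel’s env() helper. The helper names requireEnv() and userPageUrls() and the SCRAPER_USERNAME variable are hypothetical.

```php
<?php

// Hypothetical helper: read a required environment variable or stop with a clear error,
// instead of typing an empty value into the login form.
function requireEnv(string $name): string
{
    $value = getenv($name);

    if ($value === false || $value === '') {
        throw new RuntimeException("Missing environment variable: {$name}");
    }

    return $value;
}

// Hypothetical helper: build one page URL per user ID, e.g. https://website.com/users/1
function userPageUrls(string $baseUrl, array $userIds): array
{
    return array_map(
        fn ($id) => rtrim($baseUrl, '/') . '/users/' . $id,
        $userIds
    );
}
```

Inside the Dusk test you could call requireEnv('SCRAPER_USERNAME') before typing, and loop over userPageUrls('https://website.com', $this->user_ids) instead of concatenating strings inline.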
Your tests run in the terminal with the php artisan dusk command, but the fun really starts when you watch the browser perform the actions you specify. By default Laravel Dusk runs a headless browser, which has no visible window. To watch the browser, open tests/DuskTestCase.php, which ScrapeTheWebTest inherits from, and comment out the --headless option in the driver() method:
    /**
     * Create the RemoteWebDriver instance.
     *
     * @return \Facebook\WebDriver\Remote\RemoteWebDriver
     */
    protected function driver()
    {
        $options = (new ChromeOptions)->addArguments([
            '--disable-gpu',
            // '--headless', // commented out so the browser window is visible
        ]);

        // ChromeDriver listens on plain HTTP, not HTTPS.
        return RemoteWebDriver::create(
            'http://localhost:9515', DesiredCapabilities::chrome()->setCapability(
                ChromeOptions::CAPABILITY, $options
            )
        );
    }
With the headless option commented out, run the tests again and watch the browser perform the actions you specified. Because this is an ordinary Laravel test, you can also use the full power of Laravel from inside it to create database records, trigger jobs, update data or anything else you can think of.
For further reading, check out How to ‘hack’ and win the May Mayhem blog contest. It’s a great article that goes through the process of programmatically creating GitHub accounts, storing information from the web in a database and even integrating with the GitHub API to find users’ credentials.
Paul Redmond has some great slides about scraping the web using Laravel Dusk and PHP.
This technology is very powerful. Don't be a jerk.
This post originally appeared on Medium.
In newer versions of the generated DuskTestCase, you can also disable headless mode by adding DUSK_HEADLESS_DISABLED=true to your .env file instead of editing driver().
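Newer DuskTestCase stubs make the --headless flag conditional on that variable instead of hard-coding it. Here is a plain-PHP sketch of the idea that runs without Laravel; the function names headlessDisabled() and chromeArguments() are hypothetical, and the exact check in your generated file may differ.

```php
<?php

// Hypothetical helper: Laravel's .env loader fills both $_ENV and $_SERVER,
// so look in both places for the flag.
function headlessDisabled(): bool
{
    return isset($_SERVER['DUSK_HEADLESS_DISABLED'])
        || isset($_ENV['DUSK_HEADLESS_DISABLED']);
}

// Hypothetical helper: the argument list a driver() method could pass to ChromeOptions::addArguments().
function chromeArguments(): array
{
    $arguments = ['--disable-gpu'];

    // Only hide the browser window when headless mode has not been disabled.
    if (! headlessDisabled()) {
        $arguments[] = '--headless';
    }

    return $arguments;
}
```

The check only tests whether the variable is set, not its value, so DUSK_HEADLESS_DISABLED=false would also show the browser window.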