Macros for simple website automation
This article is meant for beginners who want to learn Macro automation in the lock-down period. Macros are VB programs written within the Excel application that can help automate repetitive tasks.
To enable Development mode to write Macros. In Excel go to File-> Options-> Customize Ribbon-> Check “Developer” Tab
To trigger the VB Script using a button. In the Developer tab, Drag and drop the Command Button(ActiveX Control) using the Insert option.
Drag and drop the Command Button to the spreadsheet and double-click it to open the VB Editor.
The code between Private Sub CommandButton1_Click() and End Sub will be triggered on click of the Command Button. The file should be saved as a Macro-Enabled Workbook.
To open Internet Explorer through VB:
Dim objIE As Object Set objIE = CreateObject("InternetExplorer.Application") objIE.Visible = True
This Opens the a new IE window and assigns its control to the Object objIE. It is better to use
Set objIE As InternetExplorerMedium Set objIE=New InternetExplorerMedium
Instead of CreateObject("InternetExplorer.Application"). Using InternetExplorerMedium allows the VB Compiler to allocate memory in advance and make other important optimizations(early Binding).
To enable the InternetExplorerMedium object, select Microsoft Internet Controls from Tool -> References in the VB Editor. Also select Microsoft Excel 16.0 Object Library
Now, we will see how VB can be used to collect information about products from the e-commerce website MuFubu.com
To make the IE Window to go the website MuFuBu.com, we can use the command.
objIE.Navigate "https://mufubu.com/collections/backpacks?page=1"
To find out how many pages need to be visited, we can look at the page navigation bar.
To get the value of totalItems, we can use the DOM to spy the webpage using Chrome(F12)
Ctrl+F in the DOM page with the search value //span[@class=”count”] returns a unique value. So this path can be used to identify the navigation bar
We can find the text in the Navigation bar using the function getElementsByClassName
Set navigationTextObject = objIE.document.getElementsByClassName("count")(0)
The only element in the navigationTextObject is the navigation Bar. The text in the element in navigationTextObject will give “Showing Items **-** of **” . We can get the Total Items by getting the substring of the inner text after “of”. The Split function returns an Array with elements before and after the string "of". Cint typecasts the String to an Integer.
totalItems = CInt(Split(navigationTextObject.innerText, "of")(1))
If totalItems is a whole multiple of 20, the number of pages will be totalItems/20. Else it will be totalItems/20+1
If totalItems Mod 20 = 0 Then maxPage = totalItems / 20 Else maxPage = totalItems / 20 + 1 End If
By inspecting the list structure of the products displayed in the page, we can see that the product information is contained in a div object with class “product-info”. There are 20 items with the class “product-info” in the first page.
By comparing the details on the page with HTML, we can identify the corresponding elements.
To get the first object with class “product-info”, we use the command
Set productInfo = objIE.document.getElementsByClassName("product-info")(0)
From this productInfo html object, we can get the html link with detailed product details by
linkAddress = productInfo.getElementsByTagName("a")(0).href
Similarly, we can get
description = productInfo.getElementsByTagName("a")(0).innerText offeredPrice = productInfo.getElementsByClassName("price")(0).getElementsByClassName("onsale")(0) .getElementsByTagName("span")(0).innerText actualPrice = productInfo.getElementsByClassName("price")(0).getElementsByClassName("was")(0).getElementsByTagName("span")(0).innerText
We use a For Loop to iterate over the pages and an inner For Loop to iterate over each Item
There are 20 items on all pages except the last page. If the total number of items is a multiple of 20, the last page has 20 items. Otherwise, the number of items on the last page is the reminder when the total number of Items is divided by 20. Writing that as a For loop we have.
For pageNo = 1 To maxPage If pageNo = maxPage Then If totalItems Mod 20 = 0 Then maxItem = 20 Else maxItem = totalItems Mod 20 End If Else maxItem = 20 End If Next
After fetching the current page details, we need to navigate to the next page. We can do so by clicking on the next button.
The inspecting the page Html, we can see that the Next button is inside a pagination div with a unique id.
So we can visit the next page by using the line
objIE.document.getElementById("Pagination").getElementsByTagName("a")(2).Click
The next button directs to the url of the next page. Since the url is already contained in the page url, we can avoid clicking on the link and replace the page number in the url with the page number variable.
Therefore, instead of clicking on the link, we can use
pageURL = "https://mufubu.com/collections/backpacks?page=" objIE.Navigate pageURL&CStr(pageNo)
CStr is used to typecast pageNo as a String variable. To make the Vbscript wait for the page to load after navigating, we can use
Do Until objIE.readyState = 4: DoEvents: Loop
This waits until ready State =4, ie until the browser is ready to accept input.
To set the cell values of Current workbook(Which the Macro belongs to), use Application.ThisWorkbook. To set the headings, we can use,
rowNumber=1 With Application.ThisWorkbook.Worksheets("Sheet2") .Cells(rowNumber, 1).Value = "Sl No" .Cells(rowNumber, 2).Value = "Description" .Cells(rowNumber, 3).Value = "Offered Price" .Cells(rowNumber, 4).Value = "Actual Price" .Cells(rowNumber, 5).Value = "Url" End With
The With function helps us reduce the number of times we refer to the WorkBook and sheet. Inside the With- End With container, Application.ThisWorkbook.Worksheets("Sheet2"). Cells(rowNumber, 1).Value = "Sl No" can be replaced with .Cells(rowNumber, 1).Value = "Sl No"
To iterate through each item in the page and get the corresponding product information we can use an inner loop. By incrementing rowNumber every time we obtain product details, we can append the product details in a continuous list.
For itemNumber = 0 To (maxItem - 1) rowNumber = rowNumber + 1 Set productInfo = objIE.document.getElementsByClassName("product-info")(itemNumber) linkAddress = productInfo.getElementsByTagName("a")(0).href description = productInfo.getElementsByTagName("a")(0).innerText offeredPrice = productInfo.getElementsByClassName("price")(0).getElementsByClassName("onsale")(0).getElementsByTagName("span")(0).innerText actualPrice = productInfo.getElementsByClassName("price")(0).getElementsByClassName("was")(0).getElementsByTagName("span")(0).innerText With Application.ThisWorkbook.Worksheets("Sheet2") .Cells(rowNumber, 1).Value = rowNumber - 1 .Cells(rowNumber, 2).Value = description .Cells(rowNumber, 3).Value = offeredPrice .Cells(rowNumber, 4).Value = actualPrice .Cells(rowNumber, 5).Value = linkAddress End With Next
Clicking the execute button gives the list of 45 product details in Sheet2
Please find the entire Code below
Private Sub CommandButton1_Click() Dim objIE As InternetExplorerMedium Set objIE = Nothing Dim totalItems As Integer, maxPage As Integer, pageNo, itemNumber, linkAddress, offeredPrice, actualPrice, description, maxItem, rowNumber, pageURL, navigationTextObject, productInfo pageURL = "https://mufubu.com/collections/backpacks?page=" Set objIE = New InternetExplorerMedium objIE.Visible = True objIE.Navigate pageURL & CStr(1) Do Until objIE.readyState = 4: DoEvents: Loop Set navigationTextObject = objIE.document.getElementsByClassName("count")(0) totalItems = CInt(Split(navigationTextObject.innerText, "of")(1)) rowNumber = 1 With Application.ThisWorkbook.Worksheets("Sheet2") .Cells(rowNumber, 1).Value = "Sl No" .Cells(rowNumber, 2).Value = "Description" .Cells(rowNumber, 3).Value = "Offered Price" .Cells(rowNumber, 4).Value = "Actual Price" .Cells(rowNumber, 5).Value = "Url" End With If totalItems Mod 20 = 0 Then maxPage = totalItems / 20 Else maxPage = totalItems / 20 + 1 End If For pageNo = 1 To maxPage objIE.Navigate pageURL & CStr(pageNo) Do Until objIE.readyState = 4: DoEvents: Loop If pageNo = maxPage Then If totalItems Mod 20 = 0 Then maxItem = 20 Else maxItem = totalItems Mod 20 End If Else maxItem = 20 End If For itemNumber = 0 To (maxItem - 1) rowNumber = rowNumber + 1 Set productInfo = objIE.document.getElementsByClassName("product-info")(itemNumber) linkAddress = productInfo.getElementsByTagName("a")(0).href description = productInfo.getElementsByTagName("a")(0).innerText offeredPrice = productInfo.getElementsByClassName("price")(0).getElementsByClassName("onsale")(0).getElementsByTagName("span")(0).innerText actualPrice = productInfo.getElementsByClassName("price")(0).getElementsByClassName("was")(0).getElementsByTagName("span")(0).innerText With Application.ThisWorkbook.Worksheets("Sheet2") .Cells(rowNumber, 1).Value = rowNumber - 1 .Cells(rowNumber, 2).Value = description .Cells(rowNumber, 3).Value = offeredPrice .Cells(rowNumber, 4).Value = actualPrice .Cells(rowNumber, 5).Value = linkAddress End With Next Next objIE.Quit Set objIE = Nothing End Sub
Edit: The mufubu site is currently down as they are only delivering essentials(not because of too many people running VBA on their site). I have written the below code to fetch data from saucedemo.com (site meant for testing automation). You can test VBA on saucedemo.
Private Sub CommandButton1_Click() Dim objIE As InternetExplorerMedium Set objIE = Nothing Dim loginButtonObj, inventoryItems, nameElement, pageURL As String, rowNumber As Integer pageURL = "https://www.saucedemo.com/index.html" Dim objWSS Set objIE = New InternetExplorerMedium objIE.Visible = True objIE.Navigate pageURL Do Until objIE.readyState = 4: DoEvents: Loop Set objWSS = CreateObject("WScript.Shell") Application.Wait (Now + TimeValue("00:00:04")) Set nameElement = objIE.Document.getElementById("user-name") nameElement.Click nameElement.Focus objWSS.SendKeys "standard_user" objWSS.SendKeys "{TAB}" objWSS.SendKeys "secret_sauce" Application.Wait (Now + TimeValue("00:00:04")) 'objIE.Document.getElementById("user-name").innerText = "value" Set loginButtonObj = objIE.Document.getElementsByClassName("btn_action") For Each Elem In loginButtonObj If Elem.Value = "LOGIN" Then Elem.Click Do Until objIE.readyState = 4: DoEvents: Loop Application.Wait (Now + TimeValue("00:00:05")) End If Next Elem Set inventoryItems = objIE.Document.getElementsByClassName("inventory_item") rowNumber = 1 With Application.ThisWorkbook.Worksheets("Sheet2") .Cells(rowNumber, 1).Value = "Sl No" .Cells(rowNumber, 2).Value = "Item Name" .Cells(rowNumber, 3).Value = "Description" .Cells(rowNumber, 4).Value = "Price" rowNumber = rowNumber + 1 For Each Item In inventoryItems .Cells(rowNumber, 1).Value = rowNumber - 1 .Cells(rowNumber, 2).Value = Item.getElementsByClassName("inventory_item_name")(0).innerText .Cells(rowNumber, 3).Value = Item.getElementsByClassName("inventory_item_desc")(0).innerText .Cells(rowNumber, 4).Value = Item.getElementsByClassName("inventory_item_price")(0).innerText rowNumber = rowNumber + 1 Next Item End With End Sub
The login page at saucedemo.com had a peculiar property where the username and password set through getElementById("").Value and getElementById("").InnerText were cleared when clicking the login button. This might be due to javascript functions called onchange and onclick of input fields. So I entered the values as keystroker using sendKeys. It is otherwise not a good pratice to use sendkeys. getElementById("").fireEvent("onchange") might solve the problem and should be tried before going ahead with sendkeys.
In case of 462 Internet Explorer error, go to IE Settings -> Internet Options -> Security -> Uncheck Enable Protected Mode
Visionary AI Leader | Senior Manager | Expert in Generative AI, Automation, and Digital Transformation
4 年Amazing article... looking forward to seeing some more from you!