Web Scraping Using Selenium Java
Web scraping is the process of extracting data from web pages. It can be used to extract text, photos, links, data from databases, and other information, which can then be stored and used for various purposes (marketing, etc.). We will do this exercise on a target site here on qaavenue. The goal is to get the list of authors who appear on the site and to count how many there are. To do this, I start by examining the author names in the DOM.
As you can see in the picture, the author name is located inside an <a> (link) element. Since I will use XPath to locate it, I first find the parent of the link.
//div[@class='available-content']//p//a
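In Selenium this expression would typically be passed to driver.findElements(By.xpath(...)). To check what it matches without launching a browser, here is a minimal sketch that evaluates the same XPath with the JDK's built-in XPath engine. The class name XPathCheck, the method linkTexts, and the sample markup are hypothetical; the real page structure is only assumed to resemble it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathCheck {
    // The XPath from the article: every link inside a paragraph of the content div.
    static final String LINKS_XPATH = "//div[@class='available-content']//p//a";

    // Returns the trimmed text of every link matched by LINKS_XPATH.
    // The input must be well-formed XML (XHTML), unlike arbitrary real-world HTML.
    static List<String> linkTexts(String xhtml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    LINKS_XPATH,
                    new InputSource(new StringReader(xhtml)),
                    XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate " + LINKS_XPATH, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, simplified fragment standing in for the real page.
        String sample =
                "<html><body>"
              + "<div class='available-content'>"
              + "<p><a href='#'>Author One</a></p>"
              + "<p><a href='#'>Read more</a></p>"
              + "</div>"
              + "<div class='footer'><p><a href='#'>Ignored</a></p></div>"
              + "</body></html>";
        System.out.println(linkTexts(sample));
    }
}
```

The link in the footer div is not matched, because the expression only descends from the div whose class is available-content.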
Now I can access all the links from here. However, when I examine them, I see that some contain unnecessary information rather than an author name.
I will create a list of entries to exclude.
Then I will collect the remaining author names in a list called “authorNames”.
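The exclusion step can be sketched in plain Java as follows. The class name AuthorFilter and the entries in EXCLUDED are assumptions made for illustration; the actual entries depend on what the links on the target site contain.

```java
import java.util.ArrayList;
import java.util.List;

final class AuthorFilter {
    // Hypothetical link texts that are not author names.
    static final List<String> EXCLUDED = List.of("Subscribe", "Read more", "");

    // Keeps every link text that is not in the exclusion list.
    static List<String> authorNames(List<String> linkTexts) {
        List<String> authorNames = new ArrayList<>();
        for (String text : linkTexts) {
            String name = text.trim();
            if (!EXCLUDED.contains(name)) {
                authorNames.add(name);
            }
        }
        return authorNames;
    }

    public static void main(String[] args) {
        // Hypothetical link texts, as they might come back from the XPath lookup.
        List<String> linkTexts = List.of("Author One", "Read more", " Author Two ", "Subscribe");
        System.out.println(authorNames(linkTexts));
    }
}
```

Duplicates are kept on purpose, so that the same list can later be used to count how often each author appears.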
However, only the first page has its own address; the other pages are addressed by an “issueNo”. So I treat the first page as a special case and go through the remaining pages with a loop from 2 to 20.
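A minimal sketch of this page traversal, assuming the range 2 to 20 is inclusive. The class name IssuePages and both URL constants are placeholders, not the real addresses of the target site.

```java
import java.util.ArrayList;
import java.util.List;

final class IssuePages {
    // Placeholder addresses; the real site's URL scheme is an assumption here.
    static final String FIRST_PAGE = "https://example.com/first-issue";
    static final String ISSUE_PREFIX = "https://example.com/issue-";

    // Builds the list of pages to visit: the first page, then issues 2..lastIssue.
    static List<String> pageUrls(int lastIssue) {
        List<String> urls = new ArrayList<>();
        urls.add(FIRST_PAGE); // the first page does not follow the issueNo pattern
        for (int issueNo = 2; issueNo <= lastIssue; issueNo++) {
            urls.add(ISSUE_PREFIX + issueNo);
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : pageUrls(20)) {
            System.out.println(url);
        }
    }
}
```

In the scraper itself, each of these addresses would be opened in turn and the author links collected from every page.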
Once the first page has loaded, I need to handle the subscription notification that appears.
We are now ready to do a full run. I will provide the full code, and then we will look at some statistics about the author names we have collected.
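The statistics step (number of unique authors and the most published ones) can be sketched like this. The class name AuthorStats, its method names, and the sample names are illustrative assumptions; the input is assumed to be the collected authorNames list with duplicates kept.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AuthorStats {
    // Counts how many times each author name occurs, in first-seen order.
    static Map<String, Integer> countByAuthor(List<String> authorNames) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : authorNames) {
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    // Returns every author whose count equals the highest count (ties included).
    static List<String> mostPublished(Map<String, Integer> counts) {
        int max = counts.values().stream().mapToInt(Integer::intValue).max().orElse(0);
        List<String> top = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == max) {
                top.add(entry.getKey());
            }
        }
        return top;
    }

    public static void main(String[] args) {
        // Hypothetical input; in the scraper this would be the collected authorNames list.
        List<String> authorNames = List.of("Author One", "Author Two", "Author One", "Author Three");
        Map<String, Integer> counts = countByAuthor(authorNames);
        System.out.println("Total number of unique authors: " + counts.size());
        for (String name : mostPublished(counts)) {
            System.out.println(name + ": " + counts.get(name) + " times");
        }
    }
}
```

Using a map keyed by name gives the unique-author count (the map size) and the per-author frequency in a single pass.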
As can be seen, the total number of unique authors is 209, and the most published authors are Maaret Pyhäjärvi (3 times) and Gleb Bahmutov (3 times). Here is the project GitHub link;
I hope this was useful.