Web scraping and agile: how can web scraping help you see the big picture?

Today we live in an agile world. More and more companies are adopting agile methodologies to achieve their goals, leaving waterfall projects behind. Don't get me wrong; I love agile methodologies like Scrum or Lean Startup. Still, there are many times when, instead of pure agile, a hybrid model would work best: start with a waterfall phase until you can begin running sprints. But companies are trying so hard to be agile that they don't consider hybrid models; they simply try to reach their goals with agile alone.

Agile is a great way to get a Minimum Viable Product (MVP) working as soon as possible. Still, while that product is excellent news for the customer, it can be the worst thing that happens to areas like Customer Service or Operations.

We see this when a company moves from legacy software to a new one, develops new solutions, makes major changes to a working product, or starts a new project.

As an example, let's take the e-commerce sales of a Supermarket where, at the time, I was working as the e-commerce efficiency manager. This Supermarket has more than 50 sales regions, each with more than twenty delivery areas. The company was moving from a legacy back office to a new one to offer better features to its customers, a better understanding of every point of the process, better customer service, and a better operational experience, among other improvements. When the project began, e-commerce was not significant and capacity wasn't an issue; on a typical day, demand never exceeded the capacity of any region.

One of the most important reports for an e-commerce Supermarket is the sales capacity it can offer its customers. In this case, we let customers schedule a timeframe for the delivery of their products, and we had a fixed capacity per supermarket and timeframe. The capacity is set so that we can honor the client's request; in other words, if a client wants an order delivered on Monday at 17:00 because it is the only time they will be home, we have to meet that commitment.

At that time, the Supermarket used an Excel report, filled in manually by an analyst each morning, to see the capacity for each timeframe and supermarket. The report showed the current number of sales, the total capacity, and the capacity left, where one unit of capacity equals one client purchase. The analyst had to go through each sales region in the legacy software and copy the information into the Excel sheet before sending it by email. It was an expensive report: almost three hours of an analyst's time to copy and paste information that was already outdated by the time it was delivered.
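To make that report concrete, here is a minimal sketch in Python of the arithmetic behind one row; the field names and figures are illustrative, not the Supermarket's real data. Since one unit of capacity equals one client purchase, the capacity left is simply the total capacity minus the orders already scheduled.

    # One row of the capacity report: a delivery window for one sales region.
    # Field names and numbers are illustrative, not the Supermarket's real schema.
    window = {
        "region": "Region 12",
        "timeframe": "Monday 17:00-18:00",
        "total_capacity": 40,    # maximum client purchases we can deliver
        "scheduled_orders": 33,  # purchases already booked for this window
    }

    # One unit of capacity equals one client purchase, so capacity left is a subtraction.
    capacity_left = window["total_capacity"] - window["scheduled_orders"]
    print(f"{window['region']}, {window['timeframe']}: {capacity_left} slots left")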

Then the unexpected happened: e-commerce sales grew tenfold in two months.

We were not prepared!

At this point, all the teams were working on the core functionalities of the new back office; there was no time to work on panels or provide visibility, mainly because all that information lived in the old back office, and there was nothing the Supermarket could do to fix that for the next year.

But there is a way to get information from a webpage and show it in a panel: web scraping!

What is web scraping? Web scraping is the process of using bots, software, or scripts to extract content and data from a website. In other words, we can use software to go through the webpage, store the data that we need, and then present it as we want.

The process is simple: go to the website, log in, get the cookie, navigate to the page that has the information, extract the data, store it, and repeat. It sounds easy, and it is.
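To illustrate that loop, here is a minimal sketch in Python using the requests and BeautifulSoup libraries (the project in this story was actually built with Ruby on Rails). The URL, login fields, and CSS selectors are invented placeholders; a real legacy back office would need its own.

    import csv
    import requests
    from bs4 import BeautifulSoup

    # The URL, credentials, and selectors below are hypothetical placeholders.
    BASE_URL = "https://legacy-backoffice.example.com"

    session = requests.Session()  # the session keeps the login cookie for us

    # 1. Log in so the authentication cookie is stored in the session.
    session.post(f"{BASE_URL}/login", data={"user": "analyst", "password": "secret"})

    # 2. Go to the page that has the information for each sales region and get the data.
    rows = []
    for region_id in range(1, 51):
        page = session.get(f"{BASE_URL}/regions/{region_id}/capacity")
        soup = BeautifulSoup(page.text, "html.parser")
        for row in soup.select("table.capacity tr")[1:]:  # skip the header row
            cells = [cell.get_text(strip=True) for cell in row.select("td")]
            rows.append([region_id] + cells)

    # 3. Store it (here, a CSV file) and repeat on a schedule.
    with open("capacity.csv", "w", newline="") as f:
        csv.writer(f).writerows(rows)

Running something like this on a schedule is what replaces the three hours of manual copy and paste described above.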

There are several ways to achieve this, depending on your software development expertise. If you are not familiar with any programming language, don't worry: RPA is here to help you. If you know how to code, you can easily scrape using Excel VBA, Python, Ruby on Rails, or any other language you are familiar with. For this particular case, we chose Ruby on Rails.

Our main objective was to give the different levels of the organization visibility into how many clients we could serve, on which days, and at what times, so that the manager of each Supermarket would have a view of their own store, while the COO would have an aggregated view, the big picture.

Once we had the data, we defined a layout that helped us quickly understand what was going on, both at an aggregate level and for each specific Supermarket. We added colors for quicker visibility into the status of each delivery timeframe (a "window") at a given supermarket: whether we were still offering sales for that day, whether a window had been closed before it reached its capacity, or whether a window was oversold.
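As an illustration of that color logic, here is a small Python sketch; the status names and the rule behind each color are my assumptions, not the Supermarket's actual ones.

    # Classify a delivery window from the scraped figures.
    # The color names and the rule behind each one are illustrative assumptions.
    def window_status(total_capacity, scheduled_orders, still_offered):
        capacity_left = total_capacity - scheduled_orders
        if scheduled_orders > total_capacity:
            return "red"     # oversold: more purchases booked than capacity
        if not still_offered and capacity_left > 0:
            return "orange"  # closed before reaching its capacity
        if capacity_left == 0:
            return "grey"    # full: nothing left to sell
        return "green"       # still offering sales for that day

    print(window_status(total_capacity=40, scheduled_orders=44, still_offered=False))  # red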

Below is an example of how we showed the data at an aggregate level, together with a specific view for each Supermarket:

[Image: aggregate view and per-Supermarket view of delivery capacity]

We also used number size to help readers quickly spot the figures the operations team most needed to see. For some users, the capacity offered that day was the most important number; in this case, however, how much capacity was left mattered even more. It's always necessary to validate your ideas with the end users: don't just tell them the idea, show it to them, so you can get their feedback and they can understand what you are talking about.

For this project, we needed about one week to get the data and build the first layouts for presenting it. We worked closely with the end users, and with everyone else who needed the information, to show them exactly what they needed. Operations teams normally don't have much time, so any panel they use must be as clean as possible and show only the information they need. Other areas want to see the big picture, so it is better to build a separate layout for them and keep the details for the people who actually use them.

The steps to follow to achieve a good web scraping report are:

  1. Understand if the process is suitable for web scraping
  2. Define the data that you need
  3. Define the way the end-user needs the data (Excel, Google Sheets, PDF, email, webpage, etc.)
  4. Finally, define the periodicity: how many times a day do you need to update the data? (A small sketch of how these choices can be written down follows this list.)
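As a sketch of how those four decisions could be captured before building anything, here is a small, hypothetical configuration in Python; the values are examples, not the ones used in this project.

    # Answers to the four questions above, captured as a simple configuration.
    # Every value here is an example, not the project's real setup.
    scraping_report_config = {
        "suitable_for_scraping": True,     # step 1: the data only lives in the old back office
        "data_needed": [                   # step 2: the fields the report must show
            "region", "timeframe", "total_capacity", "scheduled_orders", "capacity_left",
        ],
        "output_format": "google_sheets",  # step 3: how the end user wants the data
        "updates_per_day": 24,             # step 4: periodicity (here, hourly)
    }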

This was an evolving project, with incremental features added every week based on new needs, so try to be flexible. The end users usually know what they need a little better than you do, but don't be afraid to surprise them. If you have an idea, show it to them; don't explain it, let them try it, and then get their feedback. Don't worry if they don't like it; you can iterate until you get what they need.

I hope you like it!

If you want to share thoughts, learn more about this process and its implementation, or find ways to improve or transform a process, don't hesitate to contact me.

Have you ever heard about robotic process automation (RPA) for web scraping and how to use it? Don't miss my next article.

