DATA SCRAPING
Vydehi Shanmugavadivelu
Professor and Department Head at Dr.S.N.S.Rajalakshmi College Of Arts & Science
Data scraping, in its most general form, refers to a technique in which a computer program extracts data from the output generated by another program. It most commonly appears as web scraping, the process of using an application to extract valuable information from a website.
The process of web scraping is fairly simple, though the implementation can be complex. Web scraping occurs in three steps (a code sketch follows the list):
· First, the piece of code used to pull the information, which we call a scraper bot, sends an HTTP GET request to a specific website.
· When the website responds, the scraper parses the HTML document for a specific pattern of data.
· Once the data is extracted, it is converted into whatever format the scraper bot's author has designed.
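The following is a minimal sketch of those three steps in Python, using the widely available requests and BeautifulSoup libraries. The URL and the CSS class names are hypothetical placeholders, not taken from any real site.

import csv

import requests
from bs4 import BeautifulSoup

URL = "https://example.com/products"  # hypothetical target page

# Step 1: the scraper bot sends an HTTP GET request to the website.
response = requests.get(URL, timeout=10)
response.raise_for_status()

# Step 2: parse the returned HTML document for a specific pattern of data.
soup = BeautifulSoup(response.text, "html.parser")
rows = []
for item in soup.select("div.product"):      # assumed container class
    name = item.select_one("h2.title")       # assumed title element
    price = item.select_one("span.price")    # assumed price element
    if name and price:
        rows.append({"name": name.get_text(strip=True),
                     "price": price.get_text(strip=True)})

# Step 3: convert the extracted data into the author's chosen format (CSV here).
with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(rows)

In practice the output format in step 3 could just as easily be JSON, a database table, or an in-memory structure; CSV is used here only for illustration.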
Data scraping is most often done either to interface with a legacy system that has no other mechanism compatible with current hardware, or to interface with a third-party system that does not provide a more convenient API.
In the second case, the operator of the third-party system will often see scraping as unwanted, for reasons such as increased system load, loss of advertising revenue, or loss of control over the information content.
Data scraping is generally considered an ad hoc, inelegant technique, often used only as a "last resort" when no other mechanism for data interchange is available. Aside from the higher programming and processing overhead, output displays intended for human consumption often change structure frequently, which can silently break a scraper written against the old layout.
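A short, hedged sketch of how that brittleness shows up in practice: the selector below is an assumption about the page's markup, and the function fails loudly rather than returning wrong data when the layout changes.

from bs4 import BeautifulSoup

def extract_price(html: str) -> str:
    """Return the price text, or raise if the expected markup is missing."""
    soup = BeautifulSoup(html, "html.parser")
    price = soup.select_one("span.price")  # assumed selector; breaks if the site is redesigned
    if price is None:
        raise ValueError("'span.price' not found; the page layout may have changed")
    return price.get_text(strip=True)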