DATA SCRAPING

Data scraping, in its most general form, refers to a technique in which a computer program extracts data from output generated by another program. Data scraping most commonly takes the form of web scraping: using an application to extract valuable information from a website.
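
As a minimal illustration of the general form, the sketch below treats a small child process as the "other program". That program prints a report meant for people, and the scraper captures the text and pulls structured records out of it. The report layout, the regular expression, and the names `scrape_report` and `run_and_scrape` are hypothetical, chosen only for the example.

```python
import re
import subprocess
import sys

# Hypothetical "other program": it prints a human-readable report, not machine-readable data.
REPORT_PROGRAM = (
    "print('Inventory report')\n"
    "print('Item: widget    Qty: 12')\n"
    "print('Item: gadget    Qty: 7')\n"
)

# Assumed layout of each data line in the report.
LINE_PATTERN = re.compile(r"Item:\s+(?P<item>\w+)\s+Qty:\s+(?P<qty>\d+)")


def scrape_report(text: str) -> list[dict]:
    """Extract item/quantity records from report text, skipping lines that don't match."""
    records = []
    for line in text.splitlines():
        match = LINE_PATTERN.search(line)
        if match:
            records.append({"item": match["item"], "qty": int(match["qty"])})
    return records


def run_and_scrape() -> list[dict]:
    # Run the other program and capture what it writes to standard output.
    result = subprocess.run(
        [sys.executable, "-c", REPORT_PROGRAM],
        capture_output=True,
        text=True,
        check=True,
    )
    return scrape_report(result.stdout)
```

Calling `run_and_scrape()` returns one dictionary per matching report line; the title line is ignored because it does not fit the expected pattern.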

Conceptually, web scraping is simple, although the implementation can be complex. It proceeds in three steps:

· First, the piece of code that pulls the information, which we call a scraper bot, sends an HTTP GET request to a specific website.

· When the website responds, the scraper parses the HTML document for a specific pattern of data.

· Once the data is extracted, it is converted into whatever format the scraper bot’s author designed.
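
The three steps above can be sketched with only the Python standard library. This is a minimal illustration, not a production scraper: treating link targets as the "specific pattern of data", choosing JSON as the output format, the user-agent string, and the helper names `fetch`, `LinkParser`, `parse_links` and `to_json` are all assumptions made for the example.

```python
import json
import urllib.request
from html.parser import HTMLParser


def fetch(url: str) -> str:
    """Step 1: send an HTTP GET request and return the response body as text."""
    # "example-scraper-bot" is a placeholder user-agent string.
    request = urllib.request.Request(url, headers={"User-Agent": "example-scraper-bot"})
    with urllib.request.urlopen(request, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkParser(HTMLParser):
    """Step 2: walk the HTML and collect the pattern of interest (here, <a href> links)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[dict] = []
        self._current_href = None
        self._text_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only collect text that appears inside a link.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append({"href": self._current_href, "text": text})
            self._current_href = None


def parse_links(html: str) -> list[dict]:
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def to_json(records: list[dict]) -> str:
    """Step 3: convert the extracted records into the author's chosen format (JSON here)."""
    return json.dumps(records, indent=2)
```

A full run would chain the steps, for example `print(to_json(parse_links(fetch("https://example.com/"))))`, substituting a page you are permitted to scrape.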

Data scraping is most often done either to interface with a legacy system that has no other mechanism compatible with current hardware, or to interface with a third-party system that does not provide a more convenient API.

In the second case, the operator of the third-party system often regards screen scraping as unwanted, for reasons such as increased system load, lost advertising revenue, or loss of control over the information content.
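
Because system load is one of the operator's concerns, a scraper can at least limit the requests it sends. The sketch below shows one common approach, assumed here rather than taken from the article: it checks the site's robots.txt rules with Python's standard `urllib.robotparser` and enforces a minimum delay between requests. The `PoliteGate` class, the robots.txt content, the user-agent string and the interval are hypothetical values for illustration.

```python
import time
import urllib.robotparser


class PoliteGate:
    """Decides whether a URL may be fetched and spaces out permitted requests."""

    def __init__(self, robots_lines: list[str], user_agent: str, min_interval: float = 1.0):
        self.rules = urllib.robotparser.RobotFileParser()
        # parse() takes robots.txt content as lines; a live scraper would usually
        # call set_url(".../robots.txt") followed by read() instead.
        self.rules.parse(robots_lines)
        self.user_agent = user_agent
        self.min_interval = min_interval
        self._last_request = None

    def allowed(self, url: str) -> bool:
        return self.rules.can_fetch(self.user_agent, url)

    def wait_turn(self) -> None:
        """Sleep just long enough that requests are at least min_interval seconds apart."""
        now = time.monotonic()
        if self._last_request is not None:
            remaining = self.min_interval - (now - self._last_request)
            if remaining > 0:
                time.sleep(remaining)
        self._last_request = time.monotonic()


# Hypothetical robots.txt content.
ROBOTS_TXT = [
    "User-agent: *",
    "Disallow: /private/",
]
```

In use, the scraper would call `allowed(url)` before each request and `wait_turn()` immediately before sending it. This addresses load only; it does nothing about the revenue or content-control concerns.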

Data scraping is generally considered an ad hoc, inelegant technique, often used only as a "last resort" when no other mechanism for data interchange is available. Aside from the higher programming and processing overhead, output intended for human consumption frequently changes structure, which can break a scraper that depends on the old layout.
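
One way to cope with that fragility is to make the scraper fail loudly when the expected structure disappears, rather than silently returning nothing. The minimal sketch below assumes the price is marked up as `<span class="price">19.99</span>`; that markup, the `extract_price` function and the `LayoutChanged` exception are hypothetical. A regular expression keeps the example short, though an HTML parser is usually more robust for real pages.

```python
import re


class LayoutChanged(Exception):
    """Raised when the page no longer matches the structure the scraper expects."""


# Assumed markup for the value of interest, e.g. <span class="price">19.99</span>.
PRICE_PATTERN = re.compile(r'<span class="price">\s*([0-9]+(?:\.[0-9]{2})?)\s*</span>')


def extract_price(html: str) -> float:
    match = PRICE_PATTERN.search(html)
    if match is None:
        # Fail loudly instead of returning a default, so a redesign is noticed quickly.
        raise LayoutChanged("price element not found; the page layout may have changed")
    return float(match.group(1))
```

If the site renames the class or restructures the element, `extract_price` raises `LayoutChanged`, which points the maintainer straight at the part of the scraper that needs updating.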

More articles by Vydehi Shanmugavadivelu

  • Building Blockchain Implementation with Java

    Blockchain technology has gained significant attention due to its potential for providing secure, transparent, and…

  • Unveiling the future -cutting edge innovation for computer science students

    In the dynamic realm of Computer Science, the relentless pace of technological advancement continuously shapes the…

  • Important of practical experience for computer science students

    As computer science continues to evolve rapidly, the demand for skilled professionals in the field is at an all-time…

  • BINARY TREE IN AI AND DS

    A binary tree is a data structure consisting of nodes, where each node has at most two children, referred to as left…

  • SNS Culture & Vision

    https://youtu.be/nHXkZa-0Xo8 In SNS, we strive to touch every leads of world we live in.

  • Web Generation

    Web 1.0: The Static Web The first version of the internet is sometimes called the “static web.

  • NEOM

    As of my last knowledge update in January 2022, NEOM refers to a planned cross-border city and economic zone in the…

  • ETL

    In the world of data warehousing, if you need to bring data from multiple different data sources into one, centralized…

  • COLAB

    · Colaboratory, or “Colab” for short, is a product from Google Research. · Colab allows anybody to write and execute…

  • SNOWFLAKE

    Snowflake’s Data Cloud is powered by an advanced data platform provided as a self-managed service. Snowflake enables…
