Is data a commodity? The price of data

An answer to "Web Data Has Become a Commodity and Needs a Marketplace to Grow"

The article presents an interesting illustration, but comparing the services of 100 web scrapers who ostensibly deliver the same output at a similar cost may not fully capture the realities of the web scraping industry. In practice, requesting quotes for a specific scraping task from various service providers, such as freelancers on Fiverr or established web scraping companies like Bright Data, would likely produce a wide range of prices. Two points are worth noting:


1) QUOTES that deviate significantly from the average, sometimes by a factor of 5 in either direction, are quite common. This disparity reflects the complexity and the individual nuances of web scraping work, and it implies that treating web data as a commodity may not be as simple as originally proposed.


2) The variability in the OUTPUT FORMAT caused by each developer's individual decisions and preferences is a significant factor that should be considered. Given the nature of web scraping, two developers scraping the same web page are unlikely to provide identical datasets.


QUOTES

One critical factor is the disparity in developer knowledge and experience. Web scraping is not a one-size-fits-all task: it ranges from scraping simple static pages to handling complex dynamic sites that require advanced knowledge. More knowledgeable and experienced developers are likely to charge higher fees because they can handle more complex tasks and deliver higher-quality results.


Furthermore, the billing method a developer uses can have a significant impact on pricing. Some charge by the hour, others by the project, and still others use a value-based pricing model. These different methods can result in different costs even for identical tasks.


By taking these elements into account (developers' knowledge and experience, as well as their billing methods), your argument about web data as a commodity may gain depth and nuance.

While your theory that web data can be treated as a commodity is intriguing, the reality of the web scraping industry has some nuances to consider. The variability in the output format caused by each developer's individual decisions and preferences is a significant factor that was not adequately addressed. Given the nature of web scraping, two developers scraping the same web page are unlikely to provide identical datasets.


OUTPUT FORMAT VARIABILITY is caused by both technical and non-technical factors. The scraping tools and methodologies used by a developer, for example, can cause differences in the output data. Furthermore, decisions made during the data cleaning and preprocessing stages (such as dealing with missing data or duplicate entries) can have a significant impact on the final dataset.
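To make the point about cleaning decisions concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the raw records, the field names (`sku`, `name`, `price`) and the two cleaning policies are hypothetical and do not describe any real provider's pipeline. It only shows how two reasonable choices about duplicates and missing values turn the same scraped page into two different datasets.

```python
# Hypothetical raw records scraped from the same page (illustrative data only).
RAW = [
    {"sku": "A1", "name": "Widget", "price": "19.99"},
    {"sku": "A1", "name": "Widget", "price": "19.99"},  # duplicate entry
    {"sku": "B2", "name": "Gadget", "price": ""},       # missing price
    {"sku": "C3", "name": "Gizmo", "price": "5.00"},
]


def clean_dev_a(rows):
    """Developer A (assumed policy): drop duplicate SKUs and rows with no price."""
    seen, out = set(), []
    for row in rows:
        if row["sku"] in seen or not row["price"]:
            continue
        seen.add(row["sku"])
        out.append(
            {"sku": row["sku"], "name": row["name"], "price": float(row["price"])}
        )
    return out


def clean_dev_b(rows):
    """Developer B (assumed policy): keep every row, store a missing price as None."""
    return [
        {
            "sku": row["sku"],
            "name": row["name"],
            "price": float(row["price"]) if row["price"] else None,
        }
        for row in rows
    ]


dataset_a = clean_dev_a(RAW)
dataset_b = clean_dev_b(RAW)

# Same source page, two deliverables with different row counts and contents.
print(len(dataset_a), len(dataset_b))
print(dataset_a == dataset_b)
```

Neither policy is wrong; they simply answer the questions "what is a duplicate?" and "what do we do with gaps?" differently, which is why the two deliverables are not interchangeable.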


Customers also have a significant impact on the data output. Each customer may have specific needs, and custom datasets tailored to those needs are often more valuable than generic ones. This customization, in which data collection is adapted to the details a client requests, can result in drastically different datasets even from the same web page.


Example

Consider the case of e-commerce data scraping. Depending on the needs of the specific client, two developers tasked with scraping an e-commerce site may end up with different datasets. One client may only require pricing and product descriptions for a specific category of items, whereas another may require a more comprehensive dataset that includes user reviews and ratings, product images, seller information, and so on. This demonstrates the highly customized and client-specific nature of web scraping tasks, casting doubt on the notion that web data is a commoditized, one-size-fits-all good.
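The e-commerce case can be sketched in a few lines of Python. The product record, the field lists and the `build_dataset` helper below are all hypothetical, chosen only to mirror the two clients described above; they are not taken from any real site or service.

```python
# One hypothetical product as a scraper might capture it (illustrative values).
PRODUCTS = [
    {
        "name": "Widget",
        "category": "tools",
        "price": 19.99,
        "description": "A small widget.",
        "rating": 4.5,
        "reviews": ["Works well."],
        "image_urls": ["https://example.com/widget.jpg"],
        "seller": "Example Seller",
    },
    {
        "name": "Lamp",
        "category": "home",
        "price": 35.00,
        "description": "A desk lamp.",
        "rating": 4.0,
        "reviews": [],
        "image_urls": [],
        "seller": "Example Seller",
    },
]

# Client A (assumed): prices and descriptions for a single category.
CLIENT_A_FIELDS = ["name", "price", "description"]
# Client B (assumed): a comprehensive dataset across all categories.
CLIENT_B_FIELDS = [
    "name", "price", "description", "rating", "reviews", "image_urls", "seller",
]


def build_dataset(products, fields, category=None):
    """Keep only the requested fields, optionally restricted to one category."""
    rows = []
    for product in products:
        if category is not None and product["category"] != category:
            continue
        rows.append({field: product[field] for field in fields})
    return rows


dataset_a = build_dataset(PRODUCTS, CLIENT_A_FIELDS, category="tools")
dataset_b = build_dataset(PRODUCTS, CLIENT_B_FIELDS)

print(len(dataset_a), sorted(dataset_a[0]))
print(len(dataset_b), sorted(dataset_b[0]))
```

Both datasets come from the same pages, yet they differ in row count and in schema, so a single market price for "the data from this site" is hard to define without first fixing the specification.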

As a result, while the argument about web data as a commodity is intriguing, the complexities of the web scraping industry may call into question the law of one price for the same dataset.

A marketplace like databoutique.com would definitely have an interesting role in the market.


DAVID MARTIN. CEO of icebergdata.co

Andrea Squatrito

Building Data Markets | CEO @ DataBoutique.com

1 year ago

Thank you for the spot-on reply. I will post my thoughts here as well as on the original post. That is a very good point you are making. What your analysis shows is that, as of today, web scraped data is not treated like a commodity, and I agree. We argue that it should be. On prices: the high variance of prices for the same scraping services is mainly caused by individual negotiations between buyers and sellers. The root cause is the lack of a marketplace where these data are traded. Such marketplaces exist to benefit the buyer (who gets a clearer and fairer market price) and, indirectly, the sellers, as a well-functioning market attracts more buyers and increases volumes. To be able to do so, we need to address your second point: the variance of output formats. In commodity markets, standard definitions need to be in place to make exchanges between buyers and sellers clear. Apart from the mere technical output format (JSON, TXT, CSV, etc.), we would also need a standard field structure and a clear definition of the information to be collected and of the method. [...]
