Data acquisition strategies for AI-First Enterprise

Given the rapid advances in AI and the extensive media coverage they have garnered, it is understandable that many organizations feel an increased urgency, sometimes bordering on desperation, to deploy AI solutions that enhance their existing products and services or create new offerings. CXOs and senior business leaders are setting ambitious targets for their divisional and functional heads to leverage underutilized data repositories and implement AI use cases within their respective areas. While this directive from senior management is a commendable step toward cultivating an AI-First Enterprise culture, it is important to recognize that merely having a large volume of internal data does not automatically ensure that AI can be applied effectively to solve business problems.

Most ML business problems require multiple types of datasets to train algorithms effectively and achieve accurate predictions. For instance, a self-driving car relies on several datasets to determine its next action: video and image feeds from onboard cameras, radar signals reflecting off nearby objects, lidar data from 360-degree sensors, and third-party traffic information from sources such as Google Maps. Therefore, once a business problem is clearly defined, the critical next step in the ML project plan is to identify the datasets required, assess their availability within the organization, and devise a strategy to acquire any that are missing.

Based on my 12+ years of experience building and scaling multiple AI solutions, a combination of build, buy, and partner strategies is the most effective approach for acquiring the datasets needed to tackle ML and AI challenges. To illustrate, consider a retail chain seeking to forecast demand and inventory levels by predicting customer purchases for the coming month. To gather the appropriate datasets for this task, the retail chain should pursue the three-pronged data acquisition strategy detailed below:

  1. Build: This strategy involves extracting and curating relevant data from an organization's internal databases to build out the required datasets. Organizations often maintain large central data warehouses that aggregate extensive amounts of data across business functions, but not all of this data is directly applicable to the predictive problem at hand. Substantial effort is therefore required to sift through and transform the raw data into a dataset that is both relevant and useful. For instance, a retail chain may have accumulated millions of customer records over several years, covering transactions, demographics, and purchasing behavior. To tackle a forecasting challenge, such as predicting future inventory needs or sales trends, the organization must isolate and analyze the most pertinent subsets of this data. This could mean filtering for recent customer transactions, perhaps from the past two to three years, so that the dataset reflects current buying patterns. Data transformation techniques such as normalization, aggregation, and enrichment may also be needed to clean and prepare the data for analysis (see the first sketch following this list).
  2. Buy: To solve certain ML problems and ensure accurate predictions, organizations may need datasets that are not available in their own internal databases. In such scenarios, it becomes necessary to buy these datasets from external sources. For example, to improve future sales forecasts, a retail chain would benefit from acquiring several types of external data: forecasted weather conditions that could influence purchasing behavior, projections of economic growth that affect consumer spending power, or recent health trends and disease outbreaks that impact demand for health-related products. To obtain this data, the retail chain would engage specialized providers such as weather forecasting companies for climate data, economic research institutions for growth projections, or health data firms for insights on disease outbreaks. Each of these external sources can complement the internal datasets, leading to more accurate and comprehensive predictions. The process typically involves negotiating data access agreements and integrating the external datasets with the organization's existing data infrastructure (see the second sketch following this list).
  3. Partner: Often, the datasets needed to solve a specific problem are held by external organizations that are unwilling to sell them because of their proprietary nature or strategic value. In such cases, a buy strategy is ineffective, and a partnership strategy can be a highly effective alternative, fostering mutually beneficial arrangements in which both parties gain access to critical insights while safeguarding their proprietary information. For instance, in the retail inventory example, if the retail chain needs to forecast sales for a product that the original manufacturer also distributes through other channels, it would require access to the manufacturer's sales data. The manufacturer may be hesitant to sell this data, as it is central to its competitive strategy. Here, a partnership could be the solution: the retail chain and the manufacturer can enter into a data-sharing agreement in which they exchange relevant insights to enhance their respective operations. The retail chain could provide detailed information on in-store customer behavior and sales trends, while the manufacturer could share data on broader distribution patterns and sales performance across channels. This collaborative approach allows both organizations to build more accurate inventory prediction models and improve their business strategies without compromising proprietary data, a classic win-win (see the third sketch following this list).
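
To make the Build strategy concrete, here is a minimal sketch in Python (pandas) of how a retail chain might curate a forecasting dataset from raw internal transactions. The table and column names (transaction_date, product_id, quantity) are illustrative assumptions, not references to any specific warehouse schema.

```python
import pandas as pd

def build_training_dataset(transactions: pd.DataFrame) -> pd.DataFrame:
    """Filter recent transactions and aggregate them into a monthly sales dataset.

    Assumes hypothetical columns: transaction_date, product_id, quantity.
    """
    df = transactions.copy()
    df["transaction_date"] = pd.to_datetime(df["transaction_date"])

    # Keep only the last two years so the dataset reflects current buying patterns.
    cutoff = df["transaction_date"].max() - pd.DateOffset(years=2)
    df = df[df["transaction_date"] >= cutoff]

    # Aggregate to monthly units sold per product.
    df["month"] = df["transaction_date"].dt.to_period("M")
    monthly = (
        df.groupby(["product_id", "month"])["quantity"]
          .sum()
          .reset_index(name="units_sold")
    )

    # Min-max normalize units sold within each product, a common preparation
    # step before feeding the series to a forecasting model.
    grp = monthly.groupby("product_id")["units_sold"]
    rng = (grp.transform("max") - grp.transform("min")).replace(0, 1)
    monthly["units_scaled"] = (monthly["units_sold"] - grp.transform("min")) / rng
    return monthly
```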
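
Similarly, a minimal sketch of the Buy strategy: purchasing an external dataset (here, a monthly weather forecast) and joining it onto the internal sales data. The vendor endpoint, authentication scheme, and response fields are hypothetical placeholders, since each provider exposes its own API.

```python
import pandas as pd
import requests

# Hypothetical vendor endpoint; a real engagement would use the provider's
# documented API and a negotiated data access agreement.
WEATHER_API_URL = "https://api.example-weather-vendor.com/v1/monthly-forecast"

def fetch_weather_forecast(region: str, api_key: str) -> pd.DataFrame:
    """Download a purchased monthly weather forecast for one region."""
    resp = requests.get(
        WEATHER_API_URL,
        params={"region": region},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    # Assume the vendor returns records like:
    # [{"month": "2024-07", "avg_temp_c": 31.2, "rain_mm": 120}, ...]
    return pd.DataFrame(resp.json())

def enrich_sales(monthly_sales: pd.DataFrame, weather: pd.DataFrame) -> pd.DataFrame:
    """Join external weather features onto the internal sales dataset by month."""
    monthly_sales = monthly_sales.assign(month=monthly_sales["month"].astype(str))
    return monthly_sales.merge(weather, on="month", how="left")
```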
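
Finally, a minimal sketch of the Partner strategy, under the assumption that both sides agree to exchange aggregated, non-customer-level insights rather than raw records. Again, all table and column names are illustrative.

```python
import pandas as pd

def prepare_shareable_insights(transactions: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw in-store transactions into monthly product-level trends
    that are safe to share with the manufacturer (no customer-level detail)."""
    df = transactions.copy()
    df["month"] = pd.to_datetime(df["transaction_date"]).dt.to_period("M").astype(str)
    return (
        df.groupby(["product_id", "month"])
          .agg(in_store_units=("quantity", "sum"))
          .reset_index()
    )

def combine_with_partner_data(our_insights: pd.DataFrame,
                              manufacturer_channel_sales: pd.DataFrame) -> pd.DataFrame:
    """Join the manufacturer's shared channel-level sales onto our view,
    giving both sides a fuller picture of total demand per product."""
    combined = our_insights.merge(
        manufacturer_channel_sales,  # assumed columns: product_id, month, channel_units
        on=["product_id", "month"],
        how="outer",
    )
    combined["total_units"] = (
        combined["in_store_units"].fillna(0) + combined["channel_units"].fillna(0)
    )
    return combined
```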

In conclusion, it is essential for machine learning project teams to establish a robust strategy for acquiring all relevant datasets before delving deeply into the project. A comprehensive, well-curated dataset directly influences the accuracy of predictive models and, consequently, the project's ROI. Ensuring that the data is both holistic and pertinent improves the reliability of the outcomes and maximizes the value derived from the project. Therefore, assess early whether you have access to the necessary data and, if not, devise a clear plan for obtaining it.

Mahavir Goyal

SDE-1 at Skeps | Node.js | MERN Stack | Docker | MySQL | Mongo

2 months ago

Nitish Kumar Do you believe there are untapped datasets, such as uncollected cash transaction data or unbilled transactions, that could hold valuable insights for businesses like retail chains? Additionally, what data collection strategies do fast-growing startups typically adopt to scale rapidly within a short time frame? Could you elaborate on these strategies in an article?
