Setting Up a Strategic Data Platform for a BFSI Enterprise

Data can help a BFSI enterprise meet customers’ rising expectations and counter competitive threats in the AI-powered digital era. The AI-first bank will offer propositions and experiences that are intelligent (recommending actions, anticipating and automating key decisions or tasks), personalized (relevant, timely, and based on a detailed understanding of customers’ past behavior and context), and truly omnichannel (seamlessly spanning physical and online contexts across multiple devices and delivering a consistent experience), and that blend banking capabilities with relevant products and services beyond banking.

  • To counter a shrinking customer base, banks can turn to machine-learning algorithms that predict which currently active customers are likely to reduce their business with the bank. This understanding enables a targeted campaign to reduce churn (see the sketch after this list).
  • Use machine learning to study discount offerings to customers and detect patterns of unnecessary discounts and offers that can easily be corrected. Early adoption of these corrections can increase revenues within a few months.
  • Improve the products-per-customer ratio. Advanced analytics can explore several sets of big data: customer demographics and key characteristics, products held, credit-card statements, transaction and point-of-sale data, online and mobile transfers and payments, and credit-bureau data. Refining microsegments in the customer base and then building a next-product-to-buy model can multiply the likelihood of purchase compared with working without data analytics.
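As an illustration, a churn-propensity model of this kind can be sketched in a few lines of scikit-learn. The file name (customers.csv), the feature columns, and the churned label below are hypothetical placeholders for the bank’s own data, not a prescribed schema.

```python
# A minimal churn-propensity sketch using scikit-learn. The input file and
# all column names are hypothetical placeholders for the bank's own data.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("customers.csv")
features = ["tenure_months", "avg_monthly_balance", "txn_count_90d"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["churned"], test_size=0.2, random_state=42
)

model = GradientBoostingClassifier().fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Score currently active customers by churn risk to drive a retention campaign.
df["churn_score"] = model.predict_proba(df[features])[:, 1]
campaign_targets = df.sort_values("churn_score", ascending=False).head(1000)
```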

Readiness for the AI-Enabled World

The bank’s data management must ensure data liquidity: the ability to access, ingest, and manipulate the data that serve as the foundation for all insights and decisions generated in the decision-making layer. Data liquidity increases with the removal of functional silos and allows multiple divisions to operate off the same data with increased coordination. The data value chain begins with seamless sourcing of data from all relevant internal systems and external platforms. This includes ingesting data into a lake; cleaning and labeling the data required for diverse use cases (e.g., regulatory reporting, business intelligence at scale, advanced-analytics/ML diagnostics); and segregating incoming data (from both existing and prospective customers) that can be made available for immediate analysis from data that must first be cleaned and labeled for future analysis.
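As a minimal sketch of that segregation step, the snippet below routes incoming records that pass a basic validation check to a zone for immediate analysis, and everything else to a quarantine zone for later cleaning and labeling. The lake paths and the required-field list are illustrative assumptions.

```python
# A minimal ingestion-time segregation sketch: valid records land in a
# "ready" zone for immediate analysis; incomplete records are quarantined
# for cleaning and labeling. Paths and required fields are illustrative.
import json
from pathlib import Path

READY = Path("lake/ready")
QUARANTINE = Path("lake/quarantine")
REQUIRED_FIELDS = {"customer_id", "event_type", "timestamp"}

def ingest(record: dict, name: str) -> None:
    # Records missing any required field cannot be analyzed immediately.
    zone = READY if REQUIRED_FIELDS.issubset(record) else QUARANTINE
    zone.mkdir(parents=True, exist_ok=True)
    (zone / f"{name}.json").write_text(json.dumps(record))

ingest({"customer_id": "C001", "event_type": "login",
        "timestamp": "2024-05-01T10:00:00Z"}, "evt1")
ingest({"customer_id": "C002"}, "evt2")  # missing fields -> quarantine
```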

Trends behind the change from data warehouses to data platforms

Data warehouses have, for the most part, stood the test of time and are still used in almost all enterprises. But several recent trends have made their shortcomings painfully obvious. The explosion in popularity of software as a service (SaaS) has resulted in a big increase in the variety and number of sources of data being collected. SaaS and other systems produce a variety of data types beyond the structured data found in traditional data warehouses, including semi-structured and unstructured data.

In traditional data warehouses, storage and compute are not separate: while the same hardware can be used for both, it can only be deployed effectively in a static ratio. This limits flexibility and cost-effectiveness.

Another and arguably more significant trend is the shift in application architecture from monolithic to microservices. Since in the microservices world there is no central operational database from which to pull data, collecting messages from these microservices becomes one of the most important analytics tasks. To keep up with these changes, a traditional data warehouse requires rapid, expensive, and ongoing investments in hardware and software upgrades.
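As a minimal sketch of such message collection, the consumer below reads events emitted by microservices over Apache Kafka using the kafka-python client. The broker address and the payments.events topic are assumptions; the same pattern applies to any message bus.

```python
# A minimal sketch of collecting events emitted by microservices, using the
# kafka-python client. Broker address and topic name are assumptions.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "payments.events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
    group_id="data-platform-ingest",
)

for message in consumer:
    event = message.value
    # In practice, append the event to the lake's raw/landing zone here.
    print(event.get("event_type"), event.get("customer_id"))
```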

A data warehouse alone just won’t deliver on the growth in data variety, volume, and velocity being experienced today. Combining a data lake with a data warehouse to create a data platform, however, can address all three of these challenges.

Data Platform

A data platform is a software solution for collecting, processing, managing, and sharing data for strategic business purposes. As usage scales, the data platform grows to support new production scenarios, converting ad hoc processing into automated workflows and applying best practices. At this scale, certain patterns emerge. Data is ingested into the system and persisted in a storage layer. Processing aggregates and reshapes the data to enable analytics and ML scenarios. Orchestration and governance are cross-cutting concerns that cover all the components of the platform. Once processed, data is distributed to other downstream systems. All components are tracked by and deployed from source control.

Figure 1: Anatomy of Data Platform

Building blocks of a cloud data platform

The purpose of a data platform is to ingest, store, process, and make data available for analysis no matter which type of data comes in, and in the most cost-efficient manner possible. To achieve this, well-designed data platforms use a loosely coupled architecture where each layer is responsible for a specific function and interacts with the other layers via well-defined APIs. The foundational building blocks of a data platform are the ingestion, storage, processing, and serving layers (a minimal sketch of this layering follows Figure 2).

Figure 2: Data Platform on Cloud
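The sketch below illustrates the loose coupling described above: each layer is a small interface, so one implementation (say, object storage) can be swapped for another (say, local disk) without touching the other layers. All class and method names are illustrative, not a standard API.

```python
# A sketch of loosely coupled platform layers as small interfaces.
# Implementations can be swapped independently; all names are illustrative.
from typing import Iterable, Protocol

class IngestionLayer(Protocol):
    def ingest(self, source: str) -> Iterable[dict]: ...

class StorageLayer(Protocol):
    def write(self, zone: str, records: Iterable[dict]) -> None: ...
    def read(self, zone: str) -> Iterable[dict]: ...

class ProcessingLayer(Protocol):
    def transform(self, records: Iterable[dict]) -> Iterable[dict]: ...

class ServingLayer(Protocol):
    def publish(self, records: Iterable[dict]) -> None: ...

def run_pipeline(ingest: IngestionLayer, store: StorageLayer,
                 process: ProcessingLayer, serve: ServingLayer,
                 source: str) -> None:
    store.write("raw", ingest.ingest(source))       # land raw data
    curated = process.transform(store.read("raw"))  # aggregate / reshape
    store.write("curated", curated)                 # persist curated zone
    serve.publish(store.read("curated"))            # expose downstream
```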

Running a data platform at scale comes with a unique set of challenges to consider and address. Data science deals with writing queries and developing ML models. Data engineering takes these and scales them to millions of rows of data, provides automation and monitoring, ensures security and compliance, and so on.

The core services of a data platform include storage and analytics services, automatic deployment and monitoring, and an orchestration solution.


Figure 3: Layered Architecture of Data Platform

Storage is the backbone of any data platform. Data is ingested into the system from multiple sources, flows into and out of the platform, and various workflows are executed; all of this needs an orchestration layer to keep things running.
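As a minimal orchestration sketch, the Apache Airflow DAG below (Airflow being one common choice) chains ingest, process, and serve steps on a daily schedule. The DAG id, schedule, and task bodies are illustrative assumptions.

```python
# A minimal orchestration sketch using Apache Airflow (one common choice).
# DAG id, schedule, and task callables are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    print("pull data from sources into the lake")

def process():
    print("aggregate and reshape raw data")

def serve():
    print("publish curated data to downstream systems")

with DAG(
    dag_id="daily_data_platform_run",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    t_ingest = PythonOperator(task_id="ingest", python_callable=ingest)
    t_process = PythonOperator(task_id="process", python_callable=process)
    t_serve = PythonOperator(task_id="serve", python_callable=serve)

    t_ingest >> t_process >> t_serve  # run the steps in sequence
```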

Key Workloads for Data Platform

Three main workloads that a data platform must support are:

  • Processing: Encompasses aggregating and reshaping the data, standardizing schemas, and any other processing of the raw input data. This makes the data easier to consume for the other two main workloads, analytics and machine learning (see the PySpark sketch after this list).
  • Analytics: Covers all data analysis and reporting, deriving knowledge and insights from the data.
  • Machine learning: Includes training all ML models on the data.
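The PySpark sketch below illustrates the processing workload: standardizing the schema of raw transactions and aggregating them per customer so that the analytics and ML workloads can consume the result. The input path and column names are illustrative assumptions.

```python
# A minimal processing-workload sketch in PySpark: standardize the schema of
# raw transactions and aggregate per customer. Paths and column names are
# illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("processing").getOrCreate()

raw = spark.read.json("lake/raw/transactions/")  # semi-structured input

standardized = (
    raw.withColumnRenamed("cust_id", "customer_id")
       .withColumn("amount", F.col("amount").cast("double"))
       .withColumn("event_date", F.to_date("timestamp"))
)

per_customer = standardized.groupBy("customer_id").agg(
    F.count("*").alias("txn_count"),
    F.sum("amount").alias("total_spend"),
)

# Persist in a columnar format for the analytics and ML workloads.
per_customer.write.mode("overwrite").parquet("lake/curated/customer_spend/")
```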

Governance

Governance is the process of managing the availability, usability, integrity, regulatory compliance, and security of the data in a data system. Effective data governance ensures that data is consistent and trustworthy and doesn't get misused.

  • Metadata: Managing metadata so the data can be understood. Cataloguing and inventorying the data, and tracking lineage, definitions, and documentation are the key deliverables. Physical metadata describes the structure of the source system, such as table and column names; logical metadata describes semantic information such as database descriptions, data-quality expectations, and associated data policies.
  • Data quality: How do we test data and assess its quality? Data quality is the degree to which data is accurate, complete, timely, consistent with all business requirements, and ready for a given use; in other words, trusted (a minimal sketch of such checks follows this list).

Data quality and behavior are not constant; they vary over time for each variable and across your customer or product portfolio, so managing quality is a continuous exercise. Data-quality checks, fix mechanisms, and analytics/machine-learning models can all benefit greatly from understanding exactly how human experts behave under different data-quality scenarios.

  • Compliance: Honoring compliance requirements such as the Digital Personal Data Protection Act (DPDP), handling sensitive data, and controlling access.
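The pandas snippet below is a minimal sketch of such data-quality checks (completeness, uniqueness, validity) over a curated table; the table path reuses the illustrative output of the earlier PySpark sketch, and the column names and thresholds are likewise assumptions.

```python
# A minimal data-quality sketch using pandas: completeness, uniqueness, and
# validity checks over a curated table. Columns and thresholds are
# illustrative assumptions.
import pandas as pd

df = pd.read_parquet("lake/curated/customer_spend/")

checks = {
    "no_null_ids": df["customer_id"].notna().all(),
    "ids_unique": df["customer_id"].is_unique,
    "amounts_non_negative": (df["total_spend"] >= 0).all(),
    "completeness_above_99pct": df.notna().mean().min() >= 0.99,
}

failed = [name for name, passed in checks.items() if not passed]
if failed:
    # Route failures to a quarantine/review process rather than serving them.
    raise ValueError(f"Data-quality checks failed: {failed}")
```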

DevSecOps Setup for Data Platform

DevOps for data means building reproducible systems reliably, keeping everything in source control, and deploying everything automatically.

Figure 4: DevSecOps Architecture for Data Platform
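As a minimal sketch of this practice, the pytest unit test below exercises a small, source-controlled data transformation so that every automated deployment is validated in CI. The transformation and its expected output are illustrative.

```python
# A minimal sketch of testing a data transformation in CI with pytest, so
# each automated deployment is validated. The transformation is illustrative.
import pandas as pd

def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Example transformation kept in source control and unit-tested."""
    return (df.rename(columns={"cust_id": "customer_id"})
              .dropna(subset=["customer_id"]))

def test_standardize_drops_null_ids():
    raw = pd.DataFrame({"cust_id": ["C001", None], "amount": [10.0, 5.0]})
    out = standardize(raw)
    assert list(out.columns) == ["customer_id", "amount"]
    assert out["customer_id"].notna().all()
```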

Data Engineering

Figure 5: Data Engineering Lifecycle Components

A cloud data platform is the foundation for all stages of analytics maturity, delivering quality data so that users trust the data they are accessing. A desire to get insights from data is often the first step in an analytics maturity journey. These insights are driven by self-service analytics, where the business is empowered to use its tools of choice to access any and all data it needs for exploration: Bring Your Own Analytics (BYOA).
