Data Driven Organization – Data Flow Architecture
Nishant Pradhan
Chief AI Officer | Mirae Asset | ex-Fidelity Investments | IIM Bangalore
Every organization wants to be data-driven in today's world, but not everyone knows what it takes. A small start-up can run everything it needs from a couple of databases, but as your firm grows you will realize that you need specific data layers for different purposes. At every stage of your data life cycle your data needs are different, and so are the associated tools and technology. This article will help you understand the relevance of the different data stages in any large organization and why you need to get each one right.
1. Data Capture
This is your front-end application data, the original source through which data enters your organization. You could have a number of front-end journeys across different channels capturing data from your customers along the way. It also includes any data captured by your partners in their respective journeys, which they may pass on to you for further processing at your end. In today's digital world, tools like GA, Firebase and Dynatrace capture your customers' click data, and that data belongs to this layer as well.
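As a minimal sketch of what this layer collects, the snippet below shows one way a click event from a customer journey could be structured and pushed to a capture endpoint. The collector URL, field names and payload shape are assumptions made for illustration; in practice the SDKs of tools like GA or Firebase define their own schemas.

```python
# A minimal sketch of capturing a front-end event server-side. The collector
# endpoint and all field names below are hypothetical; real analytics SDKs
# (GA, Firebase, etc.) have their own schemas.
import json
import uuid
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

import requests


@dataclass
class ClickEvent:
    event_id: str      # unique id for de-duplication downstream
    customer_id: str   # who performed the action
    channel: str       # e.g. "mobile_app", "web", "partner_api"
    screen: str        # where in the journey the event occurred
    action: str        # what the customer did
    captured_at: str   # ISO-8601 timestamp in UTC


def capture_event(customer_id: str, channel: str, screen: str, action: str) -> None:
    """Build an event payload and push it to the capture layer."""
    event = ClickEvent(
        event_id=str(uuid.uuid4()),
        customer_id=customer_id,
        channel=channel,
        screen=screen,
        action=action,
        captured_at=datetime.now(timezone.utc).isoformat(),
    )
    # Hypothetical collector endpoint; in practice this would be your
    # analytics SDK or an event stream feeding the capture layer.
    requests.post(
        "https://collector.example.com/events",
        data=json.dumps(asdict(event)),
        headers={"Content-Type": "application/json"},
        timeout=5,
    )


if __name__ == "__main__":
    capture_event("CUST-1001", "mobile_app", "onboarding_kyc", "submit_pan")
```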
2. Rule Engine
These are your middle-office systems, which process your business rules across the firm. A smaller firm may not need a separate database for rules, but a larger organization needs to ensure that rules are standardized across all front-end apps; for ease of maintenance and segregation of duties, you can store them in a separate repository, such as a Blaze rule engine.
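To illustrate why centralizing rules matters, here is a toy sketch in which every front-end journey calls the same evaluation function instead of hard-coding eligibility logic. The rules, field names and thresholds are invented for the example; a commercial engine such as Blaze replaces this with a managed rule repository.

```python
# A hypothetical, minimal rule-evaluation sketch: one shared rule set,
# evaluated the same way no matter which front-end app calls it.

RULES = [
    # (rule_id, description, predicate over the application payload)
    ("R001", "Minimum age 18", lambda app: app["age"] >= 18),
    ("R002", "KYC completed", lambda app: app["kyc_status"] == "VERIFIED"),
    ("R003", "Monthly income above 25,000", lambda app: app["monthly_income"] > 25_000),
]


def evaluate(application: dict) -> list[str]:
    """Return the ids of rules that the application fails."""
    return [rule_id for rule_id, _, predicate in RULES if not predicate(application)]


if __name__ == "__main__":
    application = {"age": 22, "kyc_status": "VERIFIED", "monthly_income": 18_000}
    failed = evaluate(application)
    print("Rejected on:", failed if failed else "none -- approved")
```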
3. Core System
This is your single source of truth for any information about your customers and your products. The data stored here is the golden source that flows, for example, to your Finance team for generating your P&L. Similarly, all customer-related operations work relies on this data, and you can create a number of batch jobs to automate the reports required by your Ops team. A smaller organization may have just one core system, or even a single platform covering the data capture, rule engine and core system requirements. Larger organizations may have separate core systems for different products, so a single customer's information may be scattered across several core systems.
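The sketch below shows the flavor of an Ops batch job reading from the core system. An in-memory SQLite table stands in for the golden-source database, and the table and column names are assumptions for illustration only.

```python
# A small, hypothetical Ops batch job: summarize active accounts per product
# from a stand-in "core system" table held in SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (
        account_id   TEXT PRIMARY KEY,
        product_code TEXT,
        status       TEXT,
        balance      REAL
    );
    INSERT INTO accounts VALUES
        ('A1', 'MF_EQUITY', 'ACTIVE',  125000.0),
        ('A2', 'MF_DEBT',   'ACTIVE',   80000.0),
        ('A3', 'MF_EQUITY', 'DORMANT',      0.0);
    """
)

# Daily operations summary: active accounts and book size per product.
report = conn.execute(
    """
    SELECT product_code,
           COUNT(*)     AS active_accounts,
           SUM(balance) AS total_balance
    FROM accounts
    WHERE status = 'ACTIVE'
    GROUP BY product_code
    """
).fetchall()

for product_code, active_accounts, total_balance in report:
    print(product_code, active_accounts, total_balance)
```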
4. Landing Zone
This is where you pull data from various sources within and outside your firm into one common area. There are various tools to pull the required data from different source systems, chosen based on the volume of your data and the frequency of the load. In some cases you may need to pull the complete data set from a source system daily, while in others you may need only a few fields from a source system in near real time. For external partners, vendors or subsidiaries, you will have to work with their IT and data staff to create this pipeline, as they may also have the capability to push data to your landing zone at the required frequency. There can be added complexity if some of your source systems, internal or external, are on the cloud while others are not. Data governance and control can be challenging when you have many external source systems, because you will need to implement governance processes at both the source and the destination.
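One way to think about this layer is as a schedule of sources, each with its own load type and frequency. The sketch below is only an illustration of that idea; the source names, paths and loader stub are hypothetical, and a real pipeline would use a CDC tool, file transfers or partner pushes instead.

```python
# A hedged sketch of a landing-zone ingestion schedule: one entry per source
# system, with the load type and frequency discussed above. All names and
# paths are invented for illustration.
from dataclasses import dataclass


@dataclass
class SourceConfig:
    name: str          # source system
    load_type: str     # "full" or "incremental"
    frequency: str     # "daily", "hourly", "realtime"
    landing_path: str  # where raw extracts land, as-is


SOURCES = [
    SourceConfig("core_banking",   "full",        "daily",    "/landing/core_banking/"),
    SourceConfig("clickstream",    "incremental", "realtime", "/landing/clickstream/"),
    SourceConfig("partner_lender", "incremental", "daily",    "/landing/partner_lender/"),
]


def run_ingestion(config: SourceConfig) -> None:
    # In a real pipeline this would invoke a CDC tool, an SFTP pull, or accept
    # a push from the partner's side; here we only log the intent.
    print(f"Ingesting {config.name}: {config.load_type} load, "
          f"{config.frequency}, landing at {config.landing_path}")


if __name__ == "__main__":
    for source in SOURCES:
        run_ingestion(source)
```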
5. Staging Area
After you have collected all the required data from the different source systems within and outside your firm, you will need to clean it and standardize the fields, as different source systems may follow different naming conventions for the same data fields. It is good to have a defined data taxonomy and nomenclature across the firm, but you cannot enforce such a taxonomy on your partners, vendors or subsidiaries, so some data transformation happens at this stage. This clean data can then be pushed to your data science teams, who may want to build predictive models on historical data, or to your risk team, who may want to build risk models on the same.
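The standardization step can be as simple as mapping each source's column names onto one firm-wide name before cleaning. Below is a minimal sketch of that idea, assuming pandas and made-up column names from two source systems.

```python
# A minimal staging-layer sketch: two sources call the same fields different
# things; both are mapped onto one canonical taxonomy and lightly cleaned.
# Column names and values are invented for illustration.
import pandas as pd

# Same customer identifier, different naming conventions at the source.
core_extract = pd.DataFrame({"CUST_NO": ["C1", "C2"], "acct_bal": [1000.0, 250.0]})
partner_extract = pd.DataFrame({"customerId": ["C3"], "balanceAmt": [75.5]})

# Firm-wide taxonomy: one canonical name per field.
COLUMN_MAP = {
    "CUST_NO": "customer_id",
    "customerId": "customer_id",
    "acct_bal": "balance",
    "balanceAmt": "balance",
}

staged = pd.concat(
    [core_extract.rename(columns=COLUMN_MAP),
     partner_extract.rename(columns=COLUMN_MAP)],
    ignore_index=True,
)

# Basic cleaning: drop records with no customer id, enforce types.
staged = staged.dropna(subset=["customer_id"]).astype({"balance": "float64"})
print(staged)
```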
6. Combine Data
In this layer you combine data from across your firm with the alternate data collected from external partners, vendors and subsidiaries to create value for the firm. One approach is to build star-schema-based dimensional models for your different use cases and products. You need to identify your fact tables and dimension tables appropriately to ensure that your data is well managed. This is the layer where a firm can differentiate itself from its peers: if you design it right, your analytics team will be able to generate insights easily, and it will also streamline reporting across your organization.
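As a hedged sketch of a star schema for one use case (transactions), the example below uses pandas and invented table and column names: the dimensions describe "who" and "what", and the fact table records "how much" with keys pointing back at them.

```python
# A toy star schema: two dimension tables, one fact table, and an analyst-style
# query that joins and aggregates across them. All names are illustrative.
import pandas as pd

dim_customer = pd.DataFrame({
    "customer_key": [1, 2],
    "customer_id": ["C1", "C2"],
    "segment": ["retail", "hni"],
})

dim_product = pd.DataFrame({
    "product_key": [10, 11],
    "product_code": ["MF_EQUITY", "MF_DEBT"],
    "product_name": ["Equity Fund", "Debt Fund"],
})

fact_transactions = pd.DataFrame({
    "customer_key": [1, 1, 2],
    "product_key": [10, 11, 10],
    "txn_date": pd.to_datetime(["2024-01-05", "2024-01-06", "2024-01-06"]),
    "amount": [5000.0, 2000.0, 10000.0],
})

# Analysts join facts to dimensions to slice measures by any attribute.
report = (
    fact_transactions
    .merge(dim_customer, on="customer_key")
    .merge(dim_product, on="product_key")
    .groupby(["segment", "product_name"], as_index=False)["amount"]
    .sum()
)
print(report)
```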
7. Data Mart
This is your final data layer, where you create reporting cubes from which aggregated reports can be generated. These reports could aggregate all metrics for a specific product, provide an aggregated metric layer across all products in your firm, or roll metrics up across durations such as monthly, quarterly or annual. If you design this layer well, you can generate your business MIS in an automated manner, as well as the compliance reports that need to be submitted on a regular basis. You will also save your analytics and reporting teams many hours of effort if you invest in building proper data marts as a one-time exercise.
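The snippet below sketches what a data-mart-style aggregate could look like, assuming pandas and the kind of hypothetical fact table used in the previous sketch: metrics are pre-rolled to monthly and quarterly grains so that MIS reports read from the mart rather than from raw facts.

```python
# A small sketch of pre-aggregated mart tables at monthly and quarterly grain.
# Table and column names are invented for illustration.
import pandas as pd

fact_transactions = pd.DataFrame({
    "product_code": ["MF_EQUITY", "MF_EQUITY", "MF_DEBT", "MF_EQUITY"],
    "txn_date": pd.to_datetime(["2024-01-05", "2024-02-10", "2024-02-11", "2024-04-02"]),
    "amount": [5000.0, 3000.0, 2000.0, 7000.0],
})

monthly_mart = (
    fact_transactions
    .assign(month=fact_transactions["txn_date"].dt.to_period("M"))
    .groupby(["product_code", "month"], as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "monthly_inflow"})
)

quarterly_mart = (
    fact_transactions
    .assign(quarter=fact_transactions["txn_date"].dt.to_period("Q"))
    .groupby(["product_code", "quarter"], as_index=False)["amount"]
    .sum()
    .rename(columns={"amount": "quarterly_inflow"})
)

print(monthly_mart)
print(quarterly_mart)
```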
One important thing to keep in mind is to create a data catalog, which will help you identify the right tables for your analytics and reporting purposes. There are various tools available to create and manage one, and it will also help you rationalize your data and tables and avoid duplication. There are tools to help you understand and manage your data lineage, and tools to manage your data quality at every stage. Last but not least, you can leverage data observability tools to understand, diagnose and manage data health across multiple systems throughout the data life cycle.
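To make the catalog idea concrete, here is a purely hypothetical sketch of the kind of metadata a single catalog entry might record, including simple lineage and quality expectations. Commercial catalog tools capture this through their own UIs and APIs; the structure and names below are only an illustration.

```python
# A hypothetical data-catalog entry: where a table sits, who owns it, what it
# depends on, and which quality checks apply. Everything here is illustrative.
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    table_name: str
    layer: str                                               # landing / staging / combined / mart
    owner: str                                               # accountable team or person
    description: str
    upstream_tables: list[str] = field(default_factory=list)  # simple lineage
    quality_checks: list[str] = field(default_factory=list)   # expectations applied


entry = CatalogEntry(
    table_name="mart.monthly_product_inflows",
    layer="mart",
    owner="analytics-platform@firm.example",
    description="Monthly inflows per product, used for business MIS.",
    upstream_tables=["combined.fact_transactions", "combined.dim_product"],
    quality_checks=["monthly_inflow >= 0", "no duplicate (product_code, month)"],
)
print(entry.table_name, "->", entry.upstream_tables)
```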