ADF’s Identity Management System (IMS) Architecture
Applied Data Finance
ADF helps consumers improve their finances while enabling enterprises to leverage data analytics and compliance tools.
We have divided the core Identity Management System (IMS) at ADF into two separate services.
Core API Service
This is the core client-facing service of the IMS, which handles all responsibilities except the fuzzy matching and contact lookup logic. The service is hosted as a gRPC server, uses protocol buffers as the data transfer format, and supports a wide variety of API calls, including but not limited to:
We chose to use gRPC over a REST API for the following reasons:
However, there were some challenges with gRPC that we encountered:
Despite these challenges, the benefits outweighed the costs, and we have been using gRPC in production successfully for over a year now.
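For illustration, here is a minimal sketch of what hosting such a gRPC service can look like in Java. The service name, message types, and port are assumptions for the example, not the actual IMS contract; the generated classes would come from protoc.

```java
import io.grpc.Server;
import io.grpc.ServerBuilder;
import io.grpc.stub.StreamObserver;

// Assumes a proto definition such as:
//   service IdentityService { rpc GetCustomer (GetCustomerRequest) returns (CustomerResponse); }
// from which protoc generates IdentityServiceGrpc and the message classes used below.
public class CoreApiServer {

    // Hypothetical implementation of one of the many IMS API calls.
    static class IdentityServiceImpl extends IdentityServiceGrpc.IdentityServiceImplBase {
        @Override
        public void getCustomer(GetCustomerRequest request,
                                StreamObserver<CustomerResponse> responseObserver) {
            // Look up the customer and convert the entity into a protobuf response.
            CustomerResponse response = CustomerResponse.newBuilder()
                    .setCustomerId(request.getCustomerId())
                    .build();
            responseObserver.onNext(response);
            responseObserver.onCompleted();
        }
    }

    public static void main(String[] args) throws Exception {
        Server server = ServerBuilder.forPort(9090)   // port is a placeholder
                .addService(new IdentityServiceImpl())
                .build()
                .start();
        server.awaitTermination();
    }
}
```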
Identity Lookup Microservice
This microservice is designed to provide fast and accurate fuzzy matching and lookup of customer data by using a large in-memory index. It supports specialized data types and uses XML files to configure different algorithms for various use cases. Communication between this microservice and the client-facing IMS service is done using gRPC with well-defined contracts. This microservice operates in a read-only mode and only reads from the database to pre-populate its in-memory indices.
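As a rough illustration of the idea (not the actual index implementation or its XML-configured algorithms), a simplified in-memory index in Java might normalize name fields into keys and then apply an edit-distance check over the candidate set:

```java
import java.util.*;

// Simplified sketch of an in-memory lookup index; the real microservice uses
// specialized data types and configurable matching algorithms not shown here.
public class FuzzyIndex {
    // Normalized key (e.g. lowercased last name) -> candidate customer ids.
    private final Map<String, Set<Long>> byLastName = new HashMap<>();
    private final Map<Long, String> fullNames = new HashMap<>();

    public void add(long customerId, String fullName) {
        String last = normalize(fullName.substring(fullName.lastIndexOf(' ') + 1));
        byLastName.computeIfAbsent(last, k -> new HashSet<>()).add(customerId);
        fullNames.put(customerId, normalize(fullName));
    }

    // Return candidates whose full name is within the given edit distance of the query.
    public List<Long> fuzzyMatch(String queryFullName, int maxDistance) {
        String last = normalize(queryFullName.substring(queryFullName.lastIndexOf(' ') + 1));
        List<Long> matches = new ArrayList<>();
        for (long id : byLastName.getOrDefault(last, Collections.emptySet())) {
            if (editDistance(fullNames.get(id), normalize(queryFullName)) <= maxDistance) {
                matches.add(id);
            }
        }
        return matches;
    }

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9 ]", "");
    }

    // Standard Levenshtein distance.
    private static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        return d[a.length()][b.length()];
    }
}
```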
Why in-memory lookup?
The in-memory fuzzy matching system is crucial for the IMS to handle the high volume (multi-million and growing) of customer data effectively and quickly. The complexity and real-time nature of fuzzy matching require specialized processing that traditional relational databases and NoSQL stores are not equipped to handle. This is why an in-memory index with fast retrieval is essential. Splitting the lookup functionality into a separate microservice also allows for better isolation of resource requirements and scaling as needed.
However, having a separate service also poses challenges, such as potential downtime and the need for a fallback mechanism. The system is designed to handle this with a fallback to exact matching and reconciliation processes that ensure continuity of service. The index-building process has also been optimized for speed to minimize downtime.
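A sketch of how such a fallback can be wired, with hypothetical client and repository interfaces (the actual reconciliation process is more involved than what is shown here):

```java
import java.util.List;
import java.util.Optional;

// Hypothetical interfaces: a client to the lookup microservice and an exact-match
// query against the primary database, used only when the microservice is unavailable.
interface FuzzyLookupClient {
    List<Long> fuzzyMatch(String fullName, String phone) throws Exception;
}

interface ExactMatchRepository {
    Optional<Long> findByExactPhone(String phone);
}

public class LookupWithFallback {
    private final FuzzyLookupClient fuzzyClient;
    private final ExactMatchRepository exactRepo;

    public LookupWithFallback(FuzzyLookupClient fuzzyClient, ExactMatchRepository exactRepo) {
        this.fuzzyClient = fuzzyClient;
        this.exactRepo = exactRepo;
    }

    public List<Long> lookup(String fullName, String phone) {
        try {
            return fuzzyClient.fuzzyMatch(fullName, phone);
        } catch (Exception unavailable) {
            // Lookup microservice is down or rebuilding its index: degrade to exact
            // matching against the database; reconciliation catches anything missed later.
            return exactRepo.findByExactPhone(phone).map(List::of).orElse(List.of());
        }
    }
}
```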
Client Wrappers
The IMS team provides the client systems with SDKs that are integrated via a drop-in mechanism. The SDKs are generated using gRPC and protocol buffers, which enables fast data transfer and data-object conversion. The IMS team also hosts these SDKs in internal repositories like Nexus or Archiva, and client teams can use them as a third-party library. This approach reduces development time and gives the IMS team more control over the different versions and features of the client SDKs.
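From a client team's perspective, consuming the SDK amounts to pulling the published artifact from the internal repository and calling the generated stub. The sketch below illustrates this under assumptions: the service name, message types, and host are placeholders, not the real IMS SDK surface.

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

// Assumes the IMS SDK artifact (hosted on the internal Nexus/Archiva repository)
// ships the protoc-generated IdentityServiceGrpc stub and message classes.
public class ImsClientExample {
    public static void main(String[] args) {
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress("ims.internal", 9090)   // host/port are placeholders
                .usePlaintext()
                .build();
        try {
            IdentityServiceGrpc.IdentityServiceBlockingStub stub =
                    IdentityServiceGrpc.newBlockingStub(channel);
            CustomerResponse customer = stub.getCustomer(
                    GetCustomerRequest.newBuilder().setCustomerId(42L).build());
            System.out.println("Resolved customer: " + customer.getCustomerId());
        } finally {
            channel.shutdown();
        }
    }
}
```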
Growth and Scaling Challenges
For high-volume customers with over a certain number of touch points, the IMS precomputes and saves the entire dataset as JSON in a NoSQL store. This allows fast retrieval of derived and aggregated data, because a traditional RDBMS cannot handle data at this scale and still meet the SLA of a few-hundred-millisecond response times. Writes to this NoSQL store are managed alongside the RDBMS writes to keep it up to date. This approach helps us tackle the rapid growth of event data, which is expected to grow significantly over the years. For example, we have seen some customers with 1000+ touch points; in such cases the volume of data, including all addresses, emails, phone numbers, preferences, events, and so on, is 50-100x greater than anticipated. Retrieving such data from a traditional RDBMS and computing the aggregated/derived data points in real time would be difficult, and we would be unable to meet the SLA of a few hundred milliseconds. For large-scale datasets like ours, even creating enough indices in the database does not help.
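A simplified sketch of the dual-write idea follows, with hypothetical persistence interfaces alongside the relational write; Jackson is used here purely for illustration, and the actual serialization and stores may differ.

```java
import com.fasterxml.jackson.databind.ObjectMapper;

// Hypothetical persistence interfaces: the relational system of record and the
// NoSQL store that holds the precomputed, aggregated profile as a JSON document.
interface RelationalCustomerDao {
    void saveEvent(long customerId, Object event);
}

interface DocumentStore {
    void putJson(String key, String json);
}

public class ProfileWriter {
    private final RelationalCustomerDao rdbms;
    private final DocumentStore noSql;
    private final ObjectMapper mapper = new ObjectMapper();

    public ProfileWriter(RelationalCustomerDao rdbms, DocumentStore noSql) {
        this.rdbms = rdbms;
        this.noSql = noSql;
    }

    // Write the raw event to the RDBMS and refresh the precomputed document so reads
    // for high-volume customers never aggregate thousands of rows at request time.
    public void recordEvent(long customerId, Object event, Object aggregatedProfile) throws Exception {
        rdbms.saveEvent(customerId, event);
        noSql.putJson("customer:" + customerId, mapper.writeValueAsString(aggregatedProfile));
    }
}
```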
Migration of legacy data
We started designing the IMS a few years into our business, so we already had a large volume of legacy data from existing product lines/systems that had to be moved into the new IMS. For a rough idea, our IMS has been live for more than a year now; out of the ~50MM records we have, around 37MM have been migrated from legacy systems, which is almost 70% of the volume. While designing the data model we had to liaise with all the different client teams, understand their needs, and come up with a generic data model.
Another challenge with legacy data is that each system (3 in this case) had its own way of storing customer identity information and behavioral (transactional) data, so we had to design a common data model that accommodates all the types of information stored by all 3 systems, and also decide how to default data that is available in system A but missing in system B. So, we decoupled the data-export part, owned by each client system, from the data-import part into the new IMS.
Each system A, B, and C has its own exporter component, which exports the data into a common data format, i.e., XML/JSON, that aligns with the IMS schema. A common importer component then parses these intermediate files and loads the data into the respective IMS data model tables. This decoupling also gives us the flexibility to export once and import multiple times, so we can perform dry runs to ensure data integrity.
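Conceptually, the decoupling looks like the sketch below: one exporter per legacy system writes the intermediate format, and a single importer consumes it. The interfaces and the record shape are illustrative, not the actual IMS schema.

```java
import java.util.List;
import java.util.Map;

// One record in the common intermediate format (XML/JSON on disk); fields not present
// in a given legacy system are filled with agreed defaults before export.
record CommonCustomerRecord(String legacySystem, String legacyId,
                            Map<String, String> identityFields,
                            List<Map<String, String>> events) { }

// Each legacy system (A, B, C) implements its own exporter.
interface LegacyExporter {
    List<CommonCustomerRecord> exportBatch(int offset, int batchSize);
}

// A single importer parses the intermediate records and loads the IMS tables.
interface ImsImporter {
    void importRecords(List<CommonCustomerRecord> records);
}

public class MigrationRunner {
    public static void run(LegacyExporter exporter, ImsImporter importer, int batchSize) {
        int offset = 0;
        List<CommonCustomerRecord> batch;
        // Export once, import as many times as needed (e.g. for dry runs).
        while (!(batch = exporter.exportBatch(offset, batchSize)).isEmpty()) {
            importer.importRecords(batch);
            offset += batch.size();
        }
    }
}
```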
Another challenge we faced was that the legacy systems had evolved over the years, so data that is available for recent touch points may not be available for older data (e.g., from 2015), since the functionality that needs that data might not have been implemented at the time. We had to perform analysis by writing custom utilities to find such corner cases and handle them with meaningful defaults. After a year, we are still identifying corner cases that could impact 0.001% of the data and retrofitting them now. Verifying such massive data, e.g., hundreds of GBs of generated XML/JSON files and the imported data, is not something that can be done manually at this scale. So, our QA team had to write separate utilities, which themselves run for hours, to verify the integrity of the exported data files and the imported data after each export and import.
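The verification utilities follow a simple principle: compute something cheap and deterministic (counts and per-record checksums) on both sides and compare. Below is a minimal sketch of that idea, not the QA team's actual tooling.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashSet;
import java.util.Set;

public class MigrationVerifier {

    // Checksum of a record's canonical string form; computed from the exported files
    // and again from the rows read back out of the IMS tables after import.
    static String checksum(String canonicalRecord) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(canonicalRecord.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    // Report how many exported records are missing (or altered) after import.
    static long missingAfterImport(Set<String> exportedChecksums, Set<String> importedChecksums) {
        Set<String> missing = new HashSet<>(exportedChecksums);
        missing.removeAll(importedChecksums);
        return missing.size();
    }
}
```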
One more point to note: the export + import + data verification takes days if not weeks, and we cannot keep the client systems shut down for that long while we do this migration. So, we had to devise a mechanism to keep the systems online and keep migrating data as it arrives. We designed the exporter and importer to support delta datasets: we mark the data that has already been exported, and later, once the full migration is done and the IMS is live, we rerun the exporter + importer + data verification for the smaller delta dataset, during which we keep the systems offline for only a few minutes to avoid missing any new incoming data.
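One common way to support delta datasets is a high-water mark on the export side: remember what has already been exported and pick up only records created or updated after that point during the short final offline window. The sketch below illustrates that pattern under assumptions; the actual marking mechanism in our exporters may differ.

```java
import java.time.Instant;
import java.util.List;

// Hypothetical view of a legacy system's change log.
interface LegacySource {
    List<String> recordsChangedSince(Instant watermark);
}

public class DeltaExporter {
    private Instant lastExportedAt = Instant.EPOCH;  // persisted in practice
    private final LegacySource source;

    public DeltaExporter(LegacySource source) {
        this.source = source;
    }

    // Full run first, then one or more small delta runs while the system is briefly offline.
    public List<String> exportDelta() {
        Instant cutoff = Instant.now();
        List<String> delta = source.recordsChangedSince(lastExportedAt);
        lastExportedAt = cutoff;  // mark what has been exported
        return delta;
    }
}
```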
Future of IMS
What we've done so far is only a fraction of what we'd like the IMS to do for ADF. We want to evolve the system in the following ways to improve its capabilities.
Account Management
Currently, identity matching and serving is used only by ADF's internal systems, as an auxiliary capability. Our goal is to make IMS an account management system, where a customer who reaches ADF via any of our products or services is identified and asked to log in with their own ADF account, which can then be used across all product lines. This will ensure our crucial decisions are validated by the customers themselves and will enable better use of the aggregated data for crucial decisions such as credit underwriting.
Enhanced customer profiles
The current system computes aggregated/derived variables from the raw data collected in IMS; this is currently limited to a couple of hundred variables because of its online mode of computation. However, there is potential to expand this to tens of thousands of variables through an offline process. We want to extend the system with such an offline process, which will enable more complex computations and correlations across customer data to generate a deeper customer profile, resulting in better use of the data and improved decision-making.
SSO backend for all ADF products and services
This is an extension of the Account Management enhancement. Since IMS will hold and serve as the single source of truth for all customer identities, it has the potential to become the backend service for SSO (Single Sign-On) functionality, used by all customer-facing products as a centralised service. This will also simplify the customer's account management and enhance the overall customer experience.