Driving sustainable growth in banks by connecting customer data using a graph database
Harry Powell
Data science leader with track record of innovation and value creation
Growing a banking business requires you to make good decisions at each stage of the value cycle: acquiring new customers, growing sales, increasing margins, reducing overheads, managing risk and reducing attrition. All of those decisions are driven, fundamentally, by how well you know your customers. But data about your customers is likely to be spread across many systems. So to truly understand your customers you have to connect disparate snapshots of data from transactions across products, channels and communities. Each snapshot on its own is too little; only when connected do they give enough detail for you to understand how you can deliver the most value to your customers, and how in turn they can help drive sustainable growth in your business’s profitability.
This note looks at each stage of the banking value cycle, and at how connecting customer information can be used to grow a banking business. It explains why achieving this in practice is hard using existing data systems, and shows how graph databases can join your data together easily so that every decision you make is made with the context needed to maximise value.
1. Acquire Customers
To grow your business, you’ll want to acquire new customers. But by definition you don’t know much about people who don’t yet do business with you. So who do you want as your customer, and what are you going to offer them to persuade them to change banks? You need to combine the small amount of external data you have about the prospect with the rich data you have about your existing customer base. From that you can draw conclusions about the lifetime value of onboarding this person’s business, and what incentive you can afford to offer them to join. But this is hard because external information will have a very different structure to your internal information. For example, bureau data may well be at a different level of aggregation due to privacy concerns. It won’t be a simple match.
2. Sell more
Next, you want to sell more to your existing customer base. To do so you must curate a set of products and offers that each customer will want and need. But although banking customers undertake a lot of current account transactions, each individual customer will take only a small number of banking products over their lifetime, so you won’t learn enough about what they want from what they have already bought. Moreover, the most crucial product to sell them is the first, and you don’t have much information about that at all. To target efficiently you need to combine two methodologies, content filtering and collaborative filtering, and to do that you have to combine product information with behavioural information. Unfortunately the former will be structured around products and the latter around transactions, with schemas that will be difficult to match up.
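As a rough illustration of combining the two methodologies, the sketch below blends a collaborative score (what similar customers hold) with a content score (how similar a candidate product is to products already held). The customers, products, tags, the `recommend` function and the equal weighting are all hypothetical, chosen only to make the idea concrete; a production recommender would be considerably more involved.

```python
# Hypothetical toy data: the products each customer currently holds.
holdings = {
    "alice": {"current_account", "credit_card"},
    "bob": {"current_account", "credit_card", "mortgage"},
    "carol": {"current_account", "savings"},
}

# Hypothetical product attributes, used for the content-filtering side.
product_tags = {
    "current_account": {"everyday"},
    "credit_card": {"everyday", "credit"},
    "mortgage": {"credit", "property"},
    "savings": {"deposit"},
}


def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def collaborative_score(customer, product):
    """Sum of similarity to every other customer who holds the product."""
    mine = holdings[customer]
    return sum(
        jaccard(mine, theirs)
        for other, theirs in holdings.items()
        if other != customer and product in theirs
    )


def content_score(customer, product):
    """Best tag overlap between the candidate and any product already held."""
    return max(
        (jaccard(product_tags[product], product_tags[held])
         for held in holdings[customer]),
        default=0.0,
    )


def recommend(customer, weight=0.5):
    """Rank products the customer does not hold by a blended score."""
    candidates = set(product_tags) - holdings[customer]
    scores = {
        p: weight * collaborative_score(customer, p)
        + (1 - weight) * content_score(customer, p)
        for p in candidates
    }
    return sorted(scores, key=lambda p: (-scores[p], p))


print(recommend("alice"))
```

Note that the collaborative side needs behavioural data keyed by customer, while the content side needs product data keyed by product, which is exactly the schema mismatch described above.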
3. Increase Margin
Increasing margin from sales can be done by shifting customers onto higher-profit products and services. For example, some customers are willing to pay for flexibility: if they have a wide range of activities and investments, they may want an account that can give them credit against one illiquid investment when trying to raise short-term funds for another. But you can only offer this kind of product to people who have the right balance of investments, loans and businesses. Historically, data systems for each product have been kept apart, often with completely different identification protocols and information needs. There is often no shared ID mapping that allows them to be connected together easily.
4. Reduce Overheads
Banks have huge fixed costs related to the essential services that customers require. But a disproportionate cost comes from maintaining the legacy systems that power services that hardly anyone uses any more. Significant savings can be driven by migrating customers onto core services, so you can shut down redundant kit, and that can often be done without inconveniencing anyone. But doing this requires great care: important customers may rely upon services that no one else uses any more. You need to understand who is using what, and why, often by patching together sparsely populated datasets from a multiplicity of sources.
5. Manage Risk
Your customers are not just a source of profit. They are also your key source of risk: through default, through fraud and through breaches of regulatory compliance. This risk translates into losses and fines if unmitigated. Data is the bank’s best tool to manage risk. Although bad actors are adept at covering their tracks, they always leave some trace. But to find residual evidence of illegal activity, you need to triangulate a wide range of data points and you need to search deep into your data. Unfortunately your data systems were not set up to do this, so while you can identify bad actors in theory, in practice they are still there, costing your bank millions.
6. Reduce Attrition
It's much harder to acquire a new customer than to keep an existing one. If you can provide a great service, most customers will never look elsewhere, even if they could save money doing so. But customer expectations are growing: they expect you to know all of their interactions, across all related activities and people, and all banking channels. Not joining the dots is a major source of dissatisfaction to customers. For example, a premium customer may want to bank as a family. Having all that information to hand when dealing with urgent family business is an excellent way to ensure the loyalty of your most profitable customers. But families can have all sorts of structures unsuited to tables of data.
In each step of the banking value cycle, your business’s growth is limited by your inability to connect data from historic systems and processes. There are three basic reasons for this:
Heterogeneous Schema
Each system may have a very different data structure or schema, which makes its records fundamentally hard to match up with those of other systems. That may sound strange, but different products have different ways of looking at the world, and that means different ways of organising data.
For example, accounts data is structured for individuals, and will not easily resolve for families, especially where nowadays families can be quite extended. One customer may have two accounts; or one account may have two signatories. Mortgage systems will revolve around properties, and so may have multiple owners (not accounts), and owners may share different properties with different co-investors. Shares may be held through a number of vehicles. Data may be held at various levels of aggregation. Different systems may apply categorisations or segmentations using incompatible or overlapping rule sets.
Simple schema matching can be achieved on conventional databases using foreign keys and intermediary tables that map records from one table to another. But anything complicated becomes impractical with this approach: your database slows to a halt, query code becomes excessively convoluted, and it becomes increasingly difficult to be sure that your results are correct.
Graph databases do not impose a schema. They can easily accommodate local variations in data structure. They naturally operate at different levels of aggregation. So they can combine data on customers from incompatible product systems, regions and legacy businesses without contorting the data structures to fit together.
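To make this concrete, here is a minimal sketch of a property graph held in plain Python structures. Each node type carries its own attributes, and edges connect records that would normally live in separate product systems. The identifiers, names, edge labels and the `related` helper are hypothetical and are not taken from any particular graph database.

```python
# Nodes of different types carry different attributes; no shared table
# layout is required. All ids and values here are illustrative only.
nodes = {
    "cust:1": {"type": "Customer", "name": "A. Shah"},
    "cust:2": {"type": "Customer", "name": "B. Shah"},
    "acct:9": {"type": "Account", "product": "joint current account"},
    "prop:5": {"type": "Property", "postcode": "AB1 2CD"},
    "mort:7": {"type": "Mortgage", "balance": 250000},
}

# Typed edges link records across the accounts and mortgage systems.
edges = [
    ("cust:1", "SIGNATORY_OF", "acct:9"),
    ("cust:2", "SIGNATORY_OF", "acct:9"),
    ("cust:1", "OWNS", "prop:5"),
    ("mort:7", "SECURED_ON", "prop:5"),
]


def neighbours(node_id):
    """Ids directly connected to node_id, ignoring edge direction."""
    found = set()
    for src, _label, dst in edges:
        if src == node_id:
            found.add(dst)
        elif dst == node_id:
            found.add(src)
    return found


def related(node_id, hops):
    """All nodes reachable from node_id within the given number of hops."""
    seen = {node_id}
    frontier = {node_id}
    for _ in range(hops):
        frontier = {n for f in frontier for n in neighbours(f)} - seen
        seen |= frontier
    return seen - {node_id}


# Everything connected to customer 2 within four hops, grouped by type.
household = related("cust:2", hops=4)
print(sorted((nodes[n]["type"], n) for n in household))
```

Starting from the second signatory, the walk reaches the joint account, the co-signatory, the property and the mortgage secured on it, without either source system having been reshaped to match the other.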
Multiple incompatible identifiers
Records in each system may use identifiers that are not common between systems. This can be a result of different products being developed in isolation on independent applications within the bank, or in different banks that have since been acquired. Sometimes systems in different regions or jurisdictions had to keep their data separate for regulatory reasons, and have not merged back together since.
For example, the key identifier for a mortgage system might be the property, or it might be the mortgage itself (a property could have more than one mortgage). Identifiers in wealth data are often masked for privacy reasons. Directors of companies may not have a unique ID number at all, and their details may only be saved in some auxiliary documentation or system. Matching any of these things to a Customer ID may not be possible directly.
In theory, matching is still possible without identifiers. Where entities share enough characteristics in common, you can guess that they are in fact the same thing. This is called Entity Resolution. Unfortunately matching multiple records in conventional systems requires extremely convoluted logic. You need to compare all possible combinations of pairs and then combine the results in all possible combinations. And then in practice, you still need significant manual intervention.
Graph databases can use graph entity resolution to combine datasets that lack common unique identifiers. Graphs overcome the combinatorial challenge of entity resolution because of the way they work (it's a long story, but it's to do with the maths of sets and converting a logical problem to an arithmetic one). Using a graph it is relatively simple to match datasets without common customer IDs.
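One simple way to picture graph-based entity resolution is to treat each record and each attribute value as a node, link records to the values they contain, and then read off the connected components. The sketch below does this with a union-find structure, so each record is touched once rather than compared against every other record. It is an illustration under strong assumptions (exact matches on hypothetical `email` and `phone` fields, made-up records), not a description of how any specific product implements entity resolution; real matching would typically also use fuzzy comparison and scoring.

```python
from collections import defaultdict

# Hypothetical records from three systems with no shared customer ID.
records = [
    {"id": "crm-1", "email": "j.doe@example.com", "phone": "555-0100"},
    {"id": "mort-7", "email": None, "phone": "555-0100"},
    {"id": "wealth-3", "email": "j.doe@example.com", "phone": None},
    {"id": "crm-2", "email": "a.lee@example.com", "phone": "555-0199"},
]


def resolve(records, keys=("email", "phone")):
    """Group records that are linked by any shared attribute value."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent pointers to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Each (field, value) pair acts as a shared node: the first record
    # seen with that value becomes the anchor that later records join.
    first_seen = {}
    for r in records:
        for key in keys:
            value = r.get(key)
            if value is None:
                continue
            token = (key, value)
            if token in first_seen:
                union(first_seen[token], r["id"])
            else:
                first_seen[token] = r["id"]

    groups = defaultdict(set)
    for r in records:
        groups[find(r["id"])].add(r["id"])
    return sorted(sorted(group) for group in groups.values())


print(resolve(records))
```

Here the mortgage and wealth records never share a field with each other, yet both end up in the same group because each is linked to the CRM record, which is the kind of transitive match that is awkward to express as pairwise comparisons.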
Deeper connections
Most banking systems were designed simply to hold data and make it available for reporting, not for tracing between records to uncover relationships. But exploring from one record to another, multiple times, is critical to uncovering groups of financial criminals, to tracing flows of illegal cash through accounts, or finding sanctions-busting aliases, trusts and relatives of proscribed individuals.
In normal databases, each step through the data is at least one join and one filter. Multiple joins are computationally expensive (they take a long time to run), and you don’t know ahead of runtime how many you are going to need. So trying to explore connections using conventional databases is prohibitively hard.
Graph databases are specifically built from the ground up to explore deep into the network. They store data differently (as maps and pointers), which allows them to traverse from one record to its connections very quickly. This not only enables exploration, but also lets you program graph algorithms to identify patterns such as cycles or bottlenecks in cash transfers, a major tool in combating money laundering.
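As a small illustration of the kind of pattern search mentioned above, the sketch below looks for cycles in a set of transfers, i.e. money that leaves an account and eventually returns to it. It is a plain depth-first search over an in-memory adjacency map, with hypothetical account names and a hypothetical `max_len` limit on cycle length; it is meant to show the idea rather than how a graph database would execute it at scale.

```python
def find_cycles(transfers, max_len=6):
    """Return each directed cycle of at most max_len accounts, once."""
    # Adjacency map: account -> accounts it has sent money to.
    graph = {}
    for src, dst in transfers:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())

    cycles = set()

    def walk(start, node, path):
        for nxt in graph[node]:
            if nxt == start:
                # Rotate so the smallest account comes first; the same
                # cycle found from different starts is then stored once.
                i = path.index(min(path))
                cycles.add(tuple(path[i:] + path[:i]))
            elif nxt not in path and len(path) < max_len:
                walk(start, nxt, path + [nxt])

    for start in graph:
        walk(start, start, [start])
    return sorted(cycles)


# Hypothetical transfers: A -> B -> C -> A forms a loop; D and E do not.
transfers = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]
print(find_cycles(transfers))
```

Each extra step in `walk` is one more hop through the data, and the number of hops needed is not known in advance, which is why this sort of search maps poorly onto a fixed number of joins.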
Conclusion
Growing your banking business will increasingly depend on really knowing your customer. To do this you need to connect together everything you know about them from all your different banking applications. This cannot practically be done using conventional databases, which were never intended for the purpose. Graph databases solve the problem: they are built from the ground up to connect data together and then analyse the connections, which are such a valuable part of your customer information. With connected customer information on a graph, you will be able to acquire customers, grow sales, increase margins, reduce overhead costs, manage risk and reduce attrition.
What is a Graph Database?
Graph databases are platforms for analytics and machine learning that work on networks of connected data and relationships, known as “graphs”.
A graph database is valuable in cases where the relationships between things are important. So if you want to offer a joined-up service to customers across products, channels and geographies, you need to connect people across systems. Or if you want to find financial crime hiding in your data, then the connections between people, accounts and transactions are useful.
In all these cases normal “relational” databases struggle to trace the links between entities to see the important underlying relationships. Graph databases are specifically engineered to do this.
TigerGraph is the leading graph database. TigerGraph Inc. was established in 2015 and is based in Redwood City, CA. TigerGraph was included in the Gartner Magic Quadrant in 2022, and inducted into the JP Morgan Chase Hall of Innovation in 2021. TigerGraph is used by Forbes 2000 businesses all over the world, including six Tier 1 US, UK and European banks, and delivers an average ROI of 6.0x according to Forrester Research.
You can get a free 50GB instance of TigerGraph Cloud at tgcloud.io or email [email protected] if you would like to find out more.