Entity Resolution: The Cornerstone of BSA Data Analysis
Constructing the BSA Entity

Entity Resolution (ER) is the process of determining when different records refer to the same real-world entity (person, organization, place, etc.) despite duplicate, inconsistent, incomplete, or erroneous data. This process is essential for analyzing Bank Secrecy Act (BSA) data used by law enforcement, regulatory bodies, and national security agencies to combat money laundering, terrorist financing, and other illicit financial activities.

More than 260,000 regulated financial institutions submit over 55,000 reports daily (exceeding 20 million annually) to the Financial Crimes Enforcement Network (FinCEN), the federal agency responsible for BSA oversight. These reports detail suspicious transactions, control of foreign financial accounts, beneficial ownership, and bulk cash movement and purchases. However, analyzing this data presents significant challenges: name variations, nicknames, aliases, and misspellings (often intentional) can obscure crucial links between records pertaining to the same individual.

ER directly addresses these challenges by accurately linking records and consolidating data, building a comprehensive view of individuals, organizations, and their financial activities. This "big picture" is essential for understanding the scale of money laundering operations, mapping connections to expose criminal networks, and ultimately aiding law enforcement in targeting key figures.

Each BSA submission made to FinCEN includes core details about the activity (filing, transaction, or event), such as dates, times, amounts, descriptions, types, and other relevant information. Certain data elements, such as phone numbers, addresses, identification numbers, IP addresses, and email addresses, are categorically unique, meaning they inherently represent distinct analytical entities that are unlikely to be confused with similar values or content.

While many of these values may have different formats, various transformation and cleanup processes can standardize them for proper identification. For instance, the phone numbers (123) 456-7890, 123.456.7890, and 1234567890 are easily recognized as referring to the same entity. For more complex data like addresses, standard parsers, abbreviation lists, and lookup tables can accurately resolve inconsistencies and help ensure accurate matching.
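
As a minimal sketch of this kind of cleanup, the Python function below reduces the three phone formats above to one canonical value. The function name normalize_phone and the restriction to 10-digit North American numbers are assumptions made for illustration; they are not part of any FinCEN specification.

```python
import re
from typing import Optional

def normalize_phone(raw: str) -> Optional[str]:
    """Return the 10-digit form of a North American number, or None.

    Assumption for illustration: numbers are US-style, optionally
    prefixed with the country code 1.
    """
    digits = re.sub(r"\D", "", raw)      # drop everything except digits
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]              # strip a leading country code
    return digits if len(digits) == 10 else None

samples = ["(123) 456-7890", "123.456.7890", "1234567890"]
normalized = {normalize_phone(s) for s in samples}
print(normalized)  # the three formats collapse to a single value
```

Returning None for values that cannot be reduced to ten digits keeps malformed entries from silently matching one another.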

Consolidation becomes more challenging with less distinctive data, such as common names, which often lack the specificity needed for accurate entity identification. The name "John Smith" illustrates this difficulty: without further information, it is hard to distinguish between multiple individuals. Is the "John Smith" from New York the same as the one from New Jersey? Is the "John Smith" born on 02/05/1998 the same as the one born on 05/02/1998? Is "John Smith" the same person as "Jon Smiths"?

To resolve these ambiguities, ER must rely on additional parameters or "descriptors" to establish unique identities. These can include demographic information such as gender, date of birth, and age; physical characteristics like height and weight (non-BSA data); and distinct identifiers like Social Security Numbers, driver’s license numbers, or other types of IDs. When datasets lack these identifiers alongside names, investigations can suffer from inaccuracies and inefficiencies.
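
The role of descriptors can be sketched in a few lines of Python. This is an illustrative rule set, not a description of any production ER engine: the field names (name, dob, ssn), the 0.85 threshold, and the three outcome labels are all assumptions. Name similarity uses the standard library's difflib.SequenceMatcher, and dates are assumed to be MM/DD/YYYY strings.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Similarity ratio between 0 and 1, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.upper().strip(), b.upper().strip()).ratio()

def dob_transposed(a: str, b: str) -> bool:
    """True when two MM/DD/YYYY dates differ only by swapped month and day."""
    m1, d1, y1 = a.split("/")
    m2, d2, y2 = b.split("/")
    return a != b and y1 == y2 and m1 == d2 and d1 == m2

def compare_people(a: dict, b: dict, name_threshold: float = 0.85) -> str:
    """Classify a pair as 'match', 'review', or 'no-match' (illustrative)."""
    if name_similarity(a["name"], b["name"]) < name_threshold:
        return "no-match"
    # A shared strong identifier settles the question.
    if a.get("ssn") and a.get("ssn") == b.get("ssn"):
        return "match"
    dob_a, dob_b = a.get("dob"), b.get("dob")
    if dob_a and dob_b:
        if dob_a == dob_b:
            return "match"
        if dob_transposed(dob_a, dob_b):
            return "review"      # possibly a day/month entry error
        return "no-match"
    return "review"              # similar name, nothing to confirm or refute

first = {"name": "John Smith", "dob": "02/05/1998"}
second = {"name": "Jon Smiths", "dob": "05/02/1998"}
outcome = compare_people(first, second)
print(outcome)
```

For the pair above, the names are similar and the dates differ only by a month/day swap, so the sketch routes the pair to an analyst rather than deciding either way.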

BSA Format & Queries

For over two decades, the Bank Secrecy Act Electronic Filing System (BSA E-filing) has provided a secure, electronic method for regulated entities to submit BSA forms. This system enables filers to create and submit well-formatted XML "batch" files containing one or more forms, each complying with the specific XML schema for its form type (SAR, CTR, etc.). FinCEN collects and stores this data, which is subsequently made available via the FinCEN Query System (FCQ), as shown in Figure 1.

Figure 1. Generalized BSA Architecture

The XML format for all BSA forms uses consistent representations for data types such as addresses, identification numbers, phone numbers, email addresses, URLs, and other supporting information. This consistency extends to the representation of "parties" associated with the filings, including individuals, suspects, owners, institutions, referrals, law enforcement contacts, and bank officials. Each party's role in the reported activity is defined using a simple numerical lookup code.

FinCEN publishes the XML format for each BSA form, ensuring consistent data exchange. These formats provide a structured approach to detailing transactions and related entities. For example, the Suspicious Activity Report (SAR) format can be found at: FinCEN SAR XML User Guide.

To deliver effective analytics, it is important to understand the underlying XML structures and how they affect ER. Each reported ACTIVITY (SAR, CTR, 8300, FBAR, CMIR, DEOP, BOI) is assigned a unique identifier (ActivityID) that encapsulates all associated details. Within this activity wrapper, all parties involved are identified according to form-specific requirements. Every party, whether the filing institution, a bank representative, or the subject of the activity, is represented by a single, unique PARTY structure (PartyID) within a given activity. Each PARTY record can contain multiple sub-entities to reflect all associated addresses, phone numbers, identification numbers, email addresses, and names. Figure 2 shows the representative structure.

Figure 2. XML Schema for BSA Party Entry

As illustrated in the abbreviated FinCEN XML specification shown in Figure 3, an ACTIVITY (e.g., 111…) can contain a PARTY (e.g., 222…) with multiple PARTYNAME entries (e.g., 333… and 444…), representing, for example, a "Legal name" and an "Also known as (AKA)" name. For clarity, other associated XML fields related to the activity, party, and party-name have been omitted.

Figure 3. XML Snippet for Activity->Party->PartyName

Each party record always includes a "Legal name" and may also contain zero or more "Also Known As" (AKA) or "Doing Business As" (DBA) entries. Each name representation is assigned a unique PartyNameID to distinguish every variation. FinCEN assigns a distinct PartyID to each party upon processing the form, ensuring a unique identifier for that specific entry. Consequently, even if the same bank submits multiple SARs or CTRs concerning the same individual, each submission generates a new PartyID, so that individual carries a different PartyID under every ActivityID in the FinCEN BSA E-filing database.
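
The Activity, Party, and PartyName nesting described above can be illustrated with a short Python sketch that flattens one activity into rows. The XML here is a hypothetical simplification: the element names, attributes, and ID values are placeholders chosen for readability and do not reproduce the FinCEN schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XML. Element names, attributes, and IDs are
# placeholders and do not reproduce the FinCEN schema.
SAMPLE = """
<Activity id="A1">
  <Party id="P1">
    <PartyName id="N1" type="Legal">JOHN SMITH</PartyName>
    <PartyName id="N2" type="AKA">JOHNNY SMITH</PartyName>
  </Party>
</Activity>
"""

def flatten_names(xml_text: str) -> list:
    """Return one (ActivityID, PartyID, PartyNameID, type, name) row per name."""
    activity = ET.fromstring(xml_text)
    rows = []
    for party in activity.findall("Party"):
        for name in party.findall("PartyName"):
            rows.append((activity.get("id"), party.get("id"),
                         name.get("id"), name.get("type"),
                         name.text.strip()))
    return rows

rows = flatten_names(SAMPLE)
for row in rows:
    print(row)
```

Each output row keeps all three identifiers, which makes it visible that the same person filed under another activity would appear with an entirely different set of IDs.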

While this method of data representation is typical for many collection systems, it creates ER challenges that are compounded by name variations. For instance, if one bank records an individual as JON SMITH and another bank as JOHNNY SMITH, a query searching for JOHN SMITH might overlook crucial data. Although a skilled analyst might try to account for these variations or use other identifying information such as address, phone number, email, identification number, or related entities and accounts, each of these attempts requires a separate query. This forces the analyst to manually track numerous entities and values and then synthesize all the results into a coherent diagram, a time-consuming and potentially error-prone process.
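
The JON/JOHNNY/JOHN problem can be shown with a small Python sketch that compares an exact-match search against one that first maps given names to a canonical form. The two-entry nickname table, the record layout, and the function names are assumptions for illustration only; a real deployment would rely on a much larger curated list and additional matching logic.

```python
# Illustrative nickname table; a production list would be far larger.
NICKNAMES = {"JON": "JOHN", "JOHNNY": "JOHN"}

def canonical_name(full_name: str) -> str:
    """Uppercase the name and replace a known nickname in the first token."""
    tokens = full_name.upper().split()
    if tokens:
        tokens[0] = NICKNAMES.get(tokens[0], tokens[0])
    return " ".join(tokens)

# Hypothetical filings from different banks.
records = [
    {"bank": "Bank A", "name": "JON SMITH"},
    {"bank": "Bank B", "name": "JOHNNY SMITH"},
    {"bank": "Bank C", "name": "JANE DOE"},
]

def search(query: str, rows: list) -> list:
    """Return rows whose canonical name equals the canonical query."""
    key = canonical_name(query)
    return [r for r in rows if canonical_name(r["name"]) == key]

exact_hits = [r for r in records if r["name"] == "JOHN SMITH"]
resolved_hits = search("JOHN SMITH", records)
print(len(exact_hits), len(resolved_hits))
```

The exact-match query finds neither filing, while the canonicalized search returns both with a single query.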

This limitation affects highly specific queries, such as searches for a particular name. It also affects SAR review teams and other investigations that use proactive queries, such as "all transactions for a specific region within a defined timeframe" (e.g., all SARs filed in Miami, FL in the past six months). If the underlying data is not properly resolved and aligned, the results may display numerous disconnected networks or overly dense representations due to the sheer number of entities present.

Figure 4 illustrates two very similar entities originating from a SAR and a CTR. While basic data cleaning can readily standardize their phone numbers, identification numbers, and addresses, standard analytical systems would likely treat them as different entities. Applying ER to this combined data would significantly improve the accuracy and reliability of the results. Varying the stringency of the matching criteria by using different combinations of data fields (such as name and address, name and identification number, name and phone number, or any combination) will further enhance the confidence in the resolved entities.

Figure 4. Similar Entity Structures
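
The idea of varying matching stringency through different field combinations can be sketched in Python as follows. The template names ("strict", "relaxed"), the field names, and the sample values are hypothetical, and the records are assumed to have already been cleaned and standardized.

```python
# Illustrative templates: each tuple lists fields that must all agree.
TEMPLATES = {
    "strict":  [("name", "id_number"), ("name", "address", "phone")],
    "relaxed": [("name", "id_number"), ("name", "address"), ("name", "phone")],
}

def first_match(a: dict, b: dict, level: str):
    """Return the first field combination on which a and b agree, else None."""
    for fields in TEMPLATES[level]:
        if all(a.get(f) and a.get(f) == b.get(f) for f in fields):
            return fields
    return None

# Hypothetical, already-normalized parties from a SAR and a CTR.
sar_party = {"name": "JOHN SMITH", "phone": "1234567890",
             "address": "1 MAIN ST", "id_number": "X1"}
ctr_party = {"name": "JOHN SMITH", "phone": "1234567890",
             "address": "22 ELM RD", "id_number": None}

strict_result = first_match(sar_party, ctr_party, "strict")
relaxed_result = first_match(sar_party, ctr_party, "relaxed")
print(strict_result, relaxed_result)
```

Here the two parties share a name and phone number but not an address or identification number, so they link only under the relaxed template. Returning the matching combination, rather than a bare yes or no, lets an analyst see why two records were joined.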

The goal is to deliver a more complete, thorough, and reliable representation of all the critical data available for any investigation. Although the examples presented are based entirely on BSA content, ER becomes an analytical multiplier when combining data from other sources, including other government sources, social media, open-source data, and subscription services. Figure 5 depicts how data from SAR and CTR sources are readily integrated using ER by exposing similar names in combination with other core entities such as phones, emails, addresses, accounts, and ID numbers.

Figure 5. Harmonizing SAR and CTR Sources

Ultimately, to enhance the core BSA XML framework, agencies should implement an ENTITY structure with a unique EntityID. This process, ideally performed daily during data loading or collection, would involve applying ER matching templates set at a "strict" level to ensure only highly probable matches are made. An associated EntityScore could quantify the strength of each match, serving as a filter for results. Less stringent matching criteria could then be employed for specialized analytical contexts, such as counter-terrorism investigations, where broader searches and comparisons are necessary.
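
One way to picture the proposed ENTITY layer is the Python sketch below, which groups party records into entities with a union-find structure and records a score for each link. Everything here is an assumption made for illustration: the scoring function (share of four fields that agree), the thresholds, the field names, and the "E1", "E2" labels are hypothetical and are not taken from any FinCEN design.

```python
from itertools import combinations

FIELDS = ("name", "phone", "address", "id_number")

def entity_score(a: dict, b: dict) -> float:
    """Share of FIELDS on which both records carry the same non-empty value."""
    agree = sum(1 for f in FIELDS if a.get(f) and a.get(f) == b.get(f))
    return agree / len(FIELDS)

def assign_entity_ids(parties: list, threshold: float):
    """Merge parties whose pairwise score meets the threshold."""
    parent = {p["party_id"]: p["party_id"] for p in parties}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    links = []
    for a, b in combinations(parties, 2):
        score = entity_score(a, b)
        if score >= threshold:
            links.append((a["party_id"], b["party_id"], score))
            parent[find(b["party_id"])] = find(a["party_id"])

    labels, entity_ids = {}, {}
    for p in parties:
        root = find(p["party_id"])
        labels.setdefault(root, "E%d" % (len(labels) + 1))
        entity_ids[p["party_id"]] = labels[root]
    return entity_ids, links

# Hypothetical, already-normalized party records.
parties = [
    {"party_id": "P1", "name": "JOHN SMITH", "phone": "1234567890",
     "address": "1 MAIN ST", "id_number": "X1"},
    {"party_id": "P2", "name": "JOHN SMITH", "phone": "1234567890",
     "address": "1 MAIN ST", "id_number": None},
    {"party_id": "P3", "name": "JOHN SMITH", "phone": "9998887777",
     "address": "9 OAK AVE", "id_number": None},
]

strict_ids, strict_links = assign_entity_ids(parties, threshold=0.75)
relaxed_ids, _ = assign_entity_ids(parties, threshold=0.25)
print(strict_ids)
print(relaxed_ids)
```

At the strict threshold only P1 and P2 are merged; lowering the threshold pulls in P3 on name alone, which shows why looser settings suit exploratory work better than daily loading.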

FinCEN Query System (FCQ)

During an IT modernization program circa 2012, FinCEN replaced the legacy Web Based Currency and Banking Retrieval System (WebCBRS) with the FinCEN Portal and Query System (FCQ), which permits authorized users to query BSA data using an online database query application. Today, the FCQ supports more than 2.5 million searches each year across hundreds of agencies, task forces, and investigative units. An early screen capture of an FCQ search on several key fields is shown in Figure 6.

Figure 6. FinCEN Portal and Query System (FCQ)

BSA data is a crucial resource for virtually all agencies conducting financial crime investigations and has become a standard component of many government investigations. Reports indicate that nearly 90% of IRS Criminal Investigation (IRS-CI) cases utilize BSA data. The FBI leverages this data in thousands of cases involving transnational criminal activity, public corruption, international terrorism, and organized crime. Homeland Security Investigations (HSI) relies on BSA data for a broad spectrum of criminal investigations, leading to indictments, convictions, and the seizure of billions in assets, including currency, virtual assets, and bulk cash.

FinCEN annually recognizes agencies for notable investigations that use BSA data to combat illicit activities, including the Drug Enforcement Administration (DEA) for interdicting transnational criminal organizations, the Department of Justice’s Civil Rights Division Criminal Section for human trafficking and smuggling investigations, and IRS-CI for detecting fraud, corruption, and the use of synthetic identifications. FinCEN recently published a compilation of cases in which BSA data was instrumental in identifying or supplementing investigations; abridged synopses of several are provided below:

  • US Secret Service (USSS) and US Postal Service (USPS) exposing cybercrimes using business email compromise (BEC) schemes plus numerous fraud cases. The USSS also investigated suspicious purchases made with government credit cards in a corruption case involving kickbacks, bribery, wire fraud, aggravated identity theft, production of false identification, stalking, and sex trafficking of a child.
  • Postal Inspection Service (USPIS) pursued an online romance scam in which a man used multiple identities to defraud dozens of women of cash and assets; he was ultimately indicted on multiple counts of wire/mail fraud, aggravated identity theft, and money laundering.
  • Department of Defense (DOD), Office of Inspector General (OIG) for detecting procurement frauds and proliferation financing schemes. Also, the DOD Defense Criminal Investigative Service (DCIS) for uncovering conspiracy, kickbacks, and aiding and abetting contract frauds for operating a foreign US Army base’s public works program.
  • US Marshals Service (USMS) along with the Department of Health and Human Services (HHS) Office of Inspector General (OIG) and the FBI to investigate and review potential Medicare Fraud schemes.
  • The Federal Deposit Insurance Corporation (FDIC) Office of Inspector General (OIG) actively targeting government programs fraudulently exploiting funding from the CARES Act and Paycheck Protection Program (PPP) utilizing shell companies and dormant business entities.
  • Air Force Office of Special Investigations (AFOSI) exposed an overseas counterfeit ring with thousands of transactions intended to defraud financial institutions.
  • Diplomatic Security Service (DSS) investigated passport identity fraud involving dozens of fictitious accounts, mortgage fraud, and numerous improper PPP loans.
  • Department of Agriculture (USDA) and the FBI investigated a public corruption embezzlement case with grants issued for the Child and Adult Care Food Program and the Summer Food Service Program under a not-for-profit to serve at-risk youth.
  • Bureau of Alcohol, Tobacco, and Firearms (BATF) used BSA data to investigate wire fraud and stolen firearms involving a subject using multiple aliases, addresses, emails, and accounts.
  • U.S. Attorney’s Office (USAO) Eastern District of Missouri relied on BSA data to expose a telemarketing fraud scheme targeting elderly victims, as well as the concealment of assets and cryptocurrency transactions.

These cases demonstrate the critical role of BSA data in uncovering financial crimes and achieving successful prosecutions. While effective use of this data requires agencies to retrieve, consolidate, and integrate information from various sources, ER offers a significant advantage. By automatically linking disparate records, even those with inconsistencies, ER addresses the inherent fragmentation of BSA data. This process creates a comprehensive view of individuals, organizations, and their financial activities, revealing hidden connections and the true scope of illicit operations like money laundering. Ultimately, ER enhances the quality, efficiency, reliability, and scope of investigations, transforming fragmented data into actionable intelligence and strengthening the fight against financial crime.

Good Data Leads to Good Analysis and Results.
