Analyzing The Data: A Hypothetical Investigation Using DataWalk

DataWalk is a next-generation enterprise-class platform for revealing patterns, relationships, and anomalies for law enforcement and intelligence operations. DataWalk is designed to handle large volumes of data derived from multiple sources and is a server-deployed platform (on-premise or cloud-based) to support collaborative investigations.

Using a modern, big-data technology stack, combined with a user-friendly visual interface, DataWalk eliminates the restrictions of data silos, allowing agencies to rapidly import and blend data from multiple sources to provide a singular data view using intuitive visualizations including histograms, link diagrams, maps, and timelines.

Introduction

The Universe Viewer (UV) provides a global view onto all the datasets available to the user based on their specific access privileges. Data sets come from sources including any standard relational database, Microsoft Excel files, CSV files, web pages, Hadoop HDFS, and any other source with a JDBC, ODBC, WSDL, or RESTful interface. This example demonstrates a combination of data ranging from synthetic and open-source to social media and subscription services.

In this “hypothetical” scenario, a data set was manually created (shown with a green arrow) using open-source content describing the Jalisco New Generation Cartel (Cártel de Jalisco Nueva Generación, CJNG), a criminal group based in Jalisco, Mexico, and headed by Nemesio Oseguera Cervantes ("El Mencho"), one of Mexico's most-wanted drug lords. The primary content of this set was assembled from DOJ poster boards of the key members, leadership, and familial relationships. Examples of these materials are shown in the screenshots provided below.

DOJ CJNG Most Wanted

Entity Search

An analyst wants to search for a specific person related to the CJNG investigation. The basic “SEARCH” option from the primary DataWalk menus presents a list of available fields, which can be adjusted and configured to meet any specific need. The analyst is looking for information on ABIGAEL GONZALEZ VALENCIA. Not knowing the exact spelling of his name, the analyst applies a Soundex transformation to a common spelling of Abigail by simply typing in SX(Abigail) to identify variations such as Abigael, Abigail, Abigale, Abigayle, Abegail, etc. Matching for nicknames such as Gabby, Gail, Abbie, and Gayle could be applied with more advanced features [shown later].
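
To make the Soundex idea concrete, here is a minimal Python sketch of the standard American Soundex algorithm. It is an illustration only, not DataWalk's implementation of SX(); the function name soundex is chosen here for the example.

```python
def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    # Map each consonant to its Soundex digit; vowels, Y, H and W have no digit.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters_only = [c for c in name.upper() if c.isalpha()]
    if not letters_only:
        return ""

    first = letters_only[0]
    result = first
    prev = codes.get(first, "")
    for ch in letters_only[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]  # pad or truncate to letter + three digits


# All of the spelling variations listed in the text share one code.
variants = ["Abigail", "Abigael", "Abigale", "Abigayle", "Abegail"]
print({v: soundex(v) for v in variants})
```

Nicknames such as Gabby produce a different code because Soundex always keeps the first letter, which is why nickname matching needs a separate mechanism.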

DataWalk supports a range of different search-options to help identify variations in the data. The (i) icon at the top of the menu (upper-left) provides a balloon-help cheat-sheet for different functions. These functions include wildcards (*), SX (Soundex), TYPO (letter errors), STEM, AND/OR, and REGEXP (to handle special configurations). Additional functions can be added per user-needs (e.g., Metaphone).
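
As a rough illustration of what a "letter errors" search can mean, the sketch below uses Levenshtein edit distance. This is an assumption about the general technique; the text does not say how TYPO is implemented, and the names edit_distance, typo_match, and max_errors are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-letter inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def typo_match(query: str, candidate: str, max_errors: int = 1) -> bool:
    """True when the candidate is within max_errors letter edits of the query."""
    return edit_distance(query.lower(), candidate.lower()) <= max_errors


print(typo_match("Abigail", "Abigael"))   # one substituted letter
print(typo_match("Abigail", "Abigayle"))  # two edits, outside the default tolerance
```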

The search is conducted on different fields from different loaded sets to identify potential matches. DataWalk identifies results from four (4) sets: CJNG-People, Arrests, FBARs, and MSBs. Each result is categorized by source, along with a sample set of fields to help the user identify which records/entities best match their selection. In this case, the name in the CJNG set is selected and the user brings the results into a Link Chart.

Link Chart

Using visually appealing icons, glyphs, colors, and related components, DataWalk offers a viewpoint into the data, emphasizing key content, important information, and critical connections. In this example, the blue star (upper-left) indicates that the entity holds a leadership position within the organization and is set according to a value defined in the underlying data. Additionally, the red circle (upper-right) signifies the status of the entity. In this case, the letter “I” indicates he is currently “incarcerated/imprisoned”; other values are (A) arrested, (F) fugitive, and (D) deceased. These markers are easy to customize to meet the needs of any investigation.

The next step is to see how this entity relates to other members of the CJNG organization. The analyst selects the entity and invokes the “Add Linked Objects” menu on the right side of the interface. From here, several options are available, including 1st and 2nd degree connections and a list of any connected sets. For this example, the CJNG-People set is selected and "Add Objects" is initiated to show the next level of connections.

Based on the set connections defined in the Universe Viewer (UV), the system follows all selected connections and brings back any new entities. The results quickly depict the relationships between Abigael and other cartel members and family. Every connection also displays the type/name of the relationship for these ten (10) new entities. The analyst quickly sees that NEMESIO OSEGUERA CERVANTES is another leader in the CJNG cartel and is currently a wanted fugitive (F). Additionally, another entity (JENNIFER BEANEY CAMACHO CÁZARES), with an image, shows as his wife. Any level of detail can be shown in the labels or comments, and the diagram can be arranged using different placement techniques.
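
Conceptually, this kind of expansion is a one-hop neighbor lookup over labeled links. The Python sketch below illustrates the idea under stated assumptions: the entity IDs (P1 to P4), the relationship labels, and the function names build_index and add_linked_objects are all hypothetical and do not reflect DataWalk's internals or the actual CJNG data.

```python
from collections import defaultdict


def build_index(links):
    """Index (source, label, target) links so each entity lists its neighbors."""
    index = defaultdict(list)
    for src, label, dst in links:
        index[src].append((label, dst))
        index[dst].append((label, src))  # links are followed in both directions
    return index


def add_linked_objects(index, chart, selected):
    """Return (new_entities, links) found one hop from the selected entities."""
    new_entities, links = [], []
    for entity in selected:
        for label, neighbor in index.get(entity, []):
            links.append((entity, label, neighbor))
            if neighbor not in chart and neighbor not in new_entities:
                new_entities.append(neighbor)
    return new_entities, links


# Hypothetical placeholder data.
index = build_index([
    ("P1", "spouse", "P2"),
    ("P1", "associate", "P3"),
    ("P3", "sibling", "P4"),
])
chart = {"P1"}
first_hop, _ = add_linked_objects(index, chart, ["P1"])
chart.update(first_hop)
second_hop, _ = add_linked_objects(index, chart, sorted(chart))
print(first_hop, second_hop)
```

Selecting everything on the chart and running the same step again yields the next level of connections, which is the repeated expansion the analyst performs.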

All the entities are then selected and Add Linked Objects is reapplied to show the next level of connections. The results are easily arranged using different placement methods to minimize link cross-over.

At this point, the analyst realizes that some information is missing from the diagram: specifically, the link between the wife/mother and her children. The analyst decides to create a new link between JENNIFER and NOEMI using the “add connection” feature available in the DataWalk top-level menu.

Add Entity/Connection

Once this mode is activated, the analyst clicks on one of the entities, holds down the mouse button (the link follows the cursor), moves to the second entity, and releases the button to establish the link. In this case, the “direction” of the link is not important, but in other cases, the order of connection defines the “flow” of the relationship.

Once the mouse is released, a pop-up menu requires the analyst to select what type of link to create. In our CJNG model, there is only a single type of connection, called “related”, used to define the role and connect all cartel members. In other models, there can be multiple types of connections based on different needs and requirements; it is a simple process to add additional link types to the model.

In this specific model, the “related” connection allows the analyst to enter additional information and details regarding the linkage. As entities can have different types, roles, and relationships over a period of time, it is important to capture all of the details to ensure the proper fidelity is maintained for the analytics. In the next screenshot, the number of attributes is fairly basic, and attributes are easily added or changed to meet evolving needs.

The analyst enters the type of relationship (mother/child). Often these values are defined as an enumerated-type (e.g., a predefined list) chosen from a pick-list. Different types of components (e.g., date selectors, selection-boxes, spinners, etc.) are used to simplify data input. Once completed, the analyst “saves” the results and the screen now shows the new connection between the selected entities.

In this environment, a special configuration notifies a “supervisor” when new information is added to the system. The system automatically detects the change and then signals an alert to the designated personnel. In the upper-right part of the display, an icon (red bell) visually displays an active alert with a count of the total number of outstanding (unread) alerts. Optionally, the supervisor can also be notified of the alert by email or through an external ticketing system.

The supervisor can review the alert by logging into the system and invoking the “Workspace” dashboard to see which alert was triggered. The alert is identified by the same ringing red bell (animated), and the supervisor clicks on the “New Objects” tab showing the one (1) new entry available for review. In this case, the system requires the supervisor to “Approve” the data change before any other users can see this information. Note: the analyst who originally created the data can always see it in their own sandbox, but others are excluded until it is approved. In this example, the supervisor has three options: approve, deny, or request more information. The selectable values are configurable to meet various agency or investigative needs.

Note: this same process is applied to the creation of new entities (e.g., cartel members). Once the supervisor approves the new data, all other authorized users will see the data next time they query the system.
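
The visibility rule described above (the creator sees a pending change in their own sandbox, everyone else only after approval) can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the class and function names, the user names, and the status values are hypothetical, and the real system is configurable in ways this sketch does not capture.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"
    MORE_INFO = "more information requested"


@dataclass
class Change:
    description: str
    created_by: str
    status: Status = Status.PENDING


def is_visible(change: Change, user: str) -> bool:
    """Approved changes are visible to all; otherwise only to their creator."""
    return change.status is Status.APPROVED or change.created_by == user


def outstanding_alerts(changes) -> int:
    """Count of changes still waiting for a supervisor decision."""
    return sum(1 for c in changes if c.status is Status.PENDING)


link = Change("new link: mother/child", created_by="analyst_1")
print(is_visible(link, "analyst_1"), is_visible(link, "analyst_2"))
print(outstanding_alerts([link]))

link.status = Status.APPROVED  # supervisor approves the change
print(is_visible(link, "analyst_2"), outstanding_alerts([link]))
```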

Expand The Network (Walk Data)

At this point, the analyst continues to expand out the cartel network showing additional levels and relationships among its membership. The highlighted entities in the following diagram show those added entities.

Using the various placement techniques available within DataWalk, the analyst can define the best format to meet their analytical needs. The screenshot shows the “hierarchy-top” to position each of the three (3) leaders at the top of the diagram and allow their connections to flow downward. This helps understand the different roles and significance of members in the cartel. Although this example is limited in size, there can be many levels represented.

At this time, the CJNG set is exhausted, as there is no additional data available to expand the network. However, in the Universe Viewer (UV), the CJNG set is connected to the “People” set, which comprises the names of people derived from many different sets (investigations, corrections, financials, watchlists, arrests, registrations, etc.). The analyst selects the People set to see any new connections and uncovers two (2) matches, Zachary Manning (cousin) and Denise Cook (friend), both stemming from LILIANA ROSA CAMBA, located in the lower-right of the cartel network diagram.

The People set has connections to a wide range of other sets and contains much more robust content. The analyst drills down on both Zachary and Denise to see more specific details about their backgrounds. Then, using the “Add Linked Objects” panel, the analyst chooses all of the available sets to see any additional connections. Note: the values for names, social security numbers, addresses, phones, and other personal details are "synthetic" and are not intended to reflect any real-world person.

The system accesses each selected set to pull out any connections for either Zachary or Denise, as shown in the following diagram:

At this point there are two viable options to pursue to determine additional connections, behaviors, or related activities.

  • Zachary was previously arrested for bank fraud, has a social media profile, has a valid social security number, is involved in various BSA activity, owns a BMW 325i, and lives in Compton, CA.
  • Denise has records in the Relativity set (an authoritative document management platform), BSA records, a registered phone, a valid SSN, and an address located in La Mesa, CA.

In the expanded view of Zachary, the BSA set shows both SAR and CTR transactions, and presenting them geographically on a map shows his activity relative to his home address. There is a heavy concentration of SARs at two specific banks and wider usage of banks for CTR deposits.

The number of people in the network diagram related to this address suggests some type of “safehouse” usage. Right-clicking on the home address and invoking the street-view option provides an automatic link to Google Street View to validate the address. As seen in the screen capture, this property also has a large number of vehicles present. Drilling down further (expanding the network) on the other people shows they all have additional BSA transactions. The analyst classifies this group as “money mules” and will investigate further.

Switching back to Denise and expanding her BSA records shows a similar number of SAR and CTR transactions. All the SARs are under $10,000, indicating some type of structuring behavior.

Showing the timelines for both SAR (green sphere) and CTR (orange sphere) transactions indicates there was a mix of both types of filing in the March-August timeframe. The analyst knows that people change their behaviors when their actions are being recorded. Beginning in July, Denise started to structure her cash deposits under the $10k limit (around $8k) to avoid the CTR filings. From that point, the bank filed only SARs to document this suspicious behavior.
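
A simple way to surface this pattern in transaction data is to look for repeated cash deposits that sit just under the CTR reporting threshold. The Python sketch below is illustrative only: the near_fraction and min_count parameters are arbitrary assumptions rather than any regulatory or DataWalk rule, and the dates and amounts are invented.

```python
from datetime import date

CTR_THRESHOLD = 10_000  # CTRs are filed for cash transactions over $10,000


def flag_possible_structuring(deposits, near_fraction=0.75, min_count=3):
    """Return deposits just under the CTR threshold if there are enough of them.

    deposits: list of (date, amount) tuples for cash deposits.
    near_fraction and min_count are illustrative assumptions.
    """
    floor = near_fraction * CTR_THRESHOLD
    near = [(d, amt) for d, amt in deposits if floor <= amt < CTR_THRESHOLD]
    return near if len(near) >= min_count else []


# Invented sample data: one large deposit, then a run of ~$8k deposits.
deposits = [
    (date(2020, 3, 5), 12_500),
    (date(2020, 7, 2), 8_000),
    (date(2020, 7, 9), 8_200),
    (date(2020, 7, 16), 7_900),
    (date(2020, 8, 1), 3_000),
]
for when, amount in flag_possible_structuring(deposits):
    print(when.isoformat(), amount)
```

A flagged run like this is only a lead for review; a real determination depends on context that a threshold check cannot see.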

When her activity is presented on a geospatial map, her transactions are clearly conducted at various banks and institutions along the US/Mexico border (US side). Most SARs are reported from one specific location, while the CTRs are reported by a number of different banks. Clearly there is some type of explicit intent for Denise to travel almost 30 miles to make her cash deposits. The analyst will further review this information.

When performing a Google Street View of her address, the results show a high-end estate nearing the end of its construction. The value of this property is $1.1M.

The analyst returns to the link chart to determine if the other connections will return any additional entities. When choosing Add Linked Objects, there is an option to Show Object Counts that provides the total number of entities that will be returned if the query is run. In this case, the Intercepts set shows there are 707 records available for the phone. To avoid cluttering the display with this new data, the analyst right-clicks on the phone number entity and copies it into a new Link Chart display.

In this new display, the analyst expands the network using the Intercepts set resulting in a large concentration of connections. This type of data is not often used for “network” analysis but is much better suited for geographical (lat/long) and temporal (date/time) analyses.

These intercepts are the location records tied to Denise’s mobile phone. When displayed using the heatmap option, they show a heavy concentration of activity in Brooklyn, NY. Each sphere represents a location reference. The darker-colored spheres indicate a higher concentration of activity (e.g., stay over, lingering, stopped).

When placed on a time chart, the analyst sees the activity occurred over a 5-day period: March 13-18. Each spike in the timeline shows the relative activity for that period and zooming in provides better resolution (hours/mins).
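
The spikes in such a time chart are essentially counts of records per time bucket, and zooming in corresponds to using a finer bucket. Here is a minimal Python sketch of that idea; the function name activity_histogram and the sample timestamps (including the year) are invented for illustration.

```python
from collections import Counter
from datetime import datetime

# strftime patterns that truncate a timestamp to the chosen resolution
_BUCKETS = {
    "day": "%Y-%m-%d",
    "hour": "%Y-%m-%d %H:00",
    "minute": "%Y-%m-%d %H:%M",
}


def activity_histogram(timestamps, resolution="day"):
    """Count location records per time bucket at the given resolution."""
    fmt = _BUCKETS[resolution]
    return Counter(ts.strftime(fmt) for ts in timestamps)


# Invented sample records.
records = [
    datetime(2021, 3, 13, 16, 30),
    datetime(2021, 3, 13, 17, 45),
    datetime(2021, 3, 14, 9, 0),
]
print(activity_histogram(records, "day"))
print(activity_histogram(records, "hour"))
```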

As the analyst manipulates the timeline and focuses in on the first spike, it becomes clear that Denise flew to New York and landed at John F. Kennedy (JFK) airport around 4:30, taking about an hour to get her bags and hail a taxi (or Uber). It appears she went directly from the airport to the Upper West Side in Manhattan by way of the Bronx (via the Major Deegan Expy), where she stayed for approximately 1 hour.

Over the next several days, she concentrated her movements around Brooklyn, spent time in Staten Island, traveled out to Islip and Medford on Long Island, and also Queens. The analyst can infer the specific locations visited by Denise to cross reference them with other data sets to determine if they are significant or have any additional intelligence value.

Returning to the original link chart, the analyst further expands the phone number and finds a match against the Federal Firearms License (FFL) set. The match connects to Fine Jewelry Inc, which has locations in Southern California including San Diego, Solana Beach, Oceanside, Rancho Bernardo, and La Mesa (the location where Denise lives). Based on the connection between the jeweler and the cartel, there may be some type of high-end luxury buying, money laundering, or a front business, or some combination thereof. The analyst will continue to do research and find additional content to determine the nature of these relationships.

Fuzzy Matching (Aliases)

As one last step, the analyst checks to see if there are any potential “alias” entities matching Denise. For this configuration, the system is set up to identify entities that match on several conditions: same gender, same race, same ethnicity, same year of birth, similar last names (Soundex), and residences within 25 miles of each other (via zip code). In this set, three (3) matches are generated: Didi Cooke, Densie Cooks, and Deniece Cooks. Any type of condition can be defined for “fuzzy” matching and can vary from set to set.
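
The combined conditions above can be sketched as a single predicate over two person records. The Python below is a simplified illustration, not DataWalk's matching engine: the Person fields, the placeholder zip codes and their centroid coordinates, and all attribute values are invented, and distance is approximated with the haversine formula between zip centroids.

```python
import math
from dataclasses import dataclass

_CODES = {ch: d for d, group in enumerate(
    ("BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"), start=1) for ch in group}


def soundex(name: str) -> str:
    """Four-character American Soundex code."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out, prev = letters[0], _CODES.get(letters[0])
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate same-coded letters
        code = _CODES.get(ch)
        if code is not None and code != prev:
            out += str(code)
        prev = code
    return (out + "000")[:4]


def miles_between(a, b) -> float:
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))


@dataclass
class Person:
    last_name: str
    gender: str
    race: str
    ethnicity: str
    birth_year: int
    zip_code: str


def possible_alias(p: Person, q: Person, centroids, max_miles: float = 25) -> bool:
    """True when every configured condition holds for the pair."""
    return (p.gender == q.gender and p.race == q.race
            and p.ethnicity == q.ethnicity and p.birth_year == q.birth_year
            and soundex(p.last_name) == soundex(q.last_name)
            and miles_between(centroids[p.zip_code], centroids[q.zip_code]) <= max_miles)


# Placeholder zip centroids and invented attribute values.
centroids = {"00001": (32.77, -117.02), "00002": (32.72, -117.16), "00003": (34.05, -118.24)}
subject = Person("Cook", "F", "X", "Y", 1980, "00001")
nearby = Person("Cooks", "F", "X", "Y", 1980, "00002")
distant = Person("Cooke", "F", "X", "Y", 1980, "00003")
print(possible_alias(subject, nearby, centroids), possible_alias(subject, distant, centroids))
```

Requiring every condition keeps false positives down; loosening any single condition (for example, allowing adjacent birth years) widens the candidate list at the cost of more review work.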

The final diagram shows all the entities and their connections. At the bottom of the display, a series of thumbnails shows the 19 steps used to go from the original entity to the final results. When the chart is saved (or restored), all of these steps are available for review. Thus, if the analyst is asked about the process used to find these results, it can easily be played back using the history. Other analysts who access this chart will also see this history.

Final Report

Finally, if the analyst needs to send a report (aka targeting package or dossier) to another person without access to the system, they simply create a PDF report with all of the relevant details. These reports are configured to client specifications to include headers/footers, watermarks, disclosure statements, and even agency logos. The analyst generates a report from the Folder associated with Denise.
