Actively Encoding Military Knowledge
Christopher Westphal
Analytical systems are designed to ingest large volumes of data, quickly filter results, and help produce quality output, products, and reports. They are necessary for sifting through massive volumes of data to expose hidden connections and unveil threats against our national security. Furthermore, many military operations see consistent, repeated turnover as personnel transition to new roles and make their assigned rotations. Unfortunately, when these assets move on, they take with them the valuable knowledge and insights gained from engaging their adversaries. The analytical industry must therefore advance toward a more agile and adaptable approach in which processes, workflows, and outcomes are captured, stored, and made available for reuse and update as required. Newer technologies like AI and ML offer real advances and many potential efficiencies. However, there are also well-documented limitations in how AI/ML is commonly applied. In general:
- They are intrinsically narrow, focused on their designated tasks, and don’t easily adapt to new problem areas without a substantial reinvestment to realign them.
- They require large training sets pre-tagged with the appropriate classification values, so the possible outcomes are fixed in advance (see the sketch after this list).
- Their models can become biased.
- They are not well suited to explain their decision-making.
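As a concrete illustration of the pre-tagging limitation, the minimal sketch below (using scikit-learn, with invented features and labels) shows how a conventional supervised classifier cannot be trained until an analyst has already labeled every example, and afterwards only answers the one question it was labeled for:

```python
# A minimal sketch of the pre-labeling constraint: a conventional
# supervised classifier cannot be trained until every example has
# already been assigned a class, so the categories are fixed up front.
from sklearn.linear_model import LogisticRegression

# Hypothetical feature rows (e.g., event counts and an activity index)
# and the labels an analyst had to hand-assign before training.
X = [[12, 0.3], [45, 0.9], [8, 0.1], [60, 0.8]]
y = [0, 1, 0, 1]  # 1 = "threat activity", 0 = "routine" (pre-tagged)

model = LogisticRegression().fit(X, y)

# The model only answers the question it was labeled for; asking it
# about a new problem area means rebuilding X, y, and the model itself.
print(model.predict([[50, 0.7]]))
```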
To overcome some of these issues while retaining the ability to automatically learn, classify, and encode patterns, systems like DataWalk augment the discovery process by allowing users to naturally “look around” the data at scale. Users visually explore connections, generate ad-hoc queries, try what-if scenarios, and interrogate any node in an investigative path. This process provides users a way to discover answers to a wide range of questions, even uncovering information they never knew existed. The analytical workflows are saved, shared, automated, and audited for future use, and they also become an input to an ML model to help identify when similar conditions are met, thereby providing a more robust combination of process and technology to fulfill mission needs.
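To make the saved-workflow idea concrete, here is a minimal sketch, not DataWalk's actual API, of how an analyst's look-around steps could be recorded as plain, replayable data; the workflow format and the run_workflow helper are hypothetical:

```python
# A minimal sketch (not DataWalk's actual API) of how a "look-around"
# exploration can be captured as a replayable workflow: each step the
# analyst performed is recorded declaratively, then re-run on new data.
import json

workflow = [
    {"op": "filter", "field": "region", "value": "RC Capital"},
    {"op": "filter", "field": "type", "value": "IED"},
]

def run_workflow(records, steps):
    """Replay the recorded steps against any dataset."""
    for step in steps:
        if step["op"] == "filter":
            records = [r for r in records if r.get(step["field"]) == step["value"]]
    return records

# Saved workflows are plain data: shareable, auditable, and diffable.
saved = json.dumps(workflow)

events = [
    {"region": "RC Capital", "type": "IED", "date": "2011-05-01"},
    {"region": "RC South", "type": "IED", "date": "2011-05-02"},
]
print(run_workflow(events, json.loads(saved)))
```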
The workflows associated with look-around analytics are essential to operations. Collecting, categorizing, reusing, and sharing these assets is a fundamental capability of the DataWalk platform. Analyses can be reused by colleagues or combined with additional datasets and workflows. This process encodes organizational knowledge, so that the newest analyst can consistently run the same analyses generated by seasoned experts. Augmented with ML, it becomes a powerful capability.
For example, an analyst could specify a complex query such as: identify all enemy-combatant military operations within a specific geographical area showing an increase in supply transports of conventional munitions. The workflow can be created by one analyst, saved and shared with other analysts, re-run on a scheduled basis for monitoring and reporting purposes, or combined with other workflows such as “show all intercepted communications related to IED placement for the past 24 hours.” This ensures users remain current and up to date across all aspects of their theater of operations.
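Continuing the hypothetical workflow format above, the sketch below shows how two saved analyses might be combined and re-run on a schedule against fresh data; the monitor loop, fetch_new_events callable, and field names are illustrative assumptions, not DataWalk interfaces:

```python
# A hedged sketch of combining two saved workflows and re-running them
# periodically. The workflow format and helpers are hypothetical.
import time

supply_increase = [
    {"op": "filter", "field": "activity", "value": "supply_transport"},
    {"op": "filter", "field": "munitions", "value": "conventional"},
]
ied_comms = [
    {"op": "filter", "field": "channel", "value": "intercept"},
    {"op": "filter", "field": "topic", "value": "IED placement"},
]

def run_workflow(records, steps):
    for step in steps:
        records = [r for r in records if r.get(step["field"]) == step["value"]]
    return records

def monitor(fetch_new_events, interval_seconds=3600):
    # Re-run both analyses against fresh data each cycle, so the newest
    # analyst gets the same coverage as the workflows' original authors.
    while True:
        events = fetch_new_events()
        for name, wf in [("supply", supply_increase), ("comms", ied_comms)]:
            hits = run_workflow(events, wf)
            if hits:
                print(f"[{name}] {len(hits)} matching events")
        time.sleep(interval_seconds)

# Example: monitor(lambda: load_events_from_feed(), interval_seconds=3600)
```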
Intelligence-led analysis is intended to utilize all available data, enhancing collaboration, rapid adaptation, and problem-solving for decision-making. Analysts can quickly consolidate and interact with the data, details, materials, and related content to assess circumstances and qualify targets. DataWalk can deliver a system that quickly adapts to emerging tactical situations, addresses operational needs, and examines significant change, proactively handling emerging strategic threats.
The analytical workflows, i.e., the encoded knowledge, can be saved, monitored, and updated as new information is brought into the system. These analyses can be combined and weighted to trigger an alert based on a user-defined “score.” This builds an expanding knowledgebase of expertise that is auditable, adaptable, and remarkably transparent in its operation. Alerts and scores can be configured to fire automatically whenever new data meeting their criteria enters the system.
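A minimal sketch of the user-defined scoring idea follows; the analysis names, weights, and threshold are illustrative assumptions, not actual platform settings:

```python
# Each saved analysis contributes a weighted signal, and an alert fires
# when the combined score crosses a user-defined threshold.

ANALYSES = {
    "supply_increase": 0.5,    # weights assigned by the analyst
    "ied_comms_chatter": 0.3,
    "unusual_movement": 0.2,
}
ALERT_THRESHOLD = 0.6

def score(signals):
    """signals maps analysis name -> True if it matched on the new data."""
    return sum(w for name, w in ANALYSES.items() if signals.get(name))

new_data_signals = {"supply_increase": True, "ied_comms_chatter": True}
s = score(new_data_signals)
if s >= ALERT_THRESHOLD:
    print(f"ALERT: combined score {s:.2f} >= {ALERT_THRESHOLD}")
```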
Utilizing the workflows for situational awareness, with the ability to consume and generate relational structures from large graphs, is important for different types of operations. There are many dimensions to consider when incorporating these observations; they generally include geospatial, structural, declarative (content), and temporal (absolute and relative) constructs. The methodology for exposing them incorporates all available dimensions, with a baseline of detecting sequences of mutually exclusive events, including measurable differences between events, and the ability to expose anomalous and irregular patterns.
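For the temporal dimension specifically, the sketch below orders a handful of event timestamps, measures the gaps between consecutive events, and flags intervals that deviate sharply from the median baseline; the data and thresholds are invented for illustration:

```python
# Order events, measure the differences between consecutive events,
# and expose intervals that break the established rhythm.
from datetime import datetime
from statistics import median

timestamps = [
    "2011-05-01 06:10", "2011-05-02 06:05", "2011-05-03 06:20",
    "2011-05-04 06:00", "2011-05-04 14:00",  # the last gap breaks the daily rhythm
]
times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M") for t in timestamps)

# Measurable differences between consecutive events, in hours.
gaps = [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]
baseline = median(gaps)

for gap, when in zip(gaps, times[1:]):
    if gap < 0.5 * baseline or gap > 2 * baseline:
        print(f"irregular interval ending {when}: {gap:.1f}h vs median {baseline:.1f}h")
```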
Classifying these behaviors from very large and diverse graphs yields a high degree of positive detection for exposing the recurring, sequential, and/or anomalous patterns associated with multi-intelligence sources. The concepts have utility in a wide range of classification activities, such as detecting command and control structures by correlating telecommunications activity (CDRs/calls/texts) with improvised explosive device (IED) incidents; revealing theater operations for asset movement and support, infrastructure, and supply chain dependencies from multiple simultaneous target attack engagements; exposing cybercrime-related compromises and data exfiltration activities based on specific user behavior; or exposing transnational criminal organization activity related to money laundering, narcotics smuggling, or human trafficking.
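As a toy version of the first correlation named above, the sketch below joins call detail records (CDRs) to an IED incident by shared cell tower and a time window; the records, tower IDs, and 30-minute window are illustrative assumptions:

```python
# Find call records whose time and serving cell tower fall close to an
# IED incident, a crude stand-in for exposing command-and-control links.
from datetime import datetime, timedelta

ied_events = [("2011-05-03 06:20", "tower_17")]
cdrs = [
    ("2011-05-03 06:05", "tower_17", "+93700000001"),
    ("2011-05-03 09:40", "tower_02", "+93700000002"),
]

WINDOW = timedelta(minutes=30)

def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")

for ied_ts, ied_tower in ied_events:
    for call_ts, tower, msisdn in cdrs:
        if tower == ied_tower and abs(parse(call_ts) - parse(ied_ts)) <= WINDOW:
            print(f"{msisdn} active near IED at {ied_ts} (tower {tower})")
```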
For example, the table below shows details [intentionally blurred] for IED events that occurred in Afghanistan over a selected period of time. Each event has recorded characteristics that define the date, time, location, and related outcomes (kills, forces, units, etc.). In this scenario, Regional Command Capital (RC Capital), comprising Kabul city and fourteen districts, is the selected region of interest. Interacting with the data quickly reveals strong temporal and geospatial patterns for this area.
The map shows the exact geolocation of each IED using a generalized heatmap overlay, with the top three (3) densities highlighted by manually placed yellow boxes. These clusters are clearly positioned along major patrol routes, and almost no IEDs have occurred in any of the nearby neighborhoods. Combining this with other data such as patrol times/dates, vehicle types/numbers, convoy formation, distance from base, mission type, weather, line-of-sight, terrain, and other environmental conditions allows the degree of risk associated with any movement activity to be determined.
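The heatmap's density logic can be approximated with simple grid binning, as in the sketch below; the coordinates and the roughly 1 km cell size are illustrative, not the actual event data:

```python
# Bin IED coordinates into a coarse lat/lon grid and report the three
# densest cells, mirroring the manually boxed clusters on the map.
import math
from collections import Counter

# Illustrative (lat, lon) points, not real event data.
ied_points = [
    (34.531, 69.163), (34.532, 69.164), (34.533, 69.162),  # cluster A
    (34.555, 69.213), (34.556, 69.214),                     # cluster B
    (34.513, 69.132), (34.514, 69.133),                     # cluster C
    (34.601, 69.302),                                       # isolated event
]

CELL = 0.01  # grid cell size in degrees, roughly 1 km at this latitude

def cell_of(lat, lon):
    return (math.floor(lat / CELL), math.floor(lon / CELL))

density = Counter(cell_of(lat, lon) for lat, lon in ied_points)
for cell, count in density.most_common(3):  # the three densest cells
    print(f"cell {cell}: {count} IED events")
```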
Furthermore, it is recommended to factor other dimensions into the overall reasoning process: which military units were on patrol, such as Coalition Forces, Canadian Forces (CF), or Afghan National Security Forces (ANSF); activity on the communications infrastructure, if available, including active mobile devices, wifi hubs, cell-tower geo-positions, mobile advertising IDs (MAIDs), and other related signals; and even IED activation, explosive, and projectile components, which help categorize the device, expose different TTP behaviors, and provide feedback on equipment for countering the IED threat.
By incorporating these factors into a machine learning module combined with augmented human intelligence, DataWalk can configure the system to learn the circumstances, predict the probability, and recommend alternative courses of action (prescriptive analytics). Although this example focused on IED events, the overall process, dimensions, sequences, and outcomes can be applied to other military operations.
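A hedged sketch of that learn-and-predict step, not DataWalk's actual ML module, might look like the following: train a classifier on past movements enriched with the dimensions above, then score a planned patrol. The features, labels, and values are invented for illustration:

```python
# Train on historical movement records, then estimate IED risk for a
# proposed patrol; a prescriptive layer would compare this score across
# alternative routes/times and recommend the lowest-risk option.
from sklearn.ensemble import RandomForestClassifier

# Columns: route_density (IEDs/km), hour_of_day, convoy_size, dist_from_base_km
X = [
    [0.8,  6, 4,  3.0],
    [0.1, 14, 8, 12.0],
    [0.9,  6, 3,  2.5],
    [0.2, 11, 6,  9.0],
]
y = [1, 0, 1, 0]  # 1 = historical IED contact on that movement

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

planned = [[0.7, 6, 4, 3.5]]  # a proposed patrol
risk = model.predict_proba(planned)[0][1]
print(f"estimated IED risk: {risk:.0%}")
```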
Analytics is an iterative process that builds up a corpus of knowledge over time. Data sets are constantly being updated, new sources identified, and critical data elements exposed. Thus, the processes and environment used to deliver the analytics will evolve and migrate through different capabilities as the offering and its usage mature.
Although many capabilities, technologies, and interfaces are available in the commercial and open-source markets, there must be fluidity in how they are used and invoked, and in the way their results are harmonized. Users need to remain focused on their analytics and not be distracted by copying results, logging into different systems, or running multiple queries in different ways to guarantee complete coverage. The platform is designed to ensure every user can achieve full coverage of their analytical focus and utilize a knowledgebase of specific criteria, content, and workflows.