Azure Stream Analytics for Real-Time Fraud Detection
Microsoft Azure Logic Apps is a service that allows you to design and build scalable solutions for app integration, data integration, system integration, enterprise application integration (EAI), and business-to-business (B2B) communication, whether in the cloud, on premises, or both.
One of the features I wanted to explore in more detail is the support for real-time fraud detection or, speaking more broadly, the ability to detect and flag irregularities in incoming data streams in real time.
The tool to support this is Azure Stream Analytics. It is a fully managed, real-time analytics service designed to help you analyze and process fast moving streams of data that can be used to get insights, build reports or trigger alerts and actions.
TL;DR: Set up an Azure Event Hub receiving real-time Dynamics 365 transaction data and a Stream Analytics job scanning this Event Hub and using SQL-like queries to detect suspicious data.
Note that this approach uses deterministic SQL queries to discover fraud and other abnormalities in the data stream and relies on pre-created logic to discover fraudulent patterns in data. I will look at how to use the magic sauce of AI / machine learning to approach the same problem in the following post.
Azure Event Hub
The stream analytics job in this example will use data ingested into an Azure Event Hub. Event Hubs is a highly scalable data ingestion pipeline that integrates to many Azure services and is very helpful with many real-time processing solutions.
To set it up, you need to create an event hub namespace, and add a new event hub to it (see walkthrough here). The event hub will ingest our event data and will provide the input stream for the stream analytics job.
I have created a namespace dr-fd-post and a single event hub in this namespace called dr-fd-posthub.
Azure Logic App
I want to capture a transaction data stream from a Dynamics 365 - based system. I'm going to use an Azure Logic App to send new transactions to my event hub.
To keep things simple for this post, my Logic App sends the entire new record to the event hub.
The Logic App is triggered by the creation of a new record of type Payment in the connected Common Data Service (CDS) environment. The next step copies the record to a JSON object and can be used for additional manipulation. Final Send Event step uses the new record’s data to create a new event in my event hub (my event hub is configured to ingest events as JSON objects).
At this point, I want to make sure that my CDS data gets correctly ingested in Azure Event Hub. I'm going to create a few Payment records in Dynamics and use EventHub Viewer Windows app to make sure it's working. I can see my new Payments records in Dynamics.
Next, I'm going to fire up the EventHub Viewer and select correct Azure login and subscription.
The last validation step is to connect to my event hub. I see that new records do indeed get ingested by the Azure Event Hub.
Stream Analytics
With the event ingestion (input) part of the pipeline configured, I am ready to set up a new Stream Analytics job to read incoming Payments data collected in my event hub. Stream Analytics pipeline consist of an input, query and an output. Diagram below illustrates potential Stream Analytics pipelines (see this article for more details).
Navigate to Create a resource -> Internet of Things -> Stream Analytics job in Azure Portal to add it. After a new job is created, add an Event Hub stream input to the job (a single job may have multiple inputs).
I've chosen the dr-fd-posthub as the data source for my single input stream.
Transformation Query
The core of the solution is to configure a Stream Analytics query that would detect anomalous data in the ingested real-time event stream and flag it for downstream processing.
I'm going to use the query builder included in the Azure management interface for stream analytics.
To build the query, I need to be able to have a pretty good idea about the types of irregularities that might be encountered in my data stream. For example, to detect suspicious phone calls, I might be looking for the same calling number making calls from different locations within a short window of time.
In my example of scanning payment transactions, I want to find all transactions from a payee where payment amount significantly deviates from the historical average transaction amount that my system calculates and maintains for each payee.
While this example is very simplistic, it is enough to illustrate the potential of the tool.
In the screenshot below, the query reads subset of fields from posted events and calculates a simple deviation of payment to average amounts.
With the filter applied, I see some records that fit my detection criteria and merit additional attention.
Stream Analytics Query Language provides many specialized facilities for advanced stream analysis, including geospatial, windowing and time management analysis functions.
For example, I could have used a TUMBLINGWINDOW function to organize my event stream into a sequence of time-based windows, and group transactions by the payee within each window to detect payees who post transactions suspiciously often.
Anomaly Detection functions such as AnomalyDetection_SpikeAndDip use machine learning models to compute a p-value score to indicate how anomalous a given event is over the entire query input stream.
Job Output
The last step is to define an output for my job. I could send the query results to a SQL Database or even to an Azure Function (if I wanted to update a suspicious transaction record in Dynamics 365) but for the purposes of this post, I just want to store the results of stream analysis. The simplest way to do it is to configure an Azure Blob Storage container.
To do this, I have created a new Storage account and a new container dr-fraudsink in it and added a single job output pointing to this container using FraudSink as an output alias.
My job now has an input, a transformation query and an output. I can now start the job and monitor Payment event creation and detection. After a few Payment records are created in Dynamics 365, I can now switch to my Azure Blob container. In Azure portal I can see a new blob created by my stream analytics job and two records flagged as potentially fraudulent.
This approach can be used for a wide variety of tasks, including real-time alerts, self-updating data visualizations and dashboards, and various IoT applications.