Connected Transportation - Vehicle Analytics
Peter Piper, CISSP
Application Security Leader @ KPMG International | Cloud Security, Emerging Tech
Last year, in May of 2017, I wrote an article on "Connected Transportation". While studying the materials necessary to pass Microsoft's 70-475 exam, I have been working through a great Pluralsight course on Azure Stream Analytics. It just so happens that the course's sample data is racing-car telemetry; moreover, it illustrates some of the key topics I discussed in that previous article.
What is Azure Stream Analytics? Great question. It is a scalable Platform-as-a-Service (PaaS) offering in Microsoft's Azure public cloud that enables real-time analytics on streaming data without the need to provision hardware. The service has various configurable inputs and outputs. Please note that there is a link below on configuring Azure API Management with Stream Analytics. The following illustration represents the big-picture concept.
Event production -> Event queuing and stream ingestion -> Stream Analytics -> Storage -> Presentation
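For those who want to try this end to end, the front of that pipeline can be provisioned from PowerShell. The snippet below is a minimal sketch using the AzureRM.StreamAnalytics module; the resource group, job name and JobDefinition.json file are placeholders I made up for illustration, not values from a real deployment.
Login-AzureRmAccount
# Create a resource group to hold the streaming job
New-AzureRmResourceGroup -Name "[YourResourceGroup]" -Location "West Europe"
# JobDefinition.json holds the job properties (location, SKU, event policies, etc.)
New-AzureRmStreamAnalyticsJob -ResourceGroupName "[YourResourceGroup]" -Name "[YourStreamingJob]" -File "JobDefinition.json"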
In the connected-car scenario, the following items are critical to obtaining accurate data (a PowerShell sketch for wiring up the input and output follows the list):
- Ingestion of the data
- Analysis of the data
- Output of the data
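A minimal sketch of attaching the ingestion and output pieces to an existing job (the analysis piece, the query itself, is shown later in this article). Input.json and Output.json are hypothetical definition files holding the Event Hub and Blob storage connection details.
New-AzureRmStreamAnalyticsInput -ResourceGroupName "[YourResourceGroup]" -JobName "[YourStreamingJob]" -Name "[YourInputAlias]" -File "Input.json"
New-AzureRmStreamAnalyticsOutput -ResourceGroupName "[YourResourceGroup]" -JobName "[YourStreamingJob]" -Name "[YourOutputAlias]" -File "Output.json"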
One critical concept to keep in mind when working with streaming data is temporality, i.e. the data is always changing. With this in mind, "time" becomes very important when building the query that analyzes the data. Choosing the temporal window (tumbling, hopping or sliding) in Azure Stream Analytics has impacts on downstream consumers, so one needs to think about the I/O of the configured output; it isn't uncommon to see another Event Hub configured as the output of a streaming job. The sketch below contrasts the tumbling window used later in this article with a hopping window over the same data.
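A minimal sketch, reusing the alias and field names from the full query shown later: a hopping window recomputes the aggregate every 10 seconds across a 100-second span, whereas a tumbling window would emit once per non-overlapping 100-second block.
SELECT
    Driver,
    AVG(Brakes) AS Brakes
INTO
    [YourOutputAlias]
FROM
    [YourInputAlias] TIMESTAMP BY EventTime
GROUP BY
    Driver, HoppingWindow(second, 100, 10) -- 100-second window, emitted every 10 seconds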
Additionally, as Azure Stream Analytics allows starting and stopping of an analytics job, the following question can be raised:
Can I automate when I want to start or stop the job that analyzes my data?
The figure below demonstrates a sample streaming job that is paused, with one input and one output configured. The input is an Event Hub, which provides a "load-leveling" queue pattern. The output is a JSON file stored in Azure Blob storage. The data is sent to the Event Hub by a simple console application that emits JSON "records" with various data elements.
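For illustration, a single record sent by such a console application might look like the following; the field names mirror the query used later in this article, while the values themselves are invented.
{
    "Driver": "Driver 1",
    "Accelerator": 0.85,
    "Brakes": 0.10,
    "Steering": -0.25,
    "ErsBattery": 0.60,
    "EventTime": "2018-05-01T14:22:31.000Z"
}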
A great feature of Azure Stream Analytics is the ability to use a job to validate various queries without needing to specify the input or output parameters: upload some test data and you are ready to go. The query and figure below can be applied to the concepts I introduced in the "Connected Transportation" article: how often do the various vehicle inputs triggered by a driver occur within a certain time span?
I can't imagine the amount of telemetry data being analyzed during 24 hours of Le Mans racing.
Using the sample data result set, the higher the number, the more frequently that input was triggered. For example, if the brakes are applied more often, wear and tear is produced, shortening the lifespan of various components, e.g. tires, brake pads, discs. More frequent braking also introduces other aspects: brake fluid boiling temperature, heat, expansion/contraction, brake dust, etc. How will this data be relayed to other consumers within the connected transportation model?
SELECT
    Driver,
    AVG(Accelerator) AS Accelerator,
    AVG(Brakes) AS Brakes,
    AVG(Steering) AS Steering,
    AVG(ErsBattery) AS ErsBattery,
    MAX(EventTime) AS Time -- the max EventTime within the temporal window specified, i.e. the TumblingWindow
INTO
    [YourOutputAlias]
FROM
    [YourInputAlias] TIMESTAMP BY EventTime
GROUP BY
    Driver, TumblingWindow(second, 100)
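With the 100-second tumbling window above, each driver yields one aggregated record per window. A hypothetical output record (values invented for illustration) might look like this:
{ "Driver": "Driver 1", "Accelerator": 0.74, "Brakes": 0.21, "Steering": 0.05, "ErsBattery": 0.48, "Time": "2018-05-01T14:23:40.000Z" }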
Azure Automation has the ability to expose a runbook via an HTTP webhook. Invoking PowerShell commands to stop/start and monitor the streaming job can be a great component of a solution that incorporates streaming analytics but doesn't need to run 24x7. The following code is a small sample to get the brain juices flowing.
<#
Ref URLs:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-monitor-and-manage-jobs-use-powershell
https://databricks.com/blog/2017/05/22/running-streaming-jobs-day-10x-cost-savings.html
#>
Login-AzureRmAccount -TenantId '[YourTenantId]' -Subscription '[YourSubscriptionId]'
$rgName = "[StreamingJobResourceGroupName]"
$jobName = "[StreamingJobName]"
$currentDateTime = Get-Date
$streamJob = Get-AzureRmStreamAnalyticsJob -ResourceGroupName $rgName -Name $jobName -Verbose
# Echo the current state of the job (e.g. Stopped, Starting, Running, Stopping)
$streamJob.JobState
if ($streamJob.JobState -eq "Stopped")
{
    Write-Output "Job is Stopped. Starting job now..."
    $output = Start-AzureRmStreamAnalyticsJob -Name $jobName -ResourceGroupName $rgName -Verbose
    if ($output)
    {
        Write-Output "Job started successfully"
    }
}
if ($streamJob.JobState -eq "Starting")
{
    # TODO
}
if ($streamJob.JobState -eq "Running")
{
    # TODO: inspect the configured inputs, e.g. to check for incoming events
    $streamJobInputs = Get-AzureRmStreamAnalyticsInput -ResourceGroupName $rgName -JobName $jobName
    # Stop the streaming job if it has been running for more than one hour
    if ( ($currentDateTime - [datetime]$streamJob.Properties.OutputStartTime).TotalHours -gt 1 )
    {
        Write-Output "Job has been Running for more than one hour. Stopping job now..."
        $output = Stop-AzureRmStreamAnalyticsJob -Name $jobName -ResourceGroupName $rgName -Verbose
        Write-Output "Stopping result $output"
    }
}
if ($streamJob.JobState -eq "Stopping")
{
    # TODO
}
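Once the runbook above is published, Azure Automation can generate a webhook URL for it, and any scheduler or application can then trigger the runbook with a plain HTTP POST. A minimal sketch; the webhook URI below is a placeholder (real webhook URLs embed a secret token and should be treated as a credential):
# Placeholder webhook URI generated when the webhook is created in Azure Automation
$webhookUri = "https://s1events.azure-automation.net/webhooks?token=[YourWebhookToken]"
# Trigger the runbook; the response contains the id(s) of the started runbook job
$response = Invoke-RestMethod -Method Post -Uri $webhookUri
$response.JobIds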
Lastly, Azure has a great dashboard feature. One can quickly create a new dashboard and add the widgets needed for the resources of interest.
Disclaimer: As always, this and all of my articles written to date are my opinion and not the opinion of my current employer.
- Azure Stream Analytics introduction (Microsoft Docs)
- Pluralsight course "Understanding Azure Stream Analytics" by Alan Smith
- "Azure API Management with Streaming Analytics" by Peter Piper