Classic Approach and Modern Reality interaction for enhanced Big Data Applications across Industries
Aisha Ekundayo, PhD
Data Analytics Consulting | AI Consultant | Data Product Management
As highlighted in the previous article, business processes generate a vast amount of structured and unstructured data that can be analysed with advanced analytics. For example, hospitals hold patients’ health data, appointment notes, clinical data in the form of images, demographic data, geographic data and website usage data. For NHS England, this includes data from over 200 trusts, with up to 10 hospitals under each trust. The hospitals provide services to almost 60 million patients, as reported by NHS Digital as of 1st April 2021, with millions of daily appointments. Big data from healthcare is used to improve treatments, to predict patient risk factors for different diseases so that occurrences can be reduced, and to report on vaccination rates, amongst other things.
There are numerous examples of how big data analytics is used to create AI applications across industries; this blog presents common use cases. Organisations are harnessing the power of cloud computing and analytics to support data-driven decisions and to empower their employees through the automation of repetitive tasks and decisions. A common theme across industries is combining traditional data sources and new data sources as input data in AI models, including machine learning algorithms. See my article on Machine Learning for more details on machine learning methods.
In statistics, there is a theory called ‘mean reversion’. It describes how a variable’s level and volatility tend to return to their long-run average over time, even after large short-term movements. On this view, past data becomes a strong predictor of the future once the noise is removed and the underlying patterns are uncovered. Mean reversion is a simplified explanation of how time series data behave. A widely used example of mean-reverting behaviour is stock price changes, where a test of stationarity is the first step in analysing the data.
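To make the idea concrete, here is a minimal Python sketch of mean reversion using simulated data. It assumes a simple first-order autoregressive process, which is only one way to model mean-reverting behaviour, and the function names and parameter values are illustrative. A coefficient well below 1 suggests the series is pulled back towards its average, while a value close to 1 suggests a random walk. In practice a formal stationarity test, such as the Augmented Dickey-Fuller test, would be used instead of this rough estimate.

```python
import random

def simulate_ar1(n, mean, phi, noise_sd, seed=0):
    """Simulate x_t = mean + phi * (x_{t-1} - mean) + noise (illustrative model)."""
    rng = random.Random(seed)
    series = [mean]
    for _ in range(n - 1):
        series.append(mean + phi * (series[-1] - mean) + rng.gauss(0, noise_sd))
    return series

def estimate_phi(series):
    """Least-squares slope of x_t regressed on x_{t-1}."""
    prev, curr = series[:-1], series[1:]
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    cov = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    return cov / var

# A mean-reverting series (phi = 0.8) and a random walk (phi = 1.0).
reverting = simulate_ar1(2000, mean=100.0, phi=0.8, noise_sd=1.0, seed=1)
random_walk = simulate_ar1(2000, mean=100.0, phi=1.0, noise_sd=1.0, seed=1)

phi_reverting = estimate_phi(reverting)
phi_walk = estimate_phi(random_walk)

print(f"Estimated phi, mean-reverting series: {phi_reverting:.3f}")
print(f"Estimated phi, random walk:           {phi_walk:.3f}")
```

The further the estimated coefficient sits below 1, the faster the series returns to its average after a shock.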
Big data analytics, including machine learning, goes beyond classical statistics because it enables us to discover unknown patterns across vast numbers of data points, even when there is no theoretical underpinning. In essence, by combining billions of market-related data points with data from other sources, we can uncover insights and knowledge that we did not know existed. AI with cloud computing therefore takes empiricism and positivism to a new level, supported by modern parallel processing capabilities and virtually unlimited storage.
In Financial Services, big data supports three main activities: investment decisions, personalised banking, and fraud detection. In all three scenarios, machine learning algorithms are trained on past data to predict future occurrences. The underlying training data is therefore critical in AI, hence the call for ethical thinking when designing training datasets to ensure that they are representative and free of bias.
Combining data from different sources and ingesting them into a data lake is not unique to Finance. Data acquisition and understanding is a critical phase in every data science lifecycle, such as the Team Data Science Process (TDSP). It usually follows directly after understanding the business objectives. It involves identifying relevant data sources, ingesting them into the same location (for example, blob storage), and joining the datasets together for analysis. During model training in supervised learning, the combined dataset is used to identify complex and unknown patterns in the data in order to predict future events.
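The joining step described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The two small datasets, the `customer_id` key and the `left_join` helper are all hypothetical, and a real project would more likely use a dataframe library or the query engine of the data platform.

```python
from collections import defaultdict

# Hypothetical data from two sources that share a customer_id key.
transactions = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 2, "amount": 35.5},
    {"customer_id": 1, "amount": 60.0},
    {"customer_id": 3, "amount": 10.0},
]
demographics = [
    {"customer_id": 1, "region": "North"},
    {"customer_id": 2, "region": "South"},
]

def left_join(left, right, key):
    """Keep every row from `left`, adding matching columns from `right`."""
    index = {row[key]: row for row in right}
    right_columns = {col for row in right for col in row if col != key}
    joined = []
    for row in left:
        merged = {col: None for col in right_columns}  # None when no match
        merged.update(index.get(row[key], {}))
        merged.update(row)
        joined.append(merged)
    return joined

combined = left_join(transactions, demographics, "customer_id")

# A simple analysis on the combined dataset: total spend per region.
spend_by_region = defaultdict(float)
for row in combined:
    spend_by_region[row["region"]] += row["amount"]

for region, total in spend_by_region.items():
    print(region, total)
```

Customer 3 has no demographic record, so the join keeps the transaction and leaves the region empty. Deciding how to treat such gaps is part of the data understanding phase.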
The table below presents common use cases of big data analytics across various industries.
Table 1: Examples of big data analytics use cases and their input data
Input data includes both past and real-time data. The model is retrained at regular intervals to identify new trends and produce better results. Retraining is also crucial to mitigate data drift, which occurs when the underlying data changes and the new trends need to be included in the training set.
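A very simple way to picture a data drift check is to compare recent values of a feature against the values the model was trained on. The Python sketch below is an illustration under stated assumptions: the `drift_score` and `needs_retraining` helpers, the sample values and the threshold of 2.0 are all hypothetical. Production systems typically rely on more robust measures, such as statistical distribution tests, across many features.

```python
from statistics import mean, stdev

def drift_score(baseline, recent):
    """Shift in the recent mean, measured in baseline standard deviations."""
    return abs(mean(recent) - mean(baseline)) / stdev(baseline)

def needs_retraining(baseline, recent, threshold=2.0):
    """Flag drift when the shift exceeds an (assumed) threshold."""
    return drift_score(baseline, recent) > threshold

# Hypothetical feature values seen at training time and in two later periods.
baseline = [20.1, 19.8, 20.4, 20.0, 19.7, 20.3, 20.2, 19.9]
stable_period = [20.0, 20.2, 19.9, 20.1]
shifted_period = [22.5, 22.9, 23.1, 22.7]

print("Stable period needs retraining:", needs_retraining(baseline, stable_period))
print("Shifted period needs retraining:", needs_retraining(baseline, shifted_period))
```

When the check flags drift, the recent data would be added to the training set and the model retrained, as described above.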
All the examples above rely on machine learning models, spanning supervised learning, unsupervised learning, and semi-supervised learning. Geospatial analysis is a complementary technique, used to enrich datasets by combining geospatial information, census data and socioeconomic data with industry-specific data.
To summarise, advanced analytics is applied to big data to build applications that support and improve business processes. Real-time analysis of business data is critical in almost every industry to remain competitive and empower employees to make data-driven decisions, improve customer journeys, and reduce costs.