Article 2 - The Predicament of Predictors
"Innovations in technology are most impactful and enjoyable when they solve real-life problems, improve lives, enhance safety, and are accessible to the average person."
In the first article we touched upon some foundational concepts of linear regression used in machine learning, focusing on "understanding and minimizing error". The article included some examples and an overview of the exciting million-dollar prize competition run by Netflix, where the challenge was to better the RMSE of its Cinematch algorithm by 10%.
You can read the first article here: Article 1: The Predicament of Predictors
In this article we continue to build upon our Machine Learning vocabulary and move beyond linear regression, considering solutions for more real-life challenges with much larger volumes of data to process. We will also cover the essential branches of mathematics that are the building blocks of Machine Learning algorithms.
Many challenges that impact human lives at a large scale (for example those related to health, cyber security, finance, governance, and law and order) involve qualitative data in addition to quantitative data, which needs to be understood well and in the right context to make recommendations. The analysis of these issues often leads to results that are a classification into pre-determined categories (and not a value from a continuous numeric range).
Consider these three scenarios as real-life examples where the output is a classification into a category (a categorical variable or label) and not a continuous numerical value.
A. Digital Communication: Email Categorization
Objective: Scan incoming email and classify into any of the following categories as appropriate:
Primary, Social, Marketing, or Spam (the categories referenced later in this article).
B. Online Financial Safety: Detecting a Fraudulent Credit/Debit Card Transaction
Objective: Reduce fraudulent misuse of credit and debit card information by unauthorized entities.
In 2022, merchants and card owners lost over 30 billion USD worldwide to debit and credit card fraud. For the card-issuing FSI company (or consortium), it is imperative to detect and prevent fraudulent transactions pre-emptively to safeguard customers, prevent losses to the merchant and the card issuer, and maintain brand loyalty and trust.
C. Health and Well Being: Detection and prevention of Cardiovascular diseases
Objective: Identify the risk of heart disease at an individual and community level.
As per the World Health Organization, cardiovascular diseases (CVDs) are the leading cause of death globally. In 2019, 32% of all global deaths were due to CVDs, of which 85% were due to heart attack and stroke.
With the power of AI, we can process large volumes of individual and demographic data managed by health organizations to identify the responsible factors with the highest impact, so that awareness can be built and preventive measures can be encouraged.
Moving beyond Linear Regression
Algorithms based on linear regression alone may not provide the most efficient models in many scenarios, including the ones noted above. For example, when estimating the probability of an event happening, linear regression (which worked well for the multi-channel advertising spend vs. sales example covered in Article 1) can produce a negative value or a value greater than 1, which is invalid for a probability.
Refer to the figure above with two graphs. Let's consider an example of finding the probability of a credit card holder defaulting on payment (Y axis) given the outstanding balance due on the card (X axis).
If we apply linear regression to the values (left image) to find the probability of a person defaulting, we could get a negative value or a value greater than 1. That is not what is expected for a probability, which should lie between 0 and 1.
However, we solve this problem by shifting to a logistic function (shown above and the basis of logistic regression). Now we always get values between 0 and 1 (right image). There are more equations to solve and transform to understand logistic regression fully, but this is enough to drive home the point that a different approach is needed for different types of data and outcomes.
An approach that worked for one set of problems may not work for another. Therefore we need models better suited to classification problems, such as logistic regression or Linear Discriminant Analysis, for a better fit.
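As a minimal sketch of this difference, assuming scikit-learn and a synthetic balance-vs-default dataset (the numbers and variable names below are made up purely for illustration), we can compare the two approaches:

```python
# Minimal sketch: linear vs. logistic regression on a synthetic "balance vs. default" dataset.
# The data and the decision rule used to generate it are hypothetical, for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(42)
balance = rng.uniform(0, 3000, size=500).reshape(-1, 1)     # outstanding card balance
# Higher balances make default more likely (a synthetic rule, not real data)
default = (balance.ravel() + rng.normal(0, 500, size=500) > 2000).astype(int)

linear = LinearRegression().fit(balance, default)
logistic = LogisticRegression(max_iter=1000).fit(balance, default)

test_balances = np.array([[0.0], [1500.0], [3000.0]])
print("Linear regression 'probabilities':", linear.predict(test_balances))                 # can fall outside [0, 1]
print("Logistic regression probabilities:", logistic.predict_proba(test_balances)[:, 1])   # always within [0, 1]
print("Default predicted (0.5 threshold):", logistic.predict(test_balances))
```

The logistic model also illustrates the threshold idea discussed below: the predicted probability is converted into a Yes/No decision once it crosses a chosen cut-off (0.5 by default in scikit-learn).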
For example, the classification would need a decision like a Yes or No:
Based on the data is any risk of heart disease detected? Yes/No.
What is the probability of the current transaction being fraudulent?
The analysis may produce a likelihood, and the result is assigned to a certain category if that likelihood crosses a threshold we set based on prior knowledge.
Branches of Mathematics - Building blocks of ML
So what are the "must-know" branches of mathematics that form the building blocks of machine learning and AI systems? Interestingly, the mathematical theories and algorithms used in modern-day AI applications range from those proposed nearly a century ago to far more recent developments.
In order to pursue ML and AI more deeply, it is highly recommended to build strong fundamentals in the following branches of mathematics: linear algebra, probability, statistics and calculus, each of which appears in the worked example below.
Mathematics in Action - Applied to ML
From the examples above, let's pick one, email categorization, and touch upon at a high level how these mathematical concepts and various algorithms play together to help build the right model.
Digital Communication: Email Categorization
Objective: Scan incoming email and classify it into the appropriate category.
Key Machine Learning concepts applied:
Key Mathematical concepts applied:
Logic applied:
Algorithms applied:
Here, feature extraction relies mainly on linear algebra (vectorization), classification relies mainly on probability (Naive Bayes), and model performance evaluation relies mainly on statistics (precision, recall, accuracy).
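A minimal scikit-learn sketch of that pipeline follows; the example emails and labels are my own illustrative assumptions, not a real dataset.

```python
# Minimal sketch of the pipeline described above: vectorization (linear algebra),
# Naive Bayes classification (probability), and precision/recall/accuracy (statistics).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score

emails = [
    "Meeting agenda for Monday",             # primary
    "Your friend tagged you in a photo",     # social
    "Huge discount on shoes this weekend",   # marketing
    "Win a free prize, click now",           # spam
]
labels = ["primary", "social", "marketing", "spam"]

vectorizer = CountVectorizer()               # feature extraction: text -> word-count vectors
X = vectorizer.fit_transform(emails)

model = MultinomialNB().fit(X, labels)       # classification via Bayes' theorem

predictions = model.predict(X)
print("Accuracy :", accuracy_score(labels, predictions))
print("Precision:", precision_score(labels, predictions, average="macro", zero_division=0))
print("Recall   :", recall_score(labels, predictions, average="macro", zero_division=0))
```

In practice the model would be trained and evaluated on separate splits of a much larger labelled corpus; the tiny in-sample evaluation here is only to show where each mathematical branch enters the pipeline.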
The diagram above, which I created, represents an Artificial Neural Network (ANN) model and shows the parameters and hyper-parameters that we need to tweak and optimize to get maximum accuracy. It also shows how the mathematical concepts we touched upon are applied, in this case matrices (linear algebra) and gradient descent (calculus). The key elements of this ANN are the parameters (weights and biases) and the hyper-parameters.

The data is divided into training, validation and test sets (usually an 80:10:10 mix). We start with forward propagation from the input variables. In this scenario there are 3 inputs (independent variables) and the final output is a classification into two categories. A batch is a subset (sample) of the total training data propagated through the network in a single pass, and the parameters are adjusted after every batch to reduce error. Each full pass of the training dataset through the ANN is called an epoch, and an epoch can consist of one or more batches. Multiple epochs are run across all of the training data, and the training process completes when all of the epochs have completed.
The initial weights can be initialized randomly with values drawn from a standard normal distribution (mean = 0 and standard deviation = 1). The matrix multiplication and addition in the diagram show how these weights and biases are used (Wᵀ·X + B = Y) when processed for the 1st and 2nd hidden layers.
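To make this concrete, here is a minimal NumPy sketch of the Wᵀ·X + B step flowing through two hidden layers; the layer sizes, the ReLU activation and the random input are illustrative assumptions, not values from the diagram.

```python
# Minimal sketch of one forward-propagation pass: 3 inputs, two hidden layers,
# and a 2-category output, with weights drawn from a standard normal distribution.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))                          # 3 input features (one example)

# Weight matrices and bias vectors (mean 0, standard deviation 1 for the weights)
W1, b1 = rng.normal(size=(3, 4)), np.zeros((4, 1))   # hidden layer 1: 4 neurons
W2, b2 = rng.normal(size=(4, 4)), np.zeros((4, 1))   # hidden layer 2: 4 neurons
W3, b3 = rng.normal(size=(4, 2)), np.zeros((2, 1))   # output layer: 2 categories

relu = lambda z: np.maximum(0, z)

h1 = relu(W1.T @ x + b1)        # W^T * X + B, followed by the activation function
h2 = relu(W2.T @ h1 + b2)
logits = W3.T @ h2 + b3         # raw output for the 2 categories
print("Raw output (logits):", logits.ravel())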
The matrix output for each layer is fed into an activation function that filters and reduces noise and determines what information is propagated to the next layer. The type of activation function is decided based on the context of the problem being solved.
Some of the activation functions, for example, are listed here (a minimal code sketch follows this list):
Sigmoid: output is 0 to 1
TanH: output is -1 to 1
Rectified Linear Unit (ReLU): output is 0 if x < 0, and x otherwise
Softmax: output is a vector of probabilities that add up to 1
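Here is a minimal NumPy sketch of these four activation functions.

```python
# Minimal sketch of the activation functions listed above, written with NumPy.
import numpy as np

def sigmoid(z):                      # output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):                         # output in (-1, 1)
    return np.tanh(z)

def relu(z):                         # 0 if z < 0, z otherwise
    return np.maximum(0, z)

def softmax(z):                      # vector of probabilities summing to 1
    e = np.exp(z - np.max(z))        # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.5, 3.0])
print(sigmoid(z), tanh(z), relu(z), softmax(z), sep="\n")
```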
Usage example: In the case of the email classification problem, we could use the softmax function, which gives the probability of an email belonging to each category (Primary, Social, Marketing, Spam). We run the iterations using forward propagation, and the category with the highest probability determines which category the email is assigned to. To determine the error and accuracy during each iteration we use a cost function suited to the problem being solved; in this scenario of multi-class email classification, "categorical cross-entropy" would be ideal. Based on the error value, back-propagation is used to adjust the weights and improve accuracy. In the diagram above, back-propagation moves towards the left, against the direction of the arrows, to correct the errors and adjust the weights.
This process of forward propagation, application of the activation function, error estimation and back-propagation to improve accuracy is known as the gradient descent process, and it relies on calculus as its core branch of mathematics.
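For the email scenario, here is a minimal NumPy sketch of the softmax output and the categorical cross-entropy cost for a single email; the logit values and the one-hot label are made up purely for illustration.

```python
# Minimal sketch of the error-measuring step: softmax probabilities over the four
# email categories and the categorical cross-entropy loss for one example.
import numpy as np

categories = ["Primary", "Social", "Marketing", "Spam"]

logits = np.array([2.1, 0.3, -0.5, 0.9])     # raw network output for one email (illustrative)
probs = np.exp(logits - logits.max())
probs = probs / probs.sum()                   # softmax: probabilities summing to 1

true_label = np.array([1, 0, 0, 0])           # the email is actually "Primary" (one-hot)
cross_entropy = -np.sum(true_label * np.log(probs))

print("Predicted category:", categories[int(np.argmax(probs))])
print("Categorical cross-entropy loss:", round(float(cross_entropy), 4))
# Back-propagation uses the gradient of this loss to nudge each weight:
#   w_new = w_old - learning_rate * d(loss)/d(w)   (the gradient descent update)
```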
This is a simplistic representation that does not get into the finer details of the mathematics of neural networks and deep learning, which would require much more time and rigor and is often a dedicated term-long subject in a formal course.
Access to Computing Power
The computing power available today makes testing on large datasets possible in ways that were not feasible when only a few CPUs were available. With the rapid advancement of technology to process larger amounts of data faster, data centers around the world are being redesigned and revamped with modern architectures whose computing power comes from running thousands of GPUs and DPUs.
Large Language Models (LLMs) and Large Multimodal Models (LMMs) today are possible due to more computing power available than ever before.
In parallel, we are quickly developing LMMs that process massive amounts of data spanning text, images, video, audio (examples: OpenAI's GPT-4o "omni", Alphabet's Gemini 1.5) and even sensory data (device inputs such as touch, gyroscope and navigation data), going beyond LLMs like GPT-4 that process massive amounts of primarily text-based data.
The deep learning neural network behind GPT-3 has over 175 billion machine learning parameters, and GPT-4 reportedly has around 1.7 trillion parameters.
With the computing power available today we can apply all of the mathematical algorithms to large volumes of data and perform the compute intensive validations to determine the most efficient models.
In programming languages like Python, all of these calculations can be performed with ease using the wide array of libraries available today. As the features and parameters increase, you will need processing power beyond that of your personal computer (as we move from tens of millions to billions to trillions of data points forming LLMs and LMMs), and you will use dedicated servers running on thousands of cores (CPUs, GPUs, TPUs and DPUs) in data centers built for this purpose.
For example, with a large dataset, when we use Cross-Validation (CV) techniques (such as K-Fold CV or Leave-One-Out CV) instead of just the initial training and test data split, we split the data into many folds and iteratively run through them, marking some data as training data and the rest as test data, to get more reliable estimates from a diverse data set. This needs much higher computational power than your laptop or desktop.
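A minimal scikit-learn sketch of K-Fold cross-validation follows; the synthetic dataset, model and fold count are illustrative choices so that it runs quickly on a laptop.

```python
# Minimal sketch of K-Fold cross-validation: the data is split into 5 folds and each
# fold takes a turn as the test set while the remaining folds are used for training.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

model = LogisticRegression(max_iter=1000)
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

scores = cross_val_score(model, X, y, cv=kfold)
print("Accuracy per fold:", np.round(scores, 3))
print("Mean accuracy    :", round(scores.mean(), 3))
```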
In such cases, cloud computing resources or a data center designed for such purposes is best suited. Examples of powerful processing hardware found in such data centers today include the NVIDIA A100 Tensor Core GPU, NVIDIA H100 Tensor Core GPU, AMD Instinct MI250X, Google TPU v4 and NVIDIA BlueField-4 DPU.
If you want to play with large datasets that need computing power beyond your personal computer at home, consider leveraging the cloud computing power available to you. Some of the providers (offering both free and paid plans, depending on the usage limits and computing resources you need) are Kaggle.com, Colab.Google, Lightning.ai and Brev.dev.
I ran code [dataset and code courtesy: Kaggle.com] on colab.google to find which model is most suitable for spam detection and got these results:
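The exact dataset and results are not reproduced here, but the comparison follows a pattern like the sketch below; the toy emails, labels and choice of models are my own illustrative assumptions, and you would swap in the actual Kaggle spam dataset to run a real comparison.

```python
# Minimal sketch of comparing several candidate spam classifiers with cross-validation.
# The texts and labels are toy data for illustration only, not the Kaggle dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

texts = ["win a free prize now", "meeting at 10 am", "cheap loans click here",
         "lunch tomorrow?", "urgent: claim your reward", "project status update"]
labels = [1, 0, 1, 0, 1, 0]          # 1 = spam, 0 = ham (toy labels)

X = TfidfVectorizer().fit_transform(texts)

for name, model in [("Naive Bayes", MultinomialNB()),
                    ("Logistic Regression", LogisticRegression(max_iter=1000)),
                    ("Linear SVM", LinearSVC())]:
    scores = cross_val_score(model, X, labels, cv=2)
    print(f"{name:20s} mean accuracy: {scores.mean():.2f}")
```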
The Ground We Covered
In this article we expanded our horizons to go deeper and wider into the field of ML and its applications in AI systems. We went through some real-life problems and mapped them to the branches of mathematics that are the building blocks of ML; moved beyond linear regression into classification-based models; learned the key terminology used with neural networks and some key activation functions; walked through an example to see how all the concepts are applied; and looked at the computing power needed and available today to run LLMs and LMMs, along with cloud platforms where you can open your own free or paid account to get hands-on and run your own ML code on processors much more powerful than your personal computer.
In the next article we will discuss the areas where AI can play a greater role, possibly performing better than humans, the challenges that AI will face, and the importance of human-AI collaboration.
These views are personal. Just like in statistics, please allow for a margin of error, though efforts have been made to minimize them. These articles are intended to spark the interest of general readers in statistics, machine learning, and AI.
Acknowledgement: