Complexity: Time, Space, & Sample
In the article “Which Machine Learning (ML) to choose?” [1], as part of the Architectural Blueprints—The “4+1” View Model of Machine Learning, which helps you choose the right ML for your data, we indicated that “From a business perspective, two of the most significant measurements are accuracy [2] and interpretability.” [3] However, we did not discuss computational complexity.
ML Architectural Blueprints = {Scenarios, Accuracy, Complexity, Interpretability, Operations}
The computational complexity of an algorithm is a fundamental concept in computer science. It is necessary to take computational complexity into account because it determines the amount of resources required to run your model. These resources are estimated as time, space, and sample complexities.
These estimated computational complexities are approximated for a variety of different machine learning algorithms during training and prediction.
Note: Computational complexity and ML model complexity are distinct concepts, though they are often related.
Computational Complexity
ML Model Complexity
Relationship between the two:
In summary, while computational complexity is about the efficiency of the algorithm, ML model complexity is about the model's ability to learn and generalize from the data. Understanding both concepts is essential for building effective and efficient machine learning systems.
Definitions: Time, Space, & Sample
Computational Complexity of ML. Diagram: Visual Science Informatics
Relationships between AI, ML, NN, and DL. Diagram: Dirk Zee
The diagram above depicts the relationships between artificial intelligence, machine learning, neural networks, and deep learning, including MLP: multilayer perceptron; CNN: convolutional neural network; RNN: recurrent neural network; DBN: deep belief network; and GAN: generative adversarial network.
Artificial Intelligence (AI) is the broadest term, encompassing the creation of intelligent agents. AI is the overarching goal of creating intelligent machines.
Machine Learning (ML) is a subset of AI that focuses on algorithms allowing computers to learn from data without explicit programming. ML is a computational method to achieve AI by enabling computers to learn from data.
Neural Networks (NNs) are a type of ML algorithm inspired by the human brain's structure. They consist of interconnected nodes (artificial neurons) that process information. NNs are a specific type of ML algorithm.
Deep Learning (DL) is a subset of ML that uses neural networks with multiple layers (deep networks) to learn complex patterns in data. DL is a powerful technique within neural networks for handling complex data.
Brief History of Artificial Intelligence (AI), Machine Learning (ML), & Neural Networks (NNs)
History of Artificial Intelligence (AI), Machine Learning (ML), and Neural Networks (NNs). Diagram: Visual Science Informatics
The brief history diagram above of Artificial Intelligence (AI), Machine Learning (ML), and Neural Networks (NNs) illustrates the evolution of these fields. It also captures the causes and effects on the cyclical nature of their development, including periods of rapid advancement (waves) and subsequent slower progress or decline (winters).
Artificial Intelligence (AI)
The concept of AI can be traced back to ancient myths and legends, but the modern field of AI emerged in the mid-20th century.
Machine Learning (ML)
ML is a subset of AI that involves teaching computers to learn from data and improve their performance over time without being explicitly programmed.
Neural Networks (NNs)
NNs are a type of ML algorithm inspired by the structure and function of the human brain.
Key milestones and breakthroughs
Overview of ML Data-Driven Methods & Algorithms
The diagram below provides an overview of the categorization of ML data-driven methods and algorithms, such as Artificial Neural Network (ANN), Decision Tree (DT), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Generalized DBSCAN (GDBSCAN), Gaussian Means (GM), Hierarchical Clustering (HC), k-Nearest Neighbors (kNN), Random Forest (RF), Support Vector Machines (SVM), and eXtreme Gradient Boosting (XGBoost).
Types of ML Methods & Algorithms. Diagram adapted: Aristek System
ML Algorithms. Table: George Firican
Common ML Algorithms. Diagrams: datasciencedojo
Selecting a learning paradigm or a computational method involves four major categories, four major algorithm types, and two major techniques. The four major categories are supervised, semi-supervised, unsupervised, and reinforcement learning. The four major algorithm types are classification, regression, associations, and clustering. The two techniques are ensemble methods and reward feedback. The chart “Which Machine Learning (ML) to choose?”, at [Scenarios: Which Machine Learning (ML) to choose?], guides you through the major categories, data types, and objectives to determine which algorithm types or techniques to choose.
Additionally, complexity increases when adding model-agnostic explainability methods. [Interpretability: “Seeing Machines Learn”]
Moreover, complexity affects ML Operations (MLOps) and Continuous ML (CML), which are sets of practices that aim to deploy and maintain machine learning models in production reliably and efficiently. [Operations: MLOps, Continuous ML, & AutoML]
Furthermore, complexity is affected by the quality of your data; good data quality is a necessary prerequisite to building your ML model. [Data Science Approaches to Data Quality: From Raw Data to Datasets]
Time & Space: Big O Notation
"Big O notation is used to find the upper bound (the highest possible amount) of a function's growth rate, meaning it works out the time of the longest route it will take to turn input into output. It is often used to compare the efficiency of different algorithms, which is done by calculating, in a worst-case scenario, how much memory is needed, and how much time it takes to complete." [Wikipedia]
"Calling n the number of training sample,
p the number of features,
n(trees) the number of trees (for computational methods based on various trees),
n(sv) the number of support vectors, and
n(li) the number of neurons at layer i in a neural network,
we have the following approximations." [thekerneltrip]
The enclosed table proposes a theoretical point of view of some algorithms’ upper bounds.
Computational Complexity of Machine Learning Algorithms. Table: thekerneltrip 2018
All these estimated computational complexities, with their assumptions, are approximated for a variety of machine learning algorithms in terms of training time, prediction time, and auxiliary space complexity. The table below gives the computational complexity of ML algorithms, where:
n = number of training examples,
m = number of features,
n' = number of support vectors,
k = number of neighbors,
k' = number of trees.
Computational Complexity of ML Algorithms. Table: Krish Naik
These are the computational complexities of ML algorithms in the table below, where:
- n is the size of the samples [training] set,
- m is the number of dimensions [features],
- epoch refers to the one entire passing of training data through the algorithm,
- c is the number of classes,
- d(tree) is the depth of a tree,
- n(sv) is the number of support vectors,
- k is the number of clusters, and
- i is the number of iterations.
Training and Inference Time Complexity of ML Algorithms. Table: Avi Chawla
Abbreviations: Convolutional Neural Networks (CNN); Density-Based Spatial Clustering of Applications with Noise (DBSCAN); K-Nearest Neighbors (KNN); Non-negative Matrix Factorization (NMF); One-vs-Rest (OvR); Ordinary Least Squares (OLS); Principal Component Analysis (PCA); Recurrent Neural Networks (RNN); Stochastic Gradient Descent (SGD); Support Vector Machines (SVM); and t-distributed Stochastic Neighbor Embedding (t-SNE).
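As a rough empirical complement to these theoretical tables (a sketch, not taken from the cited sources), one can time an algorithm's training while doubling n and estimate the observed growth exponent; the choice of scikit-learn's SVC, the feature count, and the dataset sizes are illustrative assumptions.

```python
import time
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
timings = []
for n in (1_000, 2_000, 4_000):            # doubling training-set sizes (assumed)
    X = rng.normal(size=(n, 20))           # m = 20 features
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    start = time.perf_counter()
    SVC(kernel="rbf").fit(X, y)            # kernel SVM training is roughly O(n^2) to O(n^3)
    timings.append((n, time.perf_counter() - start))

for (n1, t1), (n2, t2) in zip(timings, timings[1:]):
    # If training time grows as n^p, then p ≈ log(t2/t1) / log(n2/n1)
    print(f"n {n1} -> {n2}: observed exponent ≈ {np.log(t2 / t1) / np.log(n2 / n1):.2f}")
```

If training time grows roughly as n^p, the printed exponent approximates p, which can be compared against the theoretical bounds in the tables above.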
Also, if you deploy your model within ensemble methods, they multiply the complexity of your model. A lower computational complexity of an ML algorithm is significant for optimizing the efficiency of ensemble methods such as Bagging, Boosting, Stacking, and Cascading, because ensemble methods require input results from numerous ML algorithms.
ML Algorithms for Prediction. Table: AISOMA
The table above captures the advantages and disadvantages of top machine learning algorithms for prediction, which are linear, tree-based, and neural networks.
Execution Speeds Dashboard. Chart: KNIME AG
For example, an AutoML model dashboard can visualize execution speed, which is measured during two stages: training time and deployment speed. Training time, typically in seconds, reflects how long it takes to train the model on the data. Deployment speed, measured in milliseconds, focuses on the average time to make a single prediction on new data. This highlights how fast the model can process one new input. Interactive features allow users to adjust plot settings and explore data dynamically, providing valuable insights into both training efficiency and deployment performance of the machine learning model.
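For instance, a minimal sketch of measuring these two stages separately, assuming scikit-learn, a RandomForestClassifier, and a synthetic dataset (all illustrative choices), might look like this:

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(5_000, 10))
y = (X[:, 0] > 0).astype(int)

model = RandomForestClassifier(n_estimators=100)

start = time.perf_counter()
model.fit(X, y)                              # training time, reported in seconds
print(f"Training time: {time.perf_counter() - start:.2f} s")

single_input = X[:1]
start = time.perf_counter()
for _ in range(100):                         # average over repeated single predictions
    model.predict(single_input)
latency_ms = (time.perf_counter() - start) / 100 * 1_000
print(f"Deployment speed: {latency_ms:.2f} ms per prediction")
```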
Mathematical Formulas
Understanding the mathematical formulas of machine learning algorithms is instrumental in gaining insight into their computational complexity.
Mathematical Formulas of ML Algorithms. Table: Bindu Reddy
Mathematical Formulas of Regression Algorithms. Table: Avi Chawla
Sample: Sampling & Fundamental Concepts
"Sample complexity of a machine learning algorithm represents the number of training samples it needs in order to successfully learn a target function." The two main categories are probability sampling and non-probability sampling methods. In probability sampling, every member of the target population has a known chance of being included. This allows researchers to make statistically significant inferences about the population from the sample. Non-probability sampling methods are used when it's not possible or practical to get a random sample, but they limit the ability to generalize the findings to the larger population.
Sampling Methods and Techniques. Diagram: Asad Naveed
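As a toy illustration of the distinction (not drawn from the diagram above), simple random sampling, a probability method, can be contrasted with convenience sampling, a non-probability method; the population of member IDs is a made-up assumption:

```python
import random

random.seed(1)
population = list(range(1, 101))   # hypothetical population of 100 member IDs

# Probability sampling: simple random sampling, every member has an equal, known chance
random_sample = random.sample(population, 10)

# Non-probability sampling: convenience sampling, just take the first members at hand
convenience_sample = population[:10]

print("Simple random sample:", random_sample)
print("Convenience sample:  ", convenience_sample)
```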
Monte Carlo and Markov Chain Monte Carlo (MCMC) Sampling Methods
Both Monte Carlo and Markov Chain Monte Carlo (MCMC) are powerful techniques used to generate samples from complex probability distributions that are often used in Bayesian inference. The main difference between Monte Carlo and MCMC is that Monte Carlo methods use random sampling to generate independent samples, while MCMC methods use dependent samples to generate sequences of dependent observations. While Monte Carlo methods can be used for direct sampling in some cases, MCMC methods often provide a more efficient and flexible approach, especially for complex distributions.
Monte Carlo Methods
Markov Chain Monte Carlo (MCMC) Methods
Key Differences between Monte Carlo vs. MCMC. Table: Gemini
In Conclusion, both methods are valuable tools for statistical inference and machine learning, but their applicability and efficiency can vary depending on the specific problem and the complexity of the target distribution. The choice of method depends on the complexity of the distribution and the desired level of accuracy. MCMC methods are particularly powerful for handling complex distributions and high-dimensional problems.
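As an informal sketch of the difference (not from the original article), the snippet below uses plain Monte Carlo with independent samples to estimate an expectation, and then a minimal Metropolis-Hastings (MCMC) random walk whose samples are dependent; the standard normal target and the step size are illustrative assumptions.

```python
import math
import random

random.seed(0)

# Monte Carlo: independent samples from a standard normal to estimate E[x^2] (true value = 1)
mc_samples = [random.gauss(0, 1) for _ in range(10_000)]
print("Monte Carlo estimate of E[x^2]:", sum(x * x for x in mc_samples) / len(mc_samples))

# MCMC (Metropolis-Hastings): dependent samples targeting the same standard normal
def log_target(x):
    return -0.5 * x * x              # log-density of N(0, 1) up to a constant

x, chain = 0.0, []
for _ in range(10_000):
    proposal = x + random.gauss(0, 0.5)   # random-walk proposal (step size assumed)
    if math.log(random.random()) < log_target(proposal) - log_target(x):
        x = proposal                      # accept; otherwise keep the current state
    chain.append(x)

print("MCMC estimate of E[x^2]:", sum(v * v for v in chain) / len(chain))
```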
Entropy in Information Theory
When discussing the learnability of models and the sample complexity required for effective training, entropy in information theory, Probably Approximately Correct (PAC) learning theory, and the VC (Vapnik-Chervonenkis) dimension are fundamental concepts.
Here is a breakdown of how it works:
For example, suppose you have a dataset of emails classified as spam or not spam. At a particular node in a decision tree, the entropy might indicate that, on average, you would need to ask about two additional features (such as the presence of certain keywords) to correctly classify an incoming email as spam or not spam.
Here is the formula for calculating entropy in information theory:
Entropy = -Σ(p(i) * log2(p(i)))
Where:
- p(i) is the probability (relative frequency) of class i in the dataset.
The outcome is in bits, representing the average information needed per classification.
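As a quick illustration (a sketch, not code from the article), the following Python computes this entropy for a hypothetical spam/not-spam label distribution:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical example: 6 spam and 4 non-spam emails
labels = ["spam"] * 6 + ["not_spam"] * 4
print(round(entropy(labels), 3))   # ≈ 0.971 bits per classification
```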
Probably Approximately Correct (PAC) Learning Theory
- ε (epsilon): This is the desired error tolerance. We want the model's generalization error (error on unseen data) to be within ε of its training error (error on the training data) with high probability.
- δ (delta): This is the confidence parameter. We want this level of confidence (1 - δ) that the model's generalization error is within ε of the training error.
VC (Vapnik-Chervonenkis) Dimension
There is a connection between VC dimension and sample complexity in PAC learning. Intuitively, learning a more complex concept class (higher VC dimension) requires more training data to achieve the desired accuracy with high probability (as specified by ε and δ in PAC learning).
PAC learning theory provides bounds on the number of training samples (sample complexity) needed to learn a concept class with a given VC dimension within the desired accuracy (ε) and confidence (δ).
In simpler terms, if a concept class has a lower VC dimension (less complex), it can be learned with a smaller training dataset to achieve the same level of performance compared to a class with a higher VC dimension.
VC dimension helps us understand the inherent complexity of a model class, while PAC learning provides a framework to analyze how much data is needed to train a model effectively with some guarantees on its generalization performance.
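To make this relationship concrete, here is a small sketch (assuming a finite hypothesis class, which the PAC bound for a consistent learner covers) that evaluates the standard sample-complexity bound m ≥ (1/ε)(ln|H| + ln(1/δ)); the hypothesis-class size and the ε and δ values are illustrative assumptions:

```python
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Upper bound on the number of training samples needed so that a consistent
    learner over a finite hypothesis class has generalization error <= epsilon
    with probability >= 1 - delta."""
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)

# Illustrative assumption: |H| = 2^20 hypotheses, 5% error tolerance, 95% confidence
print(pac_sample_bound(2 ** 20, epsilon=0.05, delta=0.05))   # ≈ 338 samples
```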
Generative Models Complexity
The roots of generative AI can be traced back to the early days of AI research. While the term "Generative AI" is relatively new, the underlying concepts and techniques have been evolving for decades. Here are some key milestones:
Where did Generative AI come from? Chart: Laurie McCabe
Generative models learn how data is distributed throughout the feature space and can generate new data points. A discriminative model focuses on a solution and performs better for classification tasks by dividing the data space into classes and learning the boundaries between them.
Discriminative vs. Generative. Table: Supervised Learning Cheatsheet
Deep Learning Algorithms. Table: George Firican
A full Transformer contains both an encoder and a decoder. Diagram: Google
Transformer Model Architecture. Diagrams: Ashish Vaswani et al.
Big O for LLMs
"Self-attention forces every word in the context to learn the relevance of all the other words in the context. So, it is tempting to proclaim this an O(N2) problem, where:
As if the preceding Big O were not disturbing enough, Transformers contain multiple self-attention layers and multiple self-attention heads per self-attention layer, so Big O is actually:
O(N² · S · D)
where:
- N is the number of words in the context,
- S is the number of self-attention layers, and
- D is the number of self-attention heads per layer.
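To see where the N² term comes from, here is a simplified single-head self-attention sketch (an illustration, not the article's code) that omits the separate query, key, and value projections; the token count and embedding size are assumptions:

```python
import numpy as np

def self_attention(X):
    """Single-head self-attention over N token embeddings of dimension d.
    The N x N score matrix is the source of the O(N^2) cost."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                        # N x N pairwise relevance scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)        # row-wise softmax
    return weights @ X                                   # weighted combination of values

X = np.random.default_rng(0).normal(size=(8, 16))        # N = 8 tokens, d = 16 (assumed)
print(self_attention(X).shape)                           # (8, 16)
```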
A Large Language Model (LLM) is a type of generative model that excels at understanding and generating human language. LLMs are trained on massive amounts of text data, allowing them to process information, respond to questions, and even create different kinds of creative text content. Attention is a technique in neural networks that allows the model to focus on specific parts of its input data; the attention mechanism concentrates on the important bits, inspired by the human ability to selectively focus on relevant information while ignoring irrelevant details. Comparing LLMs' complexity requires granular parameters of time, space, and sample. Comparing the complexity of different LLM models can be based on:
Dataset Size:
Tokens:
Compute Size:
Model Size:
Estimating Word Count (Indirectly):
Understanding computational complexity helps you choose an efficient machine-learning algorithm for your problem, considering non-functional factors such as required response time, memory and storage size, and dataset sample size.
Sorting Algorithms' Computational Complexity
An example from the domain of sorting algorithms illustrates the impact of computational complexity on algorithms' efficiency. "Efficient sorting is consequential for optimizing the efficiency of other algorithms, such as search and merge algorithms, which require input data to be in sorted lists."
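As a quick hedged illustration, timing a simple O(n²) insertion sort against Python's built-in O(n log n) Timsort shows how computational complexity dominates as the input grows; the input sizes are arbitrary:

```python
import random
import time

def insertion_sort(a):
    """Simple O(n^2) insertion sort (for illustration only)."""
    a = list(a)
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

random.seed(0)
for n in (2_000, 4_000, 8_000):                  # arbitrary, doubling input sizes
    data = [random.random() for _ in range(n)]
    start = time.perf_counter()
    insertion_sort(data)
    slow = time.perf_counter() - start
    start = time.perf_counter()
    sorted(data)
    fast = time.perf_counter() - start
    print(f"n={n}: insertion sort {slow:.3f}s vs built-in Timsort {fast:.5f}s")
```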
The following animations illustrate how effectively data sets from different starting points can be sorted using different algorithms.
Sorting Algorithms. Animation: Toptal
Statistical Machine Learning (SML)
Statistical Machine Learning (SML) merges statistics with the computational sciences: computer science, systems science, and optimization. SML provides mathematical tools for analyzing behavior and the generalization performance of ML algorithms. “The major difference between machine learning and statistics is their purpose. Machine learning models are designed to make the most accurate predictions possible. Statistical models are designed for inference about the relationships between variables." [5]
Algorithms of ML. Mind map: Jason Brownlee [22]
Next, read the "Interpretability: “Seeing Machines Learn” - Understanding models' prediction/decision reasoning/transparency: Why? – Trust; How?" article at https://www.dhirubhai.net/pulse/interpretabilityexplainability-understanding-models-rajwan-ms-dsc .
---------------------------------------------------------
Read “Computational complexity of machine learning algorithms” at https://www.thekerneltrip.com/machine/learning/computational-complexity-learning-algorithms. Also, an excellent in-depth read is “The Future of Computation for Machine Learning and Data Science” at https://towardsdatascience.com/the-future-of-computation-for-machine-learning-and-data-science-fad7062bc27d
Read the "Interpretability/Explainability: Understanding models' prediction/decision reasoning/transparency: Why? – Trust; How? – “Seeing Machines Learn” article at?https://www.dhirubhai.net/pulse/interpretabilityexplainability-understanding-models-rajwan-ms-dsc