Part 2 - Keep it Simple: Machine Learning & Algorithms for Big Boys
Dr. Dinesh Chandrasekar (DC)
Chief Strategy Officer & Country Head, India, Centific AI | Nasscom Deep Tech, Telangana AI Mission & HYSEA - Mentor & Advisor | Alumni of Hitachi, GE & Citigroup | DeepTech Evangelist | Author & Investor | Be Passionate
Part 1 of this article: Click
The picture below summarizes the machine learning algorithms in one picture.
Parameters
Parameters are the knobs a data scientist gets to turn when setting up an algorithm. They are numbers that affect the algorithm's behavior, such as error tolerance or number of iterations, or options between variants of how the algorithm behaves. The training time and accuracy of the algorithm can sometimes be quite sensitive to getting just the right settings. Typically, algorithms with large numbers of parameters require the most trial and error to find a good combination.
Sweeping every combination of parameter values is a great way to make sure you've spanned the parameter space, but the time required to train a model increases exponentially with the number of parameters. The upside is that having many parameters typically indicates that an algorithm has greater flexibility, and it can often achieve very good accuracy, provided you can find the right combination of parameter settings. A quick sketch of this trade-off follows below.
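To make the sweep idea concrete, here is a minimal sketch of my own using scikit-learn's GridSearchCV (the article itself works with Azure Machine Learning, so treat this purely as an illustration): every extra parameter you sweep multiplies the number of models that have to be trained.

```python
# Minimal parameter-sweep sketch with scikit-learn (illustrative only).
# Each additional swept parameter multiplies the number of models to train.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {
    "n_estimators": [50, 100, 200],   # 3 values
    "learning_rate": [0.01, 0.1],     # x 2 values
    "max_depth": [2, 3, 4],           # x 3 values -> 18 combinations in total
}

search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, cv=3)
search.fit(X, y)                      # trains 18 combinations x 3 folds = 54 models
print(search.best_params_, search.best_score_)
```

Adding a fourth swept parameter with, say, four values would quadruple the work again, which is exactly why sweeps get expensive so quickly.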
Number of features
For certain types of data, the number of features can be very large compared to the number of data points. This is often the case with genetic or textual data. The large number of features can bog down some learning algorithms, making training time infeasibly long.
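To put a rough number on it (a toy sketch of my own, using scikit-learn rather than Azure Machine Learning), turning even a few documents into bag-of-words features already produces more features than data points:

```python
# Illustrative sketch: a handful of documents can yield more features than samples.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "machine learning algorithms for classification and regression",
    "support vector machines handle high dimensional text data well",
    "decision forests average many trees to avoid overfitting",
]

X = TfidfVectorizer().fit_transform(docs)
print(X.shape)  # (3 documents, ~20 distinct terms); real corpora easily reach 10^5+ features
```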
Algorithms
A brief look at some of these algorithms helps us understand a bit more about their feasibility in real-life use cases.
Linear regression
As mentioned previously, linear regression fits a line (or plane, or hyperplane) to the data set. It's a workhorse, simple and fast, but it may be overly simplistic for some problems.
Data with a linear trend
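Fitting that line takes only a couple of calls in scikit-learn (my own illustrative sketch, not from the article):

```python
# Fit a straight line y ~= a*x + b to noisy data with a linear trend (illustrative sketch).
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))               # one feature
y = 2.5 * X.ravel() + 1.0 + rng.normal(0, 1, 100)   # linear trend plus noise

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)             # should come out close to 2.5 and 1.0
```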
Logistic regression
Although it confusingly includes 'regression' in the name, logistic regression is actually a powerful tool for two-class and multiclass classification. It's fast and simple. The fact that it uses an 'S'-shaped curve instead of a straight line makes it a natural fit for dividing data into groups. Logistic regression gives linear class boundaries, so when you use it, make sure a linear approximation is something you can live with.
A logistic regression fit to two-class data with just one feature; the class boundary is the point at which the logistic curve is equally close to both classes
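Here is a hedged one-feature sketch of the same idea in scikit-learn (an illustration of my own, not the article's tooling); the boundary sits where the predicted probability crosses 0.5:

```python
# Two-class logistic regression with a single feature (illustrative sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)]).reshape(-1, 1)
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)

# The linear class boundary is where the S-curve gives probability 0.5.
boundary = -clf.intercept_[0] / clf.coef_[0, 0]
print(boundary)                       # near 0, halfway between the two clusters
print(clf.predict([[-3.0], [3.0]]))   # -> [0 1]
```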
Trees, forests, and jungles
Decision forests (regression, two-class, and multiclass), decision jungles (two-class and multiclass), and boosted decision trees (regression and two-class) are all based on decision trees, a foundational machine learning concept. There are many variants of decision trees, but they all do the same thing—subdivide the feature space into regions with mostly the same label. These can be regions of consistent category or of constant value, depending on whether you are doing classification or regression.
A decision tree subdivides a feature space into regions of roughly uniform values
Because a feature space can be subdivided into arbitrarily small regions, it's easy to imagine dividing it finely enough to have one data point per region. This is an extreme example of overfitting. In order to avoid this, a large set of trees are constructed with special mathematical care taken that the trees are not correlated. The average of this "decision forest" is a tree that avoids overfitting. Decision forests can use a lot of memory. Decision jungles are a variant that consumes less memory at the expense of a slightly longer training time.
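A quick sketch of that contrast using scikit-learn's random forest (my own illustration; Azure's decision forest module differs in detail): a single unconstrained tree can memorize the training set, while the averaged forest generalizes better.

```python
# A single unconstrained tree can carve out one region per point; a forest of
# de-correlated trees averages that behaviour away (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, flip_y=0.1, random_state=0)

tree = DecisionTreeClassifier(random_state=0)             # free to overfit
forest = RandomForestClassifier(n_estimators=200, random_state=0)

print(cross_val_score(tree, X, y, cv=5).mean())           # typically the lower score
print(cross_val_score(forest, X, y, cv=5).mean())         # typically the higher score
```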
Boosted decision trees avoid overfitting by limiting how many times they can subdivide and how few data points are allowed in each region. The algorithm constructs a sequence of trees, each of which learns to compensate for the error left by the tree before. The result is a very accurate learner that tends to use a lot of memory.
Fast forest quantile regression is a variation of decision trees for the special case where you want to know not only the typical (median) value of the data within a region, but also its distribution in the form of quantiles.
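Both ideas can be sketched with scikit-learn's gradient boosting (again my own stand-in for Azure's boosted decision tree and fast forest quantile regression modules): each tree corrects the errors of the one before it, and with a quantile loss the same machinery estimates the spread of the data rather than just its centre.

```python
# Boosted trees fit a sequence of small correctors; with a quantile loss they
# estimate the 10th, 50th and 90th percentiles of y given x (illustrative sketch).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X.ravel()) + rng.normal(0, 0.3, 500)

models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q,
                                 n_estimators=200, max_depth=2).fit(X, y)
    for q in (0.1, 0.5, 0.9)
}

x_new = [[5.0]]
for q, m in models.items():
    print(q, m.predict(x_new)[0])   # lower, median, and upper estimates at x = 5
```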
Neural networks and perceptrons
Neural networks are brain-inspired learning algorithms covering multiclass, two-class, and regression problems. They come in an infinite variety, but the neural networks within Machine Learning are all of the form of directed acyclic graphs. That means that input features are passed forward (never backward) through a sequence of layers before being turned into outputs. In each layer, inputs are weighted in various combinations, summed, and passed on to the next layer. This combination of simple calculations results in the ability to learn sophisticated class boundaries and data trends, seemingly by magic. Many-layered networks of this sort perform the "deep learning" that fuels so much tech reporting and science fiction.
This high performance doesn't come for free, though. Neural networks can take a long time to train, particularly for large data sets with lots of features. They also have more parameters than most algorithms, which means that parameter sweeping expands the training time a great deal. And for those overachievers who wish to specify their own network structure, the possibilities are inexhaustible.
The boundaries learned by neural networks can be complex and irregular
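For a feel of those irregular boundaries, here is a small feed-forward network on a deliberately non-linear problem (a hedged sketch with scikit-learn; real deep learning uses far larger, specialised networks and frameworks):

```python
# A small feed-forward network learning a curved class boundary (illustrative sketch).
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)

net = MLPClassifier(hidden_layer_sizes=(20, 20),   # two hidden layers
                    max_iter=2000, random_state=0)
net.fit(X, y)
print(net.score(X, y))   # fits the curved "two moons" boundary a linear model cannot
```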
The two-class averaged perceptron is neural networks' answer to skyrocketing training times. It uses a network structure that gives linear class boundaries. It is almost primitive by today's standards, but it has a long history of working robustly and is small enough to learn quickly.
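scikit-learn does not ship an averaged perceptron, but its plain Perceptron is the nearest off-the-shelf analogue and shows the same trade: a linear boundary in exchange for very fast training (my own hedged sketch, not the Azure module).

```python
# A plain perceptron: linear class boundary, very fast to train (illustrative sketch;
# this is not the averaged variant, just the closest scikit-learn analogue).
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
clf = Perceptron(max_iter=100, random_state=0).fit(X, y)
print(clf.score(X, y))
```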
SVMs
Support vector machines (SVMs) find the boundary that separates classes by as wide a margin as possible. When the two classes can't be cleanly separated, the algorithms find the best boundary they can. As written in Machine Learning, the two-class SVM does this with a straight line only. (In SVM-speak, it uses a linear kernel.) Because it makes this linear approximation, it is able to run fairly quickly. Where it really shines is with feature-intensive data, like text or genomic data. In these cases SVMs are able to separate classes more quickly and with less overfitting than most other algorithms, in addition to requiring only a modest amount of memory.
A typical support vector machine class boundary maximizes the margin separating two classes
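A linear SVM on sparse text features looks like this in scikit-learn (an illustrative toy example of my own, not the Azure module): fast and memory-light even when the feature count is huge.

```python
# A linear SVM on sparse text features (illustrative sketch with toy data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = ["the match ended in a late goal",
        "the striker scored twice in the final",
        "the court dismissed the appeal",
        "the judge ruled on the new evidence"]
labels = ["sports", "sports", "legal", "legal"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(docs, labels)
print(model.predict(["the striker scored a goal"]))   # -> ['sports']
```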
Bayesian methods
Bayesian methods have a highly desirable quality: they avoid overfitting. They do this by making some assumptions beforehand about the likely distribution of the answer. Another byproduct of this approach is that they have very few parameters. Machine Learning has Bayesian algorithms for both classification (Two-class Bayes point machine) and regression (Bayesian linear regression). Note that these assume that the data can be split or fit with a straight line.
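scikit-learn's BayesianRidge is one concrete example of Bayesian linear regression (a stand-in of my own for the modules named above); notice how little there is to hand-tune, and that predictions come with an uncertainty estimate.

```python
# Bayesian linear regression: the priors act as built-in regularisation and
# there is almost nothing to hand-tune (illustrative sketch).
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(3)
X = rng.uniform(0, 5, size=(60, 1))
y = 1.8 * X.ravel() + 0.5 + rng.normal(0, 0.5, 60)

model = BayesianRidge().fit(X, y)
mean, std = model.predict([[2.0]], return_std=True)   # prediction plus its uncertainty
print(mean[0], std[0])
```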
I would definitely recommend checking out some more articles on machine learning and related data science concepts. I am currently exploring Microsoft Azure Machine Learning and will share more on it in the next few weeks. What's in store for us? A quick preview picture is below.
Regards
Dinesh Chandrasekar (DC)