Topological Data Analysis: How to Make Sense of Big Data without Hiring Numerous Analysts
Data is crucial. It’s what helps business owners make the right decisions. It’s what they use to prevent fraud, identify clients’ behavioral patterns and make accurate financial forecasts. Companies now don’t make the slightest business move without thoroughly studying their data, and yet, according to experts in Topological Data Analysis (TDA), there are still ways to get more out of it.
What is Topological Data Analysis?
TDA, which originates from mathematical topology, is a discipline that studies shape. It measures the shape of data by applying mathematical functions to it, and represents that shape in the form of topological networks or combinatorial graphs.
It has proven highly effective for analyzing large, high-dimensional, feature-rich data, because it reveals shape-related properties, such as the presence or absence of loops within a data set, and so makes it much easier for analysts to discern key patterns.
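One common way of turning data into such a network, described in the TDA literature, is the Mapper construction: apply a filter ("lens") function to every point, cover the filter’s range with overlapping intervals, cluster the points that fall into each interval, and connect clusters that share points. The sketch below is a toy, pure-Python version built on assumptions of our own (40 points on a circle as the data, the x-coordinate as the filter, single-linkage clustering with a hand-picked threshold eps); it is not Ayasdi’s implementation, and names such as mapper and loop_count are hypothetical. Counting independent cycles in the resulting graph (edges minus nodes plus connected components) serves as a simple stand-in for detecting a loop.

```python
import math
from itertools import combinations


def circle_points(n):
    """n evenly spaced points on the unit circle: a toy data set with one loop."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]


def overlapping_cover(lo, hi, n_intervals, overlap):
    """Split [lo, hi] into equal intervals, each widened by `overlap` of its length on both sides."""
    length = (hi - lo) / n_intervals
    pad = overlap * length
    return [(lo + i * length - pad, lo + (i + 1) * length + pad)
            for i in range(n_intervals)]


def find(parent, i):
    """Union-find root lookup with path halving (works on a dict or a list)."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster(indices, points, eps):
    """Single-linkage clustering: points closer than eps end up in the same cluster."""
    parent = {i: i for i in indices}
    for i, j in combinations(indices, 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i in indices:
        groups.setdefault(find(parent, i), set()).add(i)
    return list(groups.values())


def mapper(points, lens, n_intervals=4, overlap=0.25, eps=0.3):
    """Return (nodes, edges): nodes are point clusters, edges join clusters sharing a point."""
    values = [lens(p) for p in points]
    nodes = []
    for a, b in overlapping_cover(min(values), max(values), n_intervals, overlap):
        in_slice = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(cluster(in_slice, points, eps))
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges


def loop_count(n_nodes, edges):
    """Independent cycles in a graph: edges - nodes + connected components."""
    parent = list(range(n_nodes))
    for u, v in edges:
        parent[find(parent, u)] = find(parent, v)
    components = len({find(parent, i) for i in range(n_nodes)})
    return len(edges) - n_nodes + components


nodes, edges = mapper(circle_points(40), lens=lambda p: p[0])
print(len(nodes), len(edges), loop_count(len(nodes), edges))
```

On this data the 40 points collapse into a ring of six clusters joined by six edges, so the loop count is 1: a much smaller description that still shows the loop.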
Currently, TDA is of acute interest to scientists and entrepreneurs around the world, and it seems set to change how we, as humans, perceive and understand data. The shape representations it provides either summarize huge data sets or single out parts of them, letting analysts interpret data subsets without noise getting in the way.
While displaying data succinctly, TDA still retains its important features and the relationships between points in a data set. We might say, therefore, that it’s mostly concerned with clear compression: it discards fine detail but preserves structure.
In layman’s terms, this is how it works:
Suppose it’s summer and you’re standing in Times Square in New York City on a particularly busy Saturday evening. There are thousands of people around you, all talking endlessly in different languages, shouting over and interrupting one another. For you, a normal human being with just two ears to take in sound, it’s impossible to follow everything that’s being said. All you hear is intense, indistinguishable noise.
Now, imagine having a tool (or rather a set of tools) that can record all these sounds, process them promptly and give you quick, informative summaries that pick out the key points of each conversation and show where the conversations overlap.
It also lets you tune into any particular conversation, if you are so inclined, and filter out all the distracting racket around you.
Sounds good, huh? Well, that is, in a nutshell, what Topological Data Analysis has been developed for.
Why is Topological Data Analysis so important now?
Two reasons: Big Data and the overall increase in the complexity of data sets.
Nowadays, businesses collect more customer information than ever before, and as the number of records grows, the number of possible relationships between them, and thus of insights to extract, grows much faster than the records themselves.
And so, just as quickly, does a firm’s need for data experts.
For bigger companies, which deal with thousands of clients daily, keeping up with incoming data often becomes overwhelming. At times, they find it neither cost-efficient nor even feasible to hire and train the number of analysts they need.
That’s why prominent mathematicians from Stanford came up with this new approach to data analysis. With TDA, and particularly with TDA-powered software, they aim to examine data sets more efficiently, spending much less time and fewer resources in the process.
How does Topological Data Analysis work?
Getting into exactly how mathematical topology is applied to the world of data would not make for an easy read. Trust me.
The information itself is freely available on the web: Ayasdi, the only company currently capitalizing on TDA, omits no details concerning the methodologies it uses.
However, unless you’re a trained mathematician with a firm grasp of Betti numbers, persistent homology, Delaunay triangulation and other advanced mathematical concepts, their documentation will probably make zero sense to you.
So, we’re not going to puzzle you with the intricacies of Ayasdi’s technologies. Instead, we’ll walk you through the three main properties of TDA: the features that make the method so effective for Big Data analysis.
#1 Coordinate Invariance
TDA ignores properties of a shape that depend on the coordinate system it is viewed in or on its position. TDA is qualitative: it looks for features that are insensitive to rotating the shape or switching coordinate systems. Rather than raw coordinates, it works with metric spaces, that is, with the distances between points in a data set, which stay the same however the data is rotated or shifted.
What it all means is this: if you have a certain amount of gold (the amount being our property), you don’t want to get confused about how much it is worth when you price it in different currencies (the currencies being systems of coordinates). TDA keeps your focus on the actual value: it ignores whether that value happens to be expressed in dollars, pounds or pesos.
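A small Python sketch makes the point concrete, under assumptions of our own: a random 2-D point cloud stands in for a data set, and a rotation followed by a shift stands in for switching coordinate systems (the helper names are hypothetical). Every coordinate changes, yet every pairwise distance, which is what a metric-space view of the data works with, stays the same up to floating-point rounding.

```python
import math
import random
from itertools import combinations


def rotate(points, angle):
    """Rotate 2-D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def shift(points, dx, dy):
    """Translate 2-D points, i.e. move the origin of the coordinate system."""
    return [(x + dx, y + dy) for x, y in points]


def pairwise_distances(points):
    """All point-to-point Euclidean distances, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


rng = random.Random(42)
cloud = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(50)]
moved = shift(rotate(cloud, 1.2), 3.0, -7.5)  # same data, new coordinates

# Compare the distance lists element by element, allowing for rounding error.
same = all(math.isclose(a, b, abs_tol=1e-9)
           for a, b in zip(pairwise_distances(cloud), pairwise_distances(moved)))
print(cloud[0], moved[0], same)
```

The first point’s coordinates differ before and after the move, while the distance check comes out True: like the gold’s value across currencies, the distances do not depend on the frame of reference.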
#2 Deformation Invariance
TDA also disregards small geometric deformations. The properties of shapes it studies are deformation invariant, which allows analysts to keep noise out while examining them.
To see why this helps, imagine the letter A drawn in different fonts, colors and sizes, with various graphical effects added. Whatever slight modifications it has undergone, your brain still recognizes the letter, as long as the deformations are not substantial (as long as A looks like A and not some other letter).
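Deformation invariance can be illustrated with a deliberately simplified sketch, assuming a toy setup of our own: points sampled from a circle, a neighborhood graph that joins points closer than a hypothetical threshold EPS, and the graph’s cycle count (edges minus nodes plus connected components) as a rough stand-in for the loop detection that persistent homology does properly. Jiggling every point a little leaves the single loop intact; cutting a piece out of the ring, a substantial deformation, destroys it.

```python
import math
import random
from itertools import combinations


def noisy_circle(n, noise=0.0, seed=0):
    """n points on the unit circle, each coordinate jittered by up to +/- noise."""
    rng = random.Random(seed)
    points = []
    for k in range(n):
        t = 2 * math.pi * k / n
        points.append((math.cos(t) + rng.uniform(-noise, noise),
                       math.sin(t) + rng.uniform(-noise, noise)))
    return points


def loop_count(points, eps):
    """Cycle rank of the eps-neighborhood graph: edges - nodes + components."""
    n = len(points)
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= eps]
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)
    components = len({find(i) for i in range(n)})
    return len(edges) - n + components


EPS = 0.75  # hypothetical: links neighbors (~0.52 apart), not next-but-one points (1.0 apart)
clean = noisy_circle(12)
wobbly = noisy_circle(12, noise=0.05)  # small deformation
broken = clean[3:]                     # substantial deformation: a gap in the ring

print(loop_count(clean, EPS), loop_count(wobbly, EPS), loop_count(broken, EPS))
```

With jitter of at most 0.05 per coordinate, no pairwise distance moves by more than about 0.14, so neighbors stay linked and next-but-one points stay apart; the clean and wobbly rings both report one loop, while the broken one reports none. Much larger jitter could change the answer, which mirrors the caveat that the deformation must not be substantial.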
#3 Compressed Representation
The last distinctive property of Topological Data Analysis is compressed representation. Rather than displaying a shape in its entirety, which adds a lot of complexity to the analysis, TDA offers simple, finite models. It sacrifices small details and, in return, displays data sets in a simple, comprehensible way that retains their key features.
Take a circle, for example. It contains an infinite number of points and pairwise distances, which are an utter hell for us to process. TDA can turn it into a hexagon instead. Such a representation reduces the shape to a set of nodes and edges, which are easy to extract meaning from, while still keeping the circle’s “loopiness”.
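To put rough numbers on the circle-to-hexagon example, here is a hedged sketch: the 1,000-point sample standing in for the circle, the hand-picked hexagon and the helper names are all illustrative assumptions, not how any TDA product builds its models (in practice the compressed graph would be derived from the data rather than chosen by hand). It compares the number of pairwise distances in the sample with the size of the hexagon model, checks that the hexagon still has exactly one loop, and measures the detail given up.

```python
import math


def polygon(n_vertices, radius=1.0):
    """Vertices of a regular polygon inscribed in a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n_vertices),
             radius * math.sin(2 * math.pi * k / n_vertices))
            for k in range(n_vertices)]


def loop_count(n_nodes, edges):
    """Independent cycles in a graph: edges - nodes + connected components."""
    parent = list(range(n_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for u, v in edges:
        parent[find(u)] = find(v)
    components = len({find(i) for i in range(n_nodes)})
    return len(edges) - n_nodes + components


# A finely sampled circle stands in for the "infinite" original shape.
n = 1000
circle = polygon(n)
n_distances = n * (n - 1) // 2

# The compressed model: six nodes, each joined to the next, closing the ring.
hexagon = polygon(6)
hex_edges = [(k, (k + 1) % 6) for k in range(6)]

# The detail given up: how far any circle point lies from its nearest hexagon node.
worst_error = max(min(math.dist(p, v) for v in hexagon) for p in circle)

print(n_distances, len(hexagon), len(hex_edges), loop_count(6, hex_edges))
print(round(worst_error, 3))
```

The sample carries 499,500 pairwise distances, while the hexagon needs only 6 nodes and 6 edges and still has exactly one loop. The price is resolution: no circle point lies more than about 0.52 from its nearest hexagon node.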
Conclusion
Though not heavily used at present, TDA is on its way to becoming a booming technology. According to experts in the field, its wider application might bring groundbreaking changes: it could further automate logistics and marketing, or even make software capable of conducting complex research and management, tasks that have traditionally been thought of as requiring a human touch.
We’ve helped Ayasdi, the only company capitalizing on TDA, build TDA-powered software tailored specifically to the needs of financial data scientists. Want to learn more about it?