There is no truth in data, which truth would you like?
The Thinker - Auguste Rodin

Everything In Its Right Place – Radiohead

Lies, damned lies, and statistics – unconfirmed (often attributed to Benjamin Disraeli or Mark Twain)

I wish I had an answer to that because I'm tired of answering that question – Yogi Berra

Imagine a theoretical data universe of everything: all atomic elements, and all subjects (objects) with their complete description (all dimensions). These subjects and descriptive data particles sit within a conceptual time-space fabric that positions and connects each one, allowing all of them to be related: everything in its right place. This data assembly is unopinionated and without exclusion. It presents the ultimate starting point for all downstream manipulation, analysis and interpretation.

The theoretical data universe may never be captured as a whole: there is a lot of data out there. However, the fabric to hold it already exists. At Harness, we call this platform Data Fabric.

So what about the lies, damned lies and statistics? They come from the usual process of making data work for a single use case. Let’s take property valuation as an example.

We can take a ‘universe’ of property data with a range of known prices and relate them to descriptive dimensions such as point in time, location, size or age. Each statistically significant dimension contributes to the price level to some degree and so numerical predictors can be modelled to estimate value relatives. Where a single price estimate is required, algorithmic methods produce ‘shortlisted’ comparables having a range of prices from which a central measure might be calculated. This value is the end point, so to speak, the final output from the initial ‘universe’.

But this is only one specific set of steps within a general framework. Which central measure do you take? How many records make the shortlist? What cleansing operations are applied to remove prices that are potentially outliers or erroneous? What new or evolved algorithm could or should be applied? What limits are adopted to confirm significance? How is missing data handled? There will be an optimal route for the given conditions. Nonetheless, all these sub-steps can be tweaked to produce a different outcome, and conditions always change. So, which value (truth) would you like? And how do you want to further manipulate these numbers over time to tell a story of the market going up, down or sideways? What, or whose, definition of market?
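As a minimal sketch of how those choices move the answer, the Python below estimates a price from an invented set of comparables. The data, the function name `estimate`, and the parameters `shortlist_size`, `central` and `max_price` are all assumptions made for illustration; this is not the Harness valuation method.

```python
from statistics import mean, median

# Hypothetical comparables: (price, size in square metres). Invented for illustration.
COMPARABLES = [
    (250_000, 80),
    (265_000, 85),
    (240_000, 78),
    (900_000, 82),  # an outlier, an error, or a genuinely exceptional sale?
    (255_000, 81),
    (270_000, 90),
]


def estimate(subject_size, comparables, shortlist_size, central, max_price=None):
    """Estimate a price from the comparables closest in size to the subject."""
    # Optional cleansing step: drop any price above a chosen cap.
    pool = [c for c in comparables if max_price is None or c[0] <= max_price]
    # Shortlist: the records nearest in size to the subject property.
    shortlist = sorted(pool, key=lambda c: abs(c[1] - subject_size))[:shortlist_size]
    # Central measure: whichever function the analyst chose (mean, median, ...).
    return central([price for price, _ in shortlist])


if __name__ == "__main__":
    print(estimate(82, COMPARABLES, 3, median))
    print(estimate(82, COMPARABLES, 3, mean))
    print(estimate(82, COMPARABLES, 3, mean, max_price=500_000))
```

With these invented figures and a subject of 82 square metres, the median of the three nearest comparables is 255,000, while their mean is roughly 468,333 because the 900,000 sale makes the shortlist. Capping prices at 500,000 brings that mean down to about 256,667. Same data, three defensible values.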

The whole process from start to end can be viewed as an attempt to create or find a truth and there simply isn’t one. Every single opinion superimposed on the data is a form of limit.

The Data Fabric paradigm recognises that there is no single truth. Nor is there a universal prescription that can identify and resolve outliers, errors, inconsistencies, conflicts and wrong estimations for and across all elements while, crucially, still serving any downstream application. This paradigm promotes full data consumption by connecting everything (pretty much) and changing nothing (pretty much). The connections are achieved through a grid of positions, against which the data is placed. As all data is included, each data element has equal value and so there is no uncertainty and no opinion. So-called universal identifiers are simply atomic data elements now connected into and across all other keys. This gives the most upstream view of the data possible, not limited by any perspective, i.e. the ultimate starting point.

Not only can you do all the usual manipulation of multiple datasets to create a single answer, such as a property value estimate, but you can also go the other way: search for an address, a location ID, a company number or a title number, and the system will return an explosion of data related to that unit input. The ‘answer’ is complete at each position, though it then needs to be shaped into an answer (story) relative to the question being asked.
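To make the idea of searching by any identifier concrete, here is a toy sketch in Python. It is not how Data Fabric is implemented: the class `ToyFabric`, the field names in `ID_FIELDS`, and the position labels are all hypothetical. It only illustrates records being stored unchanged at a position, with every identifier pointing back to that position.

```python
from collections import defaultdict

# Hypothetical identifier fields; any atomic element could play this role.
ID_FIELDS = ("address", "title_number", "company_number")


class ToyFabric:
    """Toy index: records are kept as received, keyed by a grid position."""

    def __init__(self):
        self.records_at = defaultdict(list)   # position -> records, stored unchanged
        self.positions_of = defaultdict(set)  # (field, value) -> positions

    def add(self, position, record):
        """Place a record at a position and link its identifiers to that position."""
        self.records_at[position].append(record)
        for field in ID_FIELDS:
            if field in record:
                self.positions_of[(field, record[field])].add(position)

    def lookup(self, field, value):
        """Return every record at every position linked to this identifier."""
        positions = sorted(self.positions_of.get((field, value), ()))
        return [record for p in positions for record in self.records_at[p]]


fabric = ToyFabric()
fabric.add("cell-001", {"address": "1 Example Street", "price": 250_000})
fabric.add("cell-001", {"title_number": "T-0001", "company_number": "C-0001"})

# Searching by title number also surfaces the price record at the same position.
related = fabric.lookup("title_number", "T-0001")
```

Nothing in the stored records is cleansed or merged here; shaping `related` into an answer is left to whoever asks the question.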

If your data assets’ starting point is already set up to answer only one question, then there is only one answer for you. That might be okay for your business right now, but it is not okay when we want to assemble the data so it can be fully utilised in any given situation. We believe there is an opportunity to find the optimal outcome for any business or data user by having the greatest leverage. This untouched, all-encompassing view cannot be known as such. Rather, we can seek out multi-sourced insights from deep data connections, which could be very powerful for those who wish to view previously limited data in their industry as part of a whole. Researchers will also benefit from new depth across an otherwise narrow view.

To learn more about HARNESS Data Intelligence, please visit www.harnessdata.ai

Absolutely. The data space should always model for how the data is - its properties and relationships - its position in whatever n-dimensional space it truly exists within. Too often data is only considered from the perspective of how it’s used today. There’s a place for that - optimization and simplifying - but if that’s your base layer, it’s incredibly limiting.

Josue Calvo

Director of Applied Intelligence at Harness Data Intelligence Ltd

2 years ago

Beautifully expressed Lee Mollins! I can’t believe we have built such a powerful system, but it is true! I’d love to see how other people use it. Exciting times ahead!
