Expect More Sparse Data
Alex Jouravlev
Data and Enterprise Architecture veteran and practitioner with up-to-date strategic knowledge and hands-on skills in AI. Proponent and enabler of the Data-Driven Enterprise. Everything Graph and Metadata.
One of the arguments for NoSQL databases, along with their ability to handle Big Data, is their ability to handle sparse data. Sparse data is data that, if loaded into a relational DB, would contain too many nulls: for any given record, most of the columns would have no values. In a situation like that, representing the data as a set of XML/JSON documents makes more sense than a relational representation.
Data are naturally sparse. Say I have a Strava profile (which says that I’ve ridden 0 km this month). A boxer would not have Strava data but would have a boxing record. Other people will have something else. Most of the information associated with us is naturally sparse.
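To make the contrast concrete, here is a minimal sketch in Python. The people, field names and values are hypothetical, invented only to mirror the Strava and boxing examples above. It builds the single wide table a relational design would need, counts the resulting nulls, and then shows the same records as JSON documents that carry only the fields they actually have.

```python
import json

# Hypothetical records: names, fields and values are made up for illustration.
people = [
    {"name": "cyclist", "strava_km_this_month": 0},
    {"name": "boxer", "boxing_wins": 12, "boxing_losses": 3},
    {"name": "chess_player", "chess_rating": 1850},
]

# Relational view: one wide table whose columns are the union of all attributes.
columns = sorted({key for person in people for key in person})
rows = [[person.get(col) for col in columns] for person in people]

# A missing attribute becomes NULL (None); a real value of 0 does not count.
null_cells = sum(cell is None for row in rows for cell in row)
total_cells = len(rows) * len(columns)

# Document view: each record stores only the fields it actually has.
documents = [json.dumps(person) for person in people]

print(columns)
print(f"{null_cells} of {total_cells} cells are NULL")
for doc in documents:
    print(doc)
```

Even with three people and five attributes, more than half of the cells in the wide table are empty, and the proportion grows with every new kind of attribute that applies to only a few records.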
The reason we haven’t been talking about sparse data for the last three decades is that we decided to ignore it. We only use the information we really, obviously need, and we create additional tables to offload information that is uncommon.
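The "additional tables" workaround might look something like the following sketch, which uses Python's built-in sqlite3 module. The table and column names are hypothetical. The uncommon boxing attributes live in their own side table, so the main table stays dense, and the nulls only reappear when the tables are joined back together.

```python
import sqlite3

# In-memory database; the schema and rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Side table holding attributes that apply to only a few people.
CREATE TABLE boxing_record (
    person_id INTEGER PRIMARY KEY REFERENCES person(id),
    wins INTEGER NOT NULL,
    losses INTEGER NOT NULL
);
INSERT INTO person VALUES (1, 'cyclist'), (2, 'boxer');
INSERT INTO boxing_record VALUES (2, 12, 3);
""")

# A LEFT JOIN keeps every person; people without a boxing record get NULLs.
rows = conn.execute("""
SELECT p.name, b.wins, b.losses
FROM person AS p
LEFT JOIN boxing_record AS b ON b.person_id = p.id
ORDER BY p.id
""").fetchall()
print(rows)
```

Each new kind of uncommon information needs another side table like this one, which is the maintenance cost the relational approach carries.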
However, the new era of Big Data has clearly taught us the value of knowing more. Expect public and private enterprises to try to source more data from a huge variety of sources: more information than the number of tables we are prepared to create and maintain. Therefore, we will want to work with sparse data in a document-oriented DB.
This article was originally published at https://www.businessabstraction.com/2014/07/expect-more-sparse-data/
Comment from a Principal Consultant, Power BI (10 years ago): Hence there is a whole range of techniques for filling in empty data with "what this data should have been".