NoSQL Will Rule Analytics

At Gateway, our vision is to become the world's leading enterprise for intelligent insight mining and real-time collaboration solutions – and the simplest: simplest to implement, simplest to manage, simplest to use, and simplest to do business with.

We leverage disruptive non-relational (NoSQL) database, analytics, machine learning / artificial intelligence (ML/AI) and visualization technologies in entirely new ways to execute on our vision.

At this point you may be wondering what this has to do with you – why would you care what an SME-level analytics company is doing? Well, if you've been working in the analytics domain for any length of time, you know that enterprise analytics is all about managing huge amounts of disparate data from many different sources, i.e. "Big Data" – and that it has been incredibly difficult and expensive. But it doesn't have to be.

The bleeding edge work that Gateway is doing with NoSQL can make enterprise analytics exponentially easier and more efficient – which will, in turn, make the enterprise exponentially more profitable. So, if profitability is important to you, here are the top reasons you should care about what Gateway is doing with NoSQL Analytics:


Flexible Data:

Flexible data is what large enterprises really need – use first, clean later. As we all know, entire industries have been built on trying to get relational databases to work together. More money has probably been spent on database integration over the last 20 years than on anything else in the IT space.

The problem was always the "Transform" part of ETL. Because relational databases are totally dependent on rigid schemas, they only really work as intended in isolation. Traditionally, integrating two of them in any meaningful way (transactional APIs aside) usually meant creating a third database, with a separate schema designed as a compromise between the two original sources.

Data integrity then needed to be established: the data was "normalized", cleaned, and audited for every system. That became intractable as the number of systems grew. Establishing master data repositories was tried; that didn't always work either. Most companies just lived with the issues – spending many millions – but ended up extremely disappointed with the results. But what else was there to do?

The truth of the matter is that, today, data transformation makes up about 80% of the effort in a large analytics project. The actual analytics modeling makes up roughly another 10%, and visualization makes up the rest. Imagine what would happen if you could eliminate all that complexity and cost.

One of the biggest advantages NoSQL has is that it supports flexible data and schema models. JavaScript Object Notation (JSON) – the primary data format of NoSQL – allows you to easily represent diverse and complex data sets through hierarchies and arrays, all within a single document.

Collections of JSON documents do not have to be uniform. Each JSON document can be different and this poses no problem for the data store. Events can be stored even when they come from different sources or different versions of systems and they do not need to be “fixed” or “normalized”. There was nothing wrong with them to begin with. They were just different.
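A minimal Python sketch of this idea: two event documents with entirely different shapes sitting in the same collection, queryable without any up-front schema. The field names and values here are illustrative, not from any real Gateway dataset.

```python
import json

# Two events from different systems. Note the differing fields and
# nesting -- in a document store both can live in the same collection,
# with no migration or "fixing" required.
events = [
    {"source": "web", "user": {"id": 42, "plan": "pro"}, "clicks": [1, 2, 3]},
    {"source": "iot", "device_id": "sensor-7", "reading": {"temp_c": 21.5}},
]

# Each document is self-describing, so generic code can still work
# across the whole collection.
for doc in events:
    print(json.dumps(doc))

sources = sorted({doc["source"] for doc in events})
print(sources)
```

Nothing here was normalized or discarded; the documents were simply stored as they arrived.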

The whole data management cycle is reversed: you can use data as soon as it is created. You do not need to first normalize it, throw away large pieces of it (because it has a different structure or fields), clean it, and so on. Reversing the pattern – allowing you to use the data, and all of the data, without first doing the "hard work" of cleansing and normalizing – is key to deriving value from your data assets quickly and efficiently.

Gateway was built from the ground up to leverage flexible data and to produce flexible enterprise analytics – including predictive analytics that drive flexible Next-Best Actions – all while leveraging the investments you've already made.


JavaScript Object Notation:

JSON is everywhere and a key component of NoSQL databases. It has become the de facto standard for representing data in every domain – Internet of Things data, web data, mobile data, everything. Machine logs and security events are no exception. Being able to store data natively as JSON is a huge advantage over transforming it back and forth.

All the analytics processing occurs within a native JSON environment, making the platform extremely easy to integrate with an unlimited number of data sources once they have been converted to JSON.


NoSQL – Query Language:

The NoSQL "query language" is better suited to business analytics than anything else. It is a data-flow and aggregation-pipeline language, which makes complex enterprise analytics over large data sets very easy and is far more powerful at combining querying with ETL and with analytics. For example, a complex query against a standard relational database might take hours, versus minutes for a NoSQL, columnar-based database.
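The pipeline style described above can be sketched in plain Python: each stage takes a stream of JSON-like documents and emits a transformed stream, in the spirit of (for example) MongoDB's aggregation framework. The stage functions, field names, and data below are illustrative assumptions, not any vendor's actual API.

```python
from collections import defaultdict

# Illustrative order documents.
orders = [
    {"region": "EU", "status": "paid", "amount": 120},
    {"region": "US", "status": "paid", "amount": 80},
    {"region": "EU", "status": "refunded", "amount": 50},
    {"region": "EU", "status": "paid", "amount": 30},
]

def match(docs, pred):
    # Filter stage, analogous to a {"$match": ...} step.
    return (d for d in docs if pred(d))

def group_sum(docs, key, field):
    # Aggregation stage, analogous to {"$group": {"_id": ..., "$sum": ...}}.
    totals = defaultdict(int)
    for d in docs:
        totals[d[key]] += d[field]
    return dict(totals)

# The pipeline: filter, then aggregate -- no separate ETL job,
# no intermediate staging tables.
totals = group_sum(match(orders, lambda d: d["status"] == "paid"),
                   "region", "amount")
print(totals)  # {'EU': 150, 'US': 80}
```

Each stage composes with the next, which is what makes this style natural for combining querying with transformation and analytics in one pass.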

Gateway designs around the capabilities of columnar JSON within its NoSQL data warehouse. Columnar JSON is by far the fastest way to run complex analytic queries at scale. As a result, performance and efficiency are dramatically higher than both Massively Parallel Processing (MPP) and Hadoop-based solutions. No other architecture available today can provide this level of price/performance efficiency and ROI.
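A toy sketch of why columnar layout helps analytics: row-oriented documents are "shredded" into per-field column vectors, so an aggregate scans only the one column it needs rather than every full record. The data and field names are made up for illustration; a real columnar engine would also compress and encode these vectors.

```python
# Illustrative row-oriented event records.
rows = [
    {"ts": 1, "status": 200, "latency_ms": 12},
    {"ts": 2, "status": 500, "latency_ms": 340},
    {"ts": 3, "status": 200, "latency_ms": 9},
]

# Shred rows into column vectors -- roughly what a columnar store
# does on ingest (assuming, for simplicity, uniform keys here).
columns = {key: [row[key] for row in rows] for key in rows[0]}

# An aggregate now touches one contiguous list instead of every
# whole record.
avg_latency = sum(columns["latency_ms"]) / len(columns["latency_ms"])
print(round(avg_latency, 2))  # about 120.33
```

The win grows with document width: a 100-field document queried on one field reads roughly 1% of the data in columnar form.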


Filtering Big Data Lakes:

NoSQL warehousing can “drain” the big data lakes. While the focus of Big Data has often started with creating “big data lakes”, many organizations have learned that landing data can be relatively easy – but sometimes that data is worthless.

Usually, some version of a master schema was developed; then everything that didn't fit the master got discarded. Needless to say, the now-partial extracts were not refreshed in real time, which meant you ended up with subsets of stale data that very often did not match what was seen in reports from the original source databases.

Hence, no one on the business side really trusted the data from the lakes for mission-critical decisions essentially rendering the entire effort worthless.

At Gateway, we believe that in the great majority of cases, "big data lakes" are a compromise at best. They evolved as a way to get as much data as possible out of the many disparate systems within the enterprise. Since the data couldn't be synchronized – due to the integrity, normalization, and cleansing issues inherent to having too many systems – it was all dumped into one place in the hope of making sense of it later.

A much better solution is to federate the data: let each disparate database operate independently in real time, then compile real-time snapshots that can be examined over time for trend analysis, pattern identification, and advanced analytics in a much timelier manner.
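The federated-snapshot idea can be sketched as follows: each source system is queried in place (no bulk copy into a lake), and the results are timestamped and appended as point-in-time snapshot documents for later trend analysis. The connector functions and figures below are stand-ins, not real integrations.

```python
import time

def crm_active_customers():
    # Stand-in for a live query against a CRM system.
    return 1040

def billing_open_invoices():
    # Stand-in for a live query against a billing system.
    return 87

snapshots = []

def take_snapshot():
    # Query each source in place, stamp the moment, and keep the
    # result as an immutable point-in-time document.
    snapshots.append({
        "taken_at": time.time(),
        "active_customers": crm_active_customers(),
        "open_invoices": billing_open_invoices(),
    })

take_snapshot()
print(snapshots[-1]["active_customers"])  # 1040
```

Run on a schedule, the growing list of snapshots becomes the time series that trend analysis and pattern identification operate on, without ever staging stale copies of the source data.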


In Summary:

NoSQL data warehouse techniques are set to be the third wave of analytics innovation. Their flexibility, simplicity, efficiency, and ability to scale raise the performance bar to new heights while drastically lowering both the barrier to entry and the total cost of ownership.

With 5G and IoT gaining more traction seemingly every day, the low latency and near-infinite scalability of new cloud-native NoSQL solution providers such as Gateway will undoubtedly make their mark on the Big Data analytics landscape.
