HarperDB: An underdog SQL / NoSQL database

George Anadiotis

Analyst, Consultant, Engineer, Founder, Researcher, Writer

Published: February 12, 2018

HarperDB flies in the face of conventional wisdom in a number of ways. But does the world need another database?

Start coding a new database in your garage with a buddy of yours. Use JavaScript. Name it after your dog. Patent your own data model. Do not go open source. Take over the world.

That is HarperDB's recipe for success. It seems so unlikely, it makes you wonder whether it's genius or just crazy.

THE EXPLODED DATA MODEL

Stephen Goldberg and Kyle Bernhardy do not seem like crazy people. They have long-standing experience in enterprise consulting, and this is precisely what got them started on HarperDB.

Goldberg and Bernhardy liked the scale and ease of use of NoSQL, but still wanted ANSI SQL for actionable analytics. They wanted the ability to perform multi-table joins and multi-condition statements.

Them, and pretty much everyone else in the database world. The convergence of SQL and NoSQL solutions is something that has been going on for a while. A typical way to deal with this requirement is multi-model databases. But Goldberg and Bernhardy decided to take a different approach.

They felt multi-model was inherently flawed as a design pattern, were frustrated by the performance of data lakes and MapReduce solutions, and wanted something that would be ACID-compliant.

They thought a single model was needed to accommodate all of the above, so they went ahead and created what they call the exploded data model, which is also the basis of their patent.

The exploded data model, the basis of HarperDB's patent, is its creators' answer to the need to accommodate both SQL and NoSQL. Image: HarperDB

In the exploded data model, each attribute from a JSON object, or column from a SQL insert/update statement, becomes an index upon write. These attributes and their values are stored discretely on disk.

Goldberg and Bernhardy say this avoids the need to configure foreign keys and indexes, and allows indexing every attribute/column without increasing disk footprint, as HarperDB does not store the entire record whole or keep separate index tables.
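To make the idea concrete, here is a minimal in-memory sketch of an attribute-per-index layout. It is not HarperDB's implementation, which is not public; the class name `ExplodedStore`, its methods, and the sample records are all hypothetical, and a plain dictionary stands in for whatever HarperDB writes to disk.

```python
from collections import defaultdict


class ExplodedStore:
    """Toy stand-in for an attribute-per-index layout (hypothetical names)."""

    def __init__(self):
        # attribute name -> {record id -> value}; no record is stored whole
        self.attributes = defaultdict(dict)

    def write(self, record_id, record):
        """Explode a JSON-like dict into one entry per attribute."""
        for attribute, value in record.items():
            self.attributes[attribute][record_id] = value

    def find_ids(self, attribute, value):
        """Return ids whose attribute equals value, using only that attribute's entries."""
        column = self.attributes.get(attribute, {})
        return sorted(rid for rid, v in column.items() if v == value)


store = ExplodedStore()
store.write(1, {"name": "Harper", "breed": "mutt"})
store.write(2, {"name": "Penny", "age": 5})
store.write(1, {"age": 3})  # an update touches only the "age" attribute

print(sorted(store.attributes))
print(store.find_ids("name", "Harper"))
```

In this sketch every attribute can be searched without declaring an index up front, and an update to one attribute leaves the others untouched. Whether the real system achieves this with the claimed disk footprint cannot be judged from a toy like this.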

Upon search, parallelization is used to coalesce the data back into an object based on which columns are requested. This, Goldberg and Bernhardy note, has the added benefit of allowing joins to be as performant as a single table search:

"Our data model allows for both read and write concurrently at high throughput. Each attribute transaction is discrete, and we don't experience row locking or need in-memory transformation, which often plague database solutions and cause them to fail under HTAP scenarios."
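The coalescing step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not HarperDB's code: the `ATTRIBUTES` dictionary, the functions `read_attribute` and `coalesce`, and the sample data are invented for the example, and a thread pool stands in for whatever parallelization HarperDB actually uses.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-attribute storage: attribute name -> {record id -> value}.
ATTRIBUTES = {
    "name": {1: "Harper", 2: "Penny"},
    "breed": {1: "mutt"},
    "age": {1: 3, 2: 5},
}


def read_attribute(attribute, record_ids):
    """Fetch one attribute's values for the given ids, independently of the others."""
    column = ATTRIBUTES.get(attribute, {})
    return attribute, {rid: column[rid] for rid in record_ids if rid in column}


def coalesce(record_ids, requested):
    """Read only the requested attributes in parallel, then rebuild objects."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda a: read_attribute(a, record_ids), requested))
    records = {rid: {} for rid in record_ids}
    for attribute, values in results:
        for rid, value in values.items():
            records[rid][attribute] = value
    return records


rows = coalesce([1, 2], ["name", "breed"])
print(rows)
```

Only the requested attributes are read, so the "age" entries are never touched here. The sketch says nothing about join performance, which would need a benchmark against the real system.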

NO SCHEMA, NO MAINTENANCE

It sounds less crazy now, although at first glance their coalescing data approach does not seem completely different from the multi-model approach. Evaluating this would require either access to their implementation and patent, or benchmarks against competing solutions, and these are options we do not have.

What still sounds a little crazy though is to take on established solutions, in a market as crowded as the database market. Goldberg and Bernhardy say they are not trying to compete against entrenched solutions, but rather work alongside them and augment them.

That is part of the reason why they are launching today focusing on IoT, as they note there are a lot of greenfield projects, which need new architectural patterns to see success and to scale.

They also target working alongside traditional SQL data warehouses as a sidecar, providing SQL capability in real time for unstructured data via their JDBC driver, or making column/row data from SQL databases available to applications that were designed to interact with JSON.

HarperDB specifically targets IoT use cases, due to its small footprint and the fact that IoT is a relatively new field. Image: HarperDB

HarperDB is advertised as schema-less and configuration-free. Goldberg and Bernhardy clarify it is more accurate to say that HarperDB has a dynamic schema. "No configuration" refers to the fact that no configuration for columns, foreign keys, data types, or indexes is needed.

HarperDB has the concept of schemas, tables, and attributes. Schemas and tables only provide namespaces for finding attributes, and creating logical collections. Attributes are reflexively created on insert/update and do not have data types, but ODBC and JDBC drivers sample data to suggest data types in BI tools.
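A dynamic schema of this kind can be sketched as follows. This is a toy under stated assumptions rather than HarperDB's API or driver logic: the `DynamicTable` class, the majority-vote rule, and the type names `BOOLEAN`, `INTEGER`, `DOUBLE`, and `VARCHAR` are all hypothetical choices made for the example.

```python
from collections import Counter, defaultdict


class DynamicTable:
    """Toy table whose attributes appear on insert and carry no declared types."""

    def __init__(self, schema, name):
        self.schema = schema  # schema and table name act only as a namespace
        self.name = name
        self.values = defaultdict(list)  # attribute -> values seen so far

    def insert(self, record):
        """Create attributes on the fly from whatever keys the record has."""
        for attribute, value in record.items():
            self.values[attribute].append(value)

    def attributes(self):
        return sorted(self.values)

    def suggest_type(self, attribute, sample_size=100):
        """Suggest a type for a BI tool by majority vote over sampled values."""
        def classify(value):
            if isinstance(value, bool):  # check bool first: bool is a subclass of int
                return "BOOLEAN"
            if isinstance(value, int):
                return "INTEGER"
            if isinstance(value, float):
                return "DOUBLE"
            return "VARCHAR"

        sample = self.values.get(attribute, [])[:sample_size]
        counts = Counter(classify(v) for v in sample if v is not None)
        return counts.most_common(1)[0][0] if counts else "VARCHAR"


dogs = DynamicTable("dev", "dog")
dogs.insert({"name": "Harper", "age": 3})
dogs.insert({"name": "Penny", "age": 5, "weight": 12.5})

print(dogs.attributes())
print({a: dogs.suggest_type(a) for a in dogs.attributes()})
```

The second insert adds a "weight" attribute without any schema change, and the type suggestion is computed only when a tool asks for it, which is one plausible reading of how a driver could sample data.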

Read the full article on ZDNet Big on Data
