MemSQL 6.5: NewSQL with autonomous workload optimization, improved data ingestion and query execution speed
MemSQL wants to be the world's best database. Leading that race is a tall order, but the new version seems to improve on an already strong offering.
Is it SQL, NoSQL, or NewSQL?
Even though it's just a naming convention, MemSQL is usually thought of as part of the NewSQL camp. That is, databases grounded in the relational SQL model yet adopting some of the "knobs" the NoSQL crowd has introduced for people to turn in order to achieve horizontal scalability.
It's a balancing act between consistency and performance, one which, in the end, applies to any distributed database. NewSQL folks argue you should not have to ditch schema, or SQL, to get the scale-out benefits that NoSQL brings. If you like your schema and SQL, and you also like your database to have analytical as well as transactional capabilities, then MemSQL is probably on your radar.
Today, MemSQL is announcing version 6.5, introducing autonomous workload optimization and bringing improved data ingestion and query execution speed. ZDNet had a Q&A with MemSQL CEO Nikita Shamgunov on what the new version does and how.
Autonomous workload optimization
What autonomous workload optimization means in practice is that some of the tasks typically attended to by a DBA in MemSQL deployments are now handled, to some extent, by automation.
MemSQL uses a shared-nothing, distributed, in-memory processing architecture. Nodes are organized in two layers -- leaf nodes and aggregators -- with the latter splitting and routing workloads and the former executing them.
In MemSQL 6.5, explains Shamgunov, workload management was added to handle unpredictable spikes in queries more gracefully while still maintaining high performance under normal load. Workload management can be broken into three components that work together to address cases of heavy load: detection, prediction, and management.
MemSQL wants to take the lessons learned from NoSQL and apply them to SQL. (Image: MemSQL)
Detection refers to identifying when any node is struggling. MemSQL distinguishes between memory used for table data and memory used temporarily by queries to determine whether it is safe to continue forwarding more queries to the target nodes.
Prediction refers to estimating the resource usage of queries and classifying them into groups accordingly. In MemSQL 5.8, management views were introduced to allow users to see resource usage statistics of previously run queries. Workload management can also use these statistics to determine how expensive a query is from a memory consumption perspective.
The last component is management, which admits queries in three tiers. The cheapest queries, such as single-partition selects, inserts, or updates, skip the queue entirely. Queries that use a moderate amount of resources are queued on a FIFO (first-in, first-out) basis and released at a rate that depends on the highest load among the leaves.
The most expensive queries are queued with a cap on concurrency, and their execution is coordinated across all the aggregator nodes. Since this coordination has an associated cost, only the most resource-intensive queries fall into this category.
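To make the prediction step a bit more concrete, below is a minimal sketch of the kind of query a DBA might run against the management views mentioned above to spot the most memory-hungry query shapes. The view and column names (mv_activities_cumulative, mv_queries, memory_bs) are our reading of MemSQL's documented management views and should be treated as assumptions rather than verified 6.5 behavior:

    -- Sketch: surface the most memory-hungry query shapes via the
    -- management views used for workload profiling.
    -- View and column names are assumptions based on MemSQL's docs.
    SELECT q.query_text,
           a.memory_bs,        -- cumulative memory use (byte-seconds)
           a.cpu_time_ms,      -- cumulative CPU time
           a.run_count         -- how many times this query shape ran
    FROM information_schema.mv_activities_cumulative AS a
    JOIN information_schema.mv_queries AS q
      ON a.activity_name = q.activity_name
    ORDER BY a.memory_bs DESC
    LIMIT 10;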
When asked to compare MemSQL's autonomous workload optimization capabilities with those of other databases, such as Oracle, Datastax, and ScyllaDB, Shamgunov noted that MemSQL has capabilities similar to Oracle's Automatic Memory Management utility:
"MemSQL offers slightly more granular options such as managing pre-configured query pools while Oracle offers a more broad total memory management utility. Datastax and ScyllaDB offer simplified memory management configurations for allocating memory to entire applications but do not offer lower level configurations based on query profile or query pool."
Pipelines and stored procedures
In a previous discussion with Gary Orenstein, MemSQL's former CMO, one of the things we talked about was MemSQL's positioning in a real-time data ingestion/transformation pipeline. MemSQL introduced its own Pipelines feature for data ingestion in version 5.5, and Orenstein mentioned that MemSQL's capabilities are such that some MemSQL clients choose not to use solutions like Spark or Flink, but rather ingest data directly using MemSQL.
MemSQL Pipelines architecture. Now Pipelines is enhanced with the ability to execute stored procedures. (Image: MemSQL)
Shamgunov notes that MemSQL 6.5 introduces pipelines to stored procedures, augmenting the existing MemSQL Pipelines data flow by providing the option to replace the default Pipelines load phase with a stored procedure:
"The default Pipelines load phase only supports simple insertions into a single table, with the data being loaded either directly after extraction or following an optional transform.
Replacing this default loading phase with a stored procedure opens up the possibility for much more complex processing, providing the ability to insert into multiple tables, enrich incoming streams using existing data, and leverage the full power of MemSQL Extensibility.
Leveraging both a transform and a stored procedure in the same Pipeline allows you to combine the third-party library support of a traditional transform alongside the multi-insert and data-enrichment capabilities of a stored procedure."
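To ground this in syntax, here is a minimal sketch of a Pipeline feeding a stored procedure, based on MemSQL's CREATE PIPELINE ... INTO PROCEDURE form. The Kafka broker, topic, table, and procedure names are made up for illustration, and the batch handling is deliberately simplified:

    -- Hypothetical schema: a raw event log plus a per-day rollup.
    CREATE TABLE raw_events (user_id BIGINT, event_type TEXT, ts DATETIME);
    CREATE TABLE daily_event_counts (day DATE, event_type TEXT, cnt BIGINT);

    -- Stored procedure that receives each pipeline batch as a QUERY-typed
    -- parameter and fans it out to two tables (the "multi-insert" case).
    -- In an interactive client you may need to change the statement
    -- delimiter around the procedure body.
    CREATE OR REPLACE PROCEDURE load_events(
        batch QUERY(user_id BIGINT, event_type TEXT, ts DATETIME))
    AS
    BEGIN
        INSERT INTO raw_events (user_id, event_type, ts)
            SELECT user_id, event_type, ts FROM batch;
        INSERT INTO daily_event_counts (day, event_type, cnt)
            SELECT DATE(ts), event_type, COUNT(*) FROM batch
            GROUP BY DATE(ts), event_type;
    END;

    -- Pipeline that ingests from a (hypothetical) Kafka topic and hands
    -- each batch to the procedure instead of loading into a single table.
    CREATE PIPELINE events_pipeline AS
        LOAD DATA KAFKA 'kafka-broker:9092/events'
        INTO PROCEDURE load_events;

    START PIPELINE events_pipeline;

In a real deployment, the rollup insert would typically be an upsert so counts accumulate across batches; that detail is omitted to keep the sketch short.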
While data streaming and ingestion to support real-time insights is an essential feature, adding, configuring, and operating the streaming engines required to support this increases operational complexity. Whether Pipelines can match the capabilities of dedicated streaming engines is a nuanced discussion in its own right.
Assuming, however, that the transformations supported by Pipelines are enough to cover the scenarios prospective MemSQL users are interested in, flattening system architecture even beyond Kappa sounds like a strong selling point.
Resource optimization improvements for multi-tenant deployments
With multi-cloud and hybrid cloud strategies being a reality for most organizations, MemSQL has had a strategy in place for this since last year. But while using MemSQL in multi-cloud and hybrid cloud environments is possible, cross-querying is not -- it entails moving data around. That is a rather high price to pay, in more ways than one.
We asked Shamgunov whether anything has changed in the meantime, and what the improvements for multi-tenant deployments are about. While the operation of MemSQL in multi-cloud and hybrid cloud environments does not seem to have changed, Shamgunov pointed to the need many organizations have to support database-as-a-service workloads:
"This is a model where multiple organizations have data living side by side to reduce management costs but each organization requires a strong namespace and security boundary around its data. Organizations also often want to leverage the existing ecosystem of tooling available on top of the data layer. This is a challenge to do using legacy databases or NoSQL systems.
With MemSQL 6.5, we further enhanced the multi-tenant capabilities by optimizing memory utilization to deliver dramatically more tenants with fewer hardware resources. With the improvement, internal tests were able to configure nearly thousands of tenants with significantly fewer machine resources than the incumbent single node database solution."
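For readers unfamiliar with the pattern, the namespace and security boundary Shamgunov describes is commonly realized as a database per tenant, each with its own user and grants. The sketch below only illustrates that general pattern with made-up names; it is not a MemSQL-prescribed setup, and the 6.5 improvements concern how efficiently many such tenants share memory, not this DDL:

    -- Sketch: database-per-tenant layout, one namespace and one user per
    -- tenant. All names and the placeholder passwords are illustrative.
    CREATE DATABASE tenant_acme;
    CREATE USER 'acme_app'@'%' IDENTIFIED BY 'change-me';
    GRANT SELECT, INSERT, UPDATE, DELETE ON tenant_acme.* TO 'acme_app'@'%';

    CREATE DATABASE tenant_globex;
    CREATE USER 'globex_app'@'%' IDENTIFIED BY 'change-me';
    GRANT SELECT, INSERT, UPDATE, DELETE ON tenant_globex.* TO 'globex_app'@'%';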