登录查看更多内容

Deep Dive on TimesTen Scaleout Distribution Clauses

Douglas Hood

发布日期: 2018年7月25日

Overview

TimesTen Scaleout is a distributed database which has a shared nothing architecture suitable for low latency, high throughput OLTP, IoT and analytic workloads. TimesTen Scaleout supports high availability, ACID transactions, SQL & PL/SQL and does not require database de-normalization.

Oracle TimesTen Scaleout supports SQL operations independent of the number of hosts. However, as the database is spread over multiple hosts, network messages are required to complete most SQL operations. A fundamental performance tuning technique is to minimize the number of network messages given common use cases. TimesTen Scaleout uses table distribution clauses to enable these optimizations.

TimesTen Scaleout supports three data distribution methods (Distribute by Hash, Distribute by Reference and Duplicate). The following picture is an example of a master-detail-detail hierarchy using all three distribution methods. This is not the only way to support this master-detail-detail hierarchy.

Figure 1

In figure 1, the 'customers' table rows are evenly spread across all hosts in the database. The 'orders' rows are not guaranteed to have the same number of rows per host as some customers have more orders than others. Given a large number of orders, it is expected that the number of orders per host will average out. The products table is duplicated across all hosts so each host will have exactly the same number of rows per host.

If an application wants to query all orders and products for a given customer, then no network messages are required for the join as all of those rows will be present on the host that contains the customer row.

Distribute by Hash

For large tables that are the 'master' table [eg customers], you want to be able to evenly spread the data across all of the hosts in your database. You also want to be able to add and remove hosts from the database without having to make dramatic changes to the distribution of data for that table. The TimesTen Scaleout distribute by hash mechanism uses a consistent hash algorithm to achieve these goals.

Figure 2.

In figure 2, the distribution key is explicitly set to be the same as the primary key. If you do not specify a distribution clause for a table, TimesTen Scaleout will by default use the PK columns for the hash distribution clause. If there is no PK then TimesTen Scaleout will randomly distribute by hash using a hidden internal column. You can also explicitly distribute by a set of columns which are different than the primary key.

In figure 1, the customer rows are evenly distributed across the four nodes in the database. The consistent hash algorithm used is not a perfect hash algorithm so there will tend to be a small variation of the number of rows per table across. Where there are more than about ten thousand rows in a table with distribute by hash, we expect that there will be no more than 2% variation in the number of rows.

Figure 3

Figure 3 shows that all of the nodes (elements) have about the same number of rows [in replica set 1] for the customers table and that there is no more than 2% variation in the number of rows for the replica set.

Distribute by Reference

When you have tables that you expect to be joined often via PK/FK, you can use the distribute by reference table clause for the detail table. When the distribute by reference syntax is used, you guarantee that the child rows in the detail table will be in the same element as the corresponding parent row. This means that a join of those PK/FK will occur on the same element. Joins on the same node will occur in memory in TimesTen Scaleout, so those joins can occur very quickly. As shown in figure 1, the customers table rows are spread evenly over all nodes, but the orders rows for the customers table PKs are always on the same node.

For distribute by reference to work, the parent table has to have a PK and the child table has to have one or more FKs. When there is more than one FK, you choose the FK that will be the most used for your workload.

Figure 4

In figure 4, the FK_CUSTOMER foreign key was chosen over the FK_ACCT_TYPE and FK_ACCT_STATUS foreign keys. This picture also shows that the table is distributed by a set of columns [ie CUST_ID] which is different than the PK columns for the table [ie ACCOUNT_ID].

Figure 5.

In figure 5, with a million rows in the accounts table, there is a fairly even distribution of rows across all of the nodes with less than 1% variation.

Duplicate Distribution

For tables that are often joined to other tables and are 'read mostly', the duplicate distribution clause can be very effective. The contents of these tables are copied/duplicated across all elements in the database. The advantage of duplicate distribution is that these tables can always enable efficient in-memory joins. The disadvantage of duplicate tables is that inserts, updates and deletes to these tables require two phase commit transactions to all other nodes in the database.

Figure 6.

In figure 6, the account_type table uses the duplicate table distribution clause.

Figure 7.

Figure 7 shows that duplicate tables always have the same number of rows. The account_type table only has six rows, but duplicate tables could have billions of rows as long as they are read mostly.

Explicit table distribution clauses are optional performance optimizations for TimesTen Scaleout. Although indexes and materialized views can also help tune data access, always tune your table distribution clauses first.

Disclaimer: These are my personal thoughts and do not represent Oracle's official viewpoint in any way, shape, or form.

要查看或添加评论，请登录

Douglas Hood的更多文章

Vectors for Oracle AI Vector Search

2024年8月11日

Vectors for Oracle AI Vector Search

Introduction Oracle AI Vector Search offers extensive support for a wide range of vector embedding models…

14 条评论
Overview of TimesTen XE SQL

2022年6月13日

Overview of TimesTen XE SQL

An overview of #sql and #plsql on #TimesTenXE The #fastest #database is #free https://blogs.oracle.
How to create a database on TimesTen XE

2022年6月7日

How to create a database on TimesTen XE

The fastest #sql and #plsql #database is #free Learn how to create a database on TimesTen XE Disclaimer: These are my…
TimesTen XE SQL Profiles - part 2

2022年6月3日

TimesTen XE SQL Profiles - part 2

Manage your database password policy with SQL. Use the fastest SQL database for free.
TimesTen XE SQL = free

2022年6月2日

TimesTen XE SQL = free

Try the #fastest #sql #database for #free Get an overview of SQL in TimesTen XE #cloudnative #kubernetes #aws #azure…
TimesTen now supports Python and Node.js

2020年1月23日

TimesTen now supports Python and Node.js

Oracle TimesTen 18.1.

5 条评论
OOW19: eBay SQL performance

2019年9月13日

OOW19: eBay SQL performance

How does eBay: Serve billions of #SQL requests/day Enable high availability Minimize latency Come to #OOW19 so they can…
OOW19: T-Mobile and Siebel

2019年9月13日

OOW19: T-Mobile and Siebel

How do you: Use SQL to get faster access to your Oracle Siebel customer data Provide high availability and better…
SQL Data Caching at #OOW19

2019年9月11日

SQL Data Caching at #OOW19

Learn how eBay and T-Mobile use SQL caching for: #lowerlatency #scalability #highavailability #SQL #joins…
SQL Data Caching with Joins

2019年9月10日

SQL Data Caching with Joins

SQL ResultSet caching is easy and fairly useless. How do you do SQL Data Caching with: Billions of rows of data…

4 条评论

See all articles

Deep Dive on TimesTen Scaleout Distribution Clauses

Douglas Hood

Overview

Distribute by Hash

Distribute by Reference

Duplicate Distribution

Douglas Hood的更多文章

社区洞察

其他会员也浏览了

Data Partitioning and Sharding - From Scratch

Kafka Schema Registry

The Latest in Distributed SQL - June

Selecting a Data Integration Tool

ConfigMaps in Kubernetes

How Structured Is Your Data?

The Hi/Lo Algorithm

Why the Future is Nexus

How Kafka's Topic Partitioning Makes It Faster than Other Message Queues

Overview

Distribute by Hash

Distribute by Reference

Duplicate Distribution

Douglas Hood的更多文章

Vectors for Oracle AI Vector Search

Overview of TimesTen XE SQL

How to create a database on TimesTen XE

TimesTen XE SQL Profiles - part 2

TimesTen XE SQL = free

TimesTen now supports Python and Node.js

OOW19: eBay SQL performance

OOW19: T-Mobile and Siebel

SQL Data Caching at #OOW19

SQL Data Caching with Joins

社区洞察

其他会员也浏览了

Data Partitioning and Sharding - From Scratch

Kafka Schema Registry

The Latest in Distributed SQL - June

Selecting a Data Integration Tool

ConfigMaps in Kubernetes

How Structured Is Your Data?

The Hi/Lo Algorithm

Why the Future is Nexus

How Kafka's Topic Partitioning Makes It Faster than Other Message Queues