Table Joins: Benchmarks are Fine. What About Doing Real Work?
In the real world we don’t build systems from benchmarks. The practical work in database systems gets done in table joins. Let’s use a simple example.
Sometimes the data we’re looking for doesn’t reside in a single table. Getting the view we want requires combining two or more tables. For example, the SERIAL NUMBER (SN) table contains data about specific units, but the many details common to the part are found in the PART NUMBER (PN) table. This avoids repeating the PN data in every SN record. To connect the two tables, the part number is recorded in the SN table. This field is called a foreign key, and it tells us where to look in the PN table to find those details. When we need to retrieve the complete information for a list of serial numbers, we merge the two tables in a process called a “table join.”
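To make the SN/PN example concrete, here is a minimal sketch using SQLite. The table and column names (sn, pn, serial_number, part_number, and so on) are illustrative only, not taken from any particular system.

```python
# Minimal sketch of the SN/PN join described above, using SQLite.
# All table and column names here are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pn (
        part_number TEXT PRIMARY KEY,
        description TEXT,
        unit_weight REAL
    );
    CREATE TABLE sn (
        serial_number TEXT PRIMARY KEY,
        part_number   TEXT REFERENCES pn(part_number),  -- foreign key into PN
        build_date    TEXT
    );
    INSERT INTO pn VALUES ('PN-100', 'Hydraulic pump', 4.2);
    INSERT INTO sn VALUES ('SN-0001', 'PN-100', '2023-05-01');
""")

# The table join: each SN row is combined with its matching PN row,
# so the part details are not repeated in every SN record.
rows = conn.execute("""
    SELECT sn.serial_number, sn.build_date, pn.description, pn.unit_weight
    FROM sn
    JOIN pn ON pn.part_number = sn.part_number
""").fetchall()
print(rows)  # [('SN-0001', '2023-05-01', 'Hydraulic pump', 4.2)]
```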
One of the most impressive benchmarks of the Ancelus system is the time required for complex table joins. This focus was not accidental: our research, based on actual measurements in real systems, shows that table joins are the single biggest time consumer in application software. So we went to great lengths to find and cure the root cause of the delays.
The main observation is that traditional table joins require a large number of “compare” operations to isolate the rows that satisfy the query. The more tables involved in the join, the slower the operation, and the time grows multiplicatively, roughly with the product of the table sizes, as tables are added or grow larger.
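A toy illustration of why the compares pile up, with no claim about how any particular engine is implemented: a naive nested-loop join must test every SN row against every PN row, so the comparison count is the product of the two table sizes.

```python
# Naive nested-loop join: counts how many compare operations are needed.
# Data and field names are made up for illustration.
def nested_loop_join(sn_rows, pn_rows):
    comparisons = 0
    result = []
    for sn in sn_rows:
        for pn in pn_rows:
            comparisons += 1                       # one compare per candidate pair
            if sn["part_number"] == pn["part_number"]:
                result.append({**sn, **pn})
    return result, comparisons

sn_rows = [{"serial_number": f"SN-{i:04d}", "part_number": f"PN-{i % 100}"}
           for i in range(1000)]
pn_rows = [{"part_number": f"PN-{i}", "description": f"Part {i}"}
           for i in range(100)]

joined, comparisons = nested_loop_join(sn_rows, pn_rows)
print(len(joined), comparisons)  # 1000 joined rows, 100000 comparisons
```

With 1,000 serial numbers and only 100 part numbers, the join already costs 100,000 comparisons; every additional table multiplies the work again.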
In a comparative benchmark several years ago, Ancelus reduced the 3-table join time from 230 seconds to 1.5 milliseconds in a 200-million-record database. To show off a little, we decided to test the time required to modify the database, adding a field to the primary table in both systems. In the relational system it required 3,940 seconds with the database locked the entire time. In Ancelus there were 700 milliseconds of prep time followed by 8 microseconds during which the database was locked and out of service.
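On the relational side, the schema change measured here is the familiar ALTER TABLE statement. The sketch below uses SQLite purely to show the shape of the operation; the table and column names are hypothetical, and how long such a change holds locks varies widely by engine.

```python
# Illustrative only: adding a field to the primary (SN) table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sn (serial_number TEXT PRIMARY KEY, part_number TEXT)")

# In many traditional systems this kind of change can hold locks
# (or force a table rewrite) while it runs.
conn.execute("ALTER TABLE sn ADD COLUMN warranty_expiry TEXT")

print([row[1] for row in conn.execute("PRAGMA table_info(sn)")])
# ['serial_number', 'part_number', 'warranty_expiry']
```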
Ancelus dodges the compare problem with the unique combination of links and tags automatically included in the Ancelus schema. These deliver direct access to the target data, eliminating the need for the compare operations and recovering most of the time lost in traditional systems.
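The link-and-tag mechanism itself is internal to Ancelus, so the sketch below is only a generic illustration of the idea of direct access, not a description of how Ancelus is implemented: when each SN row carries a direct reference to its PN record (a Python dict lookup standing in for a pointer), the join costs one lookup per row instead of a scan with compares.

```python
# Generic "direct access" sketch: one lookup per SN row, no row-by-row compares.
# Data and field names are made up for illustration.
pn_rows = [{"part_number": f"PN-{i}", "description": f"Part {i}"}
           for i in range(100)]
sn_rows = [{"serial_number": f"SN-{i:04d}", "part_number": f"PN-{i % 100}"}
           for i in range(1000)]

# Build the "links" once: part number -> PN record.
pn_index = {pn["part_number"]: pn for pn in pn_rows}

# The join becomes one direct lookup per SN row.
joined = [{**sn, **pn_index[sn["part_number"]]} for sn in sn_rows]
print(len(joined))  # 1000 combined records
```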
For many database applications the traditional delays are an acceptable compromise. But in large, complex, or many-user systems the time for table joins can run into hours (in one big-data case, days).
The Ancelus table-join benchmarks have generated more comments than any other feature. The current figures are quoted for an Intel Broadwell server configured with 44 cores, running 44 concurrent queries. The 7-table specification involves a schema with billion-row tables, and all 44 queries complete in less than 11 seconds.
While the size and complexity of this benchmark are extreme by most standards, it demonstrates one awkward reality that most DBAs live with: nobody even tries a 7-table join in traditional systems because “everyone knows” it can’t be done. Until now.
So, if your goal is to improve throughput (number of concurrent users), shrink hardware, or get better latency (response time to each query), you can use Ancelus to address the challenge.
To learn more, contact us at www.ancelus.com.