The Importance of Lists, Schema Demo
In this post we will offer several examples of Ancelus linked lists, some history on how and why the table-based model prevailed, and the advantages of the Ancelus approach.
A video demonstration of how a list-based schema is developed in Ancelus is available at www.ancelus.com/tech briefs.
The natural state of all information "in the wild" is list-based. The two-dimensional table structure is a rare exception. But in the early days of database evolution (starting with Ted Codd's 1969 paper), the table became the accepted model.
Two factors converged to produce this adoption. First, computers physically store information in a 2D format. Second, the white pages phone book view of "name-address-phone number" was used as the definition of what a database should do. While the white pages structure worked early on, we now find that lists are more basic. (I've just realized that a whole generation of developers has never seen a white pages phone book.)
Nested lists are the most common logical structures, and "name-addresslist-phonelist" is a classic example of nested lists. We have multiple addresses (home-office-vacation-etc) and multiple phones (home-office-homefax-officefax-cell-car). Even the name isn't the universal key we thought it was, when we discover that teen children need their own phonelist at our address.
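To make the "name-addresslist-phonelist" idea concrete, here is a minimal Python sketch of a nested-list record. It is only an illustration of the logical shape, not Ancelus syntax; the names `Household`, `Person`, and `all_phones` are hypothetical, and the phone numbers are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    # Each person carries their own phone list (label -> number).
    phones: dict = field(default_factory=dict)


@dataclass
class Household:
    # One household can have several addresses (label -> address).
    addresses: dict = field(default_factory=dict)
    # A nested list: each entry is a Person with a phone list of its own.
    people: list = field(default_factory=list)


def all_phones(household):
    """Flatten the nested lists into (name, label, number) rows."""
    return [
        (person.name, label, number)
        for person in household.people
        for label, number in person.phones.items()
    ]


home = Household(addresses={"home": "12 Oak St", "vacation": "3 Lake Rd"})
home.people.append(Person("Pat", {"cell": "555-0101", "office": "555-0102"}))
home.people.append(Person("Teen", {"cell": "555-0103"}))

for row in all_phones(home):
    print(row)
```

Flattening happens only when a report needs rows; the stored shape stays a list of lists, which is the point the phone book example is making.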
One of the most important features of the Ancelus database is its native list-handling architecture. While the idea of double-linked lists isn't new (it is found in most textbooks), the Ancelus implementation is unique. Some elements are disclosed in the patents, but important parts have been retained as trade secrets.
A double-linked list allows bidirectional traversal of the data structures. In our universal example of Part Number/Serial Number, every PN has a pointer to the first SN in its list of many SNs. That SN includes a pointer to the next SN in the list, and each SN has a reverse pointer to its one and only PN. So the SN "table" appears to have an embedded PN as a 40-byte foreign key, but it actually holds a 4-byte pointer into the PN "table." When combined with the storage algorithm, we now have a logical model that is independent of the physical storage model. We can directly implement lists even when they are nested many layers deep.
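The pointer arrangement described above can be sketched in a few lines of Python. This is a textbook-style illustration under stated assumptions, not the Ancelus implementation (parts of which are trade secrets): the class and function names are hypothetical, Python object references stand in for the 4-byte pointers, and the `last_sn` tail pointer is an added convenience so that appends do not have to walk the list.

```python
class PartNumber:
    def __init__(self, pn):
        self.pn = pn
        self.first_sn = None  # pointer to the first SN in this PN's list
        self.last_sn = None   # assumption: tail pointer for cheap appends


class SerialNumber:
    def __init__(self, sn, part):
        self.sn = sn
        self.part = part      # reverse pointer to the one and only PN
        self.next_sn = None   # pointer to the next SN in the list


def add_serial(part, sn):
    """Append a new serial number to the part's list and return it."""
    node = SerialNumber(sn, part)
    if part.first_sn is None:
        part.first_sn = node
    else:
        part.last_sn.next_sn = node
    part.last_sn = node
    return node


def serials_of(part):
    """Walk forward from the PN through its chain of SNs."""
    node = part.first_sn
    while node is not None:
        yield node.sn
        node = node.next_sn


widget = PartNumber("PN-100")
first = add_serial(widget, "SN-0001")
second = add_serial(widget, "SN-0002")

print(list(serials_of(widget)))  # PN -> all of its SNs
print(second.part.pn)            # SN -> its PN, with no copy of the PN data
```

Note that the serial number record never stores the part number text itself. It only holds a reference, which is why the PN data is not duplicated.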
This approach eliminates the overhead of converting lists to tables and back. It also eliminates the duplication of PN data. The entity-relationship diagram (ERD) used to define the logical structures can now be implemented directly in the Ancelus schema. This simplifies the more complex examples such as recursive lists (lists that point back to themselves).
Recursive lists can be visualized using the family tree example. Imagine a field named "Person" that contains the unique ID of every person who has ever lived (100 billion by some estimates). A family tree is now a list with a pointer from me to two parents, then four grandparents, etc. back to the beginning of the human race.
One classic thought problem is to build a database of all my sixth cousins. While there are PhD theses written on how to do this in tables, with Ancelus it's a simple matter of tracing back to the fifth great grandparents, then down to all their descendants. In business processes this is called "blowback traceability" and will be the subject of a later post.
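As a rough sketch of that trace, here is the cousin walk in Python. It assumes a simple in-memory tree in which each person points to their parents and children; the names `Person`, `up`, `down`, and `cousins` are hypothetical and this is not Ancelus code. Sixth cousins share fifth great-grandparents, which sit seven generations up, so the general rule is: go up `degree + 1` generations, come down the same number, and drop anyone who is a closer relative.

```python
class Person:
    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)
        self.children = []
        # Maintain the reverse pointers so the tree can be walked both ways.
        for parent in self.parents:
            parent.children.append(self)


def up(people, generations):
    """Everyone exactly `generations` levels above the given people."""
    for _ in range(generations):
        people = {parent for p in people for parent in p.parents}
    return people


def down(people, generations):
    """Everyone exactly `generations` levels below the given people."""
    for _ in range(generations):
        people = {child for p in people for child in p.children}
    return people


def cousins(person, degree):
    """Cousins of the given degree (6 for sixth cousins)."""
    same_generation = down(up({person}, degree + 1), degree + 1)
    # Siblings and lower-degree cousins share a closer ancestor; remove them.
    closer_relatives = down(up({person}, degree), degree)
    return same_generation - closer_relatives


# A tiny tree, checked with first cousins to keep the example short.
grandma, grandpa = Person("Grandma"), Person("Grandpa")
mom = Person("Mom", [grandma, grandpa])
aunt = Person("Aunt", [grandma, grandpa])
dad = Person("Dad")
me = Person("Me", [mom, dad])
sister = Person("Sister", [mom, dad])
cousin = Person("Cousin", [aunt])

print(sorted(p.name for p in cousins(me, 1)))
```

Calling `cousins(me, 6)` on a deep enough tree performs the sixth-cousin trace described above. The sketch ignores complications such as half-siblings and ancestors who appear on more than one branch.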
Another Ancelus feature solves a common problem for financial institutions. The ability to make live modifications to the database, including even the schema definition, can address the multiple account problem. In the EU, the recent requirement to report all connected accounts challenged the existing record systems. A single person might have a couple of checking and savings accounts, multiple mortgages on multiple properties, and several credit cards, all established in different departments within the bank. Each of these accounts could have been created at a different time, with different addresses and variations on the name, and all with varying levels of updates. Connecting these has been an expensive proposition, even though a common ID number like a social security number makes it theoretically possible.
Since Ancelus can introduce new relationships into the schema while the database is in operation, it dramatically reduces the challenge. As the common threads are identified, simply add them to the schema. Complex reporting now achieves a new level of speed and automation.
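To illustrate the idea of adding a relationship while the data is live, here is a toy Python sketch. It is an analogy only and says nothing about how Ancelus does this internally: `Store`, `add_relationship`, and `connected` are hypothetical names, and the account records are invented for the example.

```python
from collections import defaultdict


class Store:
    def __init__(self):
        self.records = []
        self.relationships = {}  # relationship name -> (key function, index)

    def insert(self, record):
        """Add a record and link it into every relationship defined so far."""
        self.records.append(record)
        for key_fn, index in self.relationships.values():
            index[key_fn(record)].append(record)

    def add_relationship(self, name, key_fn):
        """Introduce a new link over existing records without reloading them."""
        index = defaultdict(list)
        for record in self.records:
            index[key_fn(record)].append(record)
        self.relationships[name] = (key_fn, index)

    def connected(self, name, key):
        """All records joined by the named relationship for a given key."""
        _, index = self.relationships[name]
        return list(index.get(key, []))


bank = Store()
bank.insert({"acct": "CHK-1", "name": "J. Smith", "tax_id": "111"})
bank.insert({"acct": "MTG-7", "name": "John Smith", "tax_id": "111"})
bank.insert({"acct": "CC-3", "name": "A. Jones", "tax_id": "222"})

# The common thread is identified later, while the store is already in use.
bank.add_relationship("same_person", lambda r: r["tax_id"])
bank.insert({"acct": "SAV-9", "name": "Smith, John", "tax_id": "111"})

print([r["acct"] for r in bank.connected("same_person", "111")])
```

Records inserted before and after the relationship was defined end up in the same connected list, so the report does not depend on when the link was discovered.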
This may seem like a minor point, but it makes it possible to deploy a new form of business process. Much of the industry approached this as a one-time project, spending tens of millions to scrub the data. But how long before it needs to be done again? The right way to do this is to create immediate correction methods, so that updating the database schema becomes a part of everyday operations. With Ancelus at the hub of the integration, it is now practical to achieve this continuous improvement business model without major modification of the existing systems. See our post on legacy integration with Ancelus.
Let us know if you have special examples you'd like to see discussed in the Ancelus forum.
Craig Mullins, President & Principal Consultant at Mullins Consulting, Inc. IBM Gold Consultant and IBM Champion for Data and AI
4 years ago: Lists are a commonly used approach for managing data, and a database that uses that approach is an interesting idea.