JSON Relational Duality @ Oracle Cloud World 24

JSON Relational Duality @ Oracle Cloud World 24

At this year's Cloud World, there will be many sessions on JSON Relational Duality (see here).

Is this just an attempt from Oracle to sell relational tables as JSON by wrapping rows in curly braces? No, it's not. It's something much bigger and better:?

Oracle came up with a 'new' JSON, one that is made for the enterprise: suitable for applications that can grow and solve new and, at development time, unforeseen use cases including but not limited to reports and analytical queries. Cases, where today's JSON document databases fail or lack performance.

To be clear: The JSON part of JSON relational duality looks, smells, and tastes like the JSON you are used to! You can continue using your favorite JSON drivers (even MongoDB’s) or tools for JSON data modeling like Hackolade.

What's revolutionary new about this JSON happens in the backend, in the database: JSON relational duality breaks the restrictions of the hierarchical data model and, therefore, allows for much better data integrity, consistency, and much easier application code. I admit lots of the typical database marketing words – time for examples:

Since we're just heading to the Oracle Cloud World conference, why don't we build an app to manage a conference?



What are the simplest most use cases: 1) offer a session catalog, 2) allow attendees to build a schedule of sessions, and 3) generate a schedule for the speaker. (A lot more use cases are easy to come up with, like a room occupancy plans, but let's keep it simple for now, you get the idea that applications grow more complex over time).

?

Our app is built using 3 ‘things’: Attendees, Speakers, and Sessions.

Database people call these things 'entities'. Entities form 'relationships': Speakers give sessions, Attendees join a session, etc.


Entity-Relationship Diagram

Entities also have 'attributes': A session has a room and time, and speakers have phone numbers (known to the organizers to call them if they don't show up but hidden from attendees).

?

For an attendee's schedule, we'd need a JSON document listing the selected sessions (with room/time info) and usually the speaker's name (without the phone number). Coming up with such JSON document is not too hard:


Schedule for Jill

But what about the speaker's schedule? Traversing each attendee document to find the occurrences of a speaker is not a viable solution! (We also did not add the speaker's phone number here, so we won't find that one anyway).

We need another way to represent the data: one document per speaker. The same goes for the session schedule: We need a document per session (not shown, you get it).


some attributes for each speaker

So, even in this trivially small application, we realize quickly that it is impossible to come up with one JSON document structure/hierarchy that is suitable for all our use cases (catalog and two schedules). Instead, we need three! They all share the same entities but order them differently and pick different attributes (phone number present or not).


There are two solutions with conventional JSON databases to deal with this

1) Duplicate data: Each document has all the information needed, but values are duplicated across documents. In enterprise applications, duplication is bad. not because it takes more space but because it allows inconsistent data and makes updates harder. For example, if we want to update the room for a session, we need to visit the affected attendee and speaker documents and make the same changes. Now, even a trivial change may fan out to affect hundreds or thousands of documents.

?2) Normalize: As the previous solution was not viable, let's solve it differently by replacing embedded values with IDs: The attendee schedule document no longer has usable information about session titles and speaker names; instead, there are a lot of IDs, and we need to do a roundtrip to the database to read those. We basically 'normalize' the JSON data in this approach


Not a good solution: Normalizing JSON


Now, with this approach, we no longer duplicate data, but we also gave up the simplicity of JSON! Let's remind ourselves that JSON is often chosen over tables because joins are not needed and everything can be found in one document. But, the moment we replaced values with IDs, we brought back the same complexity we tried to solve in the first place! Even worse, as there is no concept of foreign keys or referential integrity we again rely on application code to guarantee data consistency! Also, if the app starts to chase IDs it reads data at different point in time, again an invitation for inconsistencies, hard to reproduce bugs and almost impossible to test.

Conclusion: Neither approach 1) nor 2) works for enterprise applications. We need something that delivers the simplicity of JSON plus the use-case flexibility (extensibility) and guaranteed data consistency.

?

And here it is: JSON Relational Duality!?

How does it work? As we just learned in approach 2) that storing different entities (attendee, speaker, session) separately makes a lot of sense: we need to access them separately, depending on the use case. JSON relational duality does the same: we create separate tables for each entity:

create table attendee(
   aid????? number, ? 
   aname??? varchar(128), ? 
   extras?? JSON (object), ? 
   constraint attendee primary key (aid) 
);        
create table speaker(
  sid    number,   
  name   varchar(128),   
  phoneNumber varchar(12),   
  constraint pk_speaker primary key (sid) 
);        

Please note that you can use a JSON-type column to enable full schema flexibility. I could even just use one JSON column if I do not want to define any attributes upfront.

?So, from a data modeling point of view, you can reuse many of your skills or tools. Just keep in mind that if you want schema flexibility, you’d rely on JSON as a column type.

Now, this is where the real innovation is: you can create as many JSON access views as you want. They're tailor-made for each use case: one for the sessions, one for the speakers, one for the attendees, etc. The following shows two view definitions in GraphQL syntax (you can also use SQL if you prefer that):

?create or replace JSON Duality view attendeeV as 
 attendee @update @insert @delete
{  
   _id   : aid,  
   name  : aname,  
   extras @flex 
};        

or the more complex attendee schedule view:

?create or replace JSON Duality view ScheduleV AS 
attendee
{   
   id : aid   
   name : aname   
   schedule  : schedule  @insert @update @delete   
   {     
      scheduleId : scheduleid     
      sessions @unnest     
      {       
         sessionId : sid       
         name : name       
         location : room       
         speaker @unnest       
         {          
            speakerId : sid          
            speaker   : name       
         }     
      }   
   } 
} ;        

Each view selects the entities (tables) for the use case, then picks the required attributes (columns mapped to JSON fields) and organizes them in the desired hierarchy. The JSON documents returned by each view contain all the values needed for a use-case! The app has everything, it does not have to take an ID and fetch other documents. This is true value-embedding.


A JSON relational duality view and its corresponding document


(If you’re a MongoDB user, then you will like the fact that each of these views is also a MongoDB-compatible JSON collection so you can use Compass or language drivers with the views.)

?The views are fully updateable (even allowing for optimistic concurrency control and automatic change detection). Even better: you can precisely define what parts of the documents are allowed to be updated. For example, we want to allow attendees to change their schedule but not to update the time or locations of a session! These rules are declaratively defined when the view is created and centrally enforced in the database and apply to all operations, regardless of whether it comes from SQL, REST, or document store APIs (like the MongoDB API). Once you commit an update through one view, all other views immediately reflect this change. Inconsistencies are impossible and updates are simple because they only have to change one document. If you want to change all schedules that contain the same session then a single update on the session view/collection does the job!

One important aspect?I omit here is the optimistic concurrency control, how we efficiently update the data and detect and prevent undesired overwriting of data with stale data.

Finally, for bulk operations or analytical queries one can use SQL directly over the relational tables. This is true Duality: your data is both SQL and NoSQL, both Table and JSON.

?I hope this short introduction made you a bit curious about JSON in Oracle Database and you visit one of our many JSON sessions at Cloud World!

I'll be co-hosting a Hands-On-Lab (HOL 2963) on JSON Relational Duality on Tuesday at Oracle Cloud World if you're interested in learning more.

"Your data is both SQL and NoSQL, both Table and JSON". This is a game changer. I am looking forward to seeing how this concept is applied in the future with AI. It should open up new data exploration and application options.

要查看或添加评论,请登录

Beda Hammerschmidt的更多文章

  • Oracle Database API for MongoDB

    Oracle Database API for MongoDB

    Today, we announce the launch of a new cloud service: The Oracle Database API for MongoDB. With the Oracle Database API…

    7 条评论

社区洞察

其他会员也浏览了