HL7 FHIR and Apache Kafka - when 1+1=3

The healthcare standard FHIR is built on modern technologies, primarily RESTful APIs. This works very well for synchronous communication, where a caller invokes a function and waits for the result. Example: the clinical system receives a FHIR call to store a patient admission. The caller sends the patient data, the server stores it and returns the server-generated data, such as the ID of the admission.

FHIR also specifies asynchronous operations, where the client assigns a task - find me all patients with ... - and picks up the result later, once the server has finished processing it.

FHIR messaging

For integrating systems the standard defines messaging capabilities - and that is the part where Apache Kafka would help a lot. While FHIR explicitly states that any transport mechanism can be used, everything is still tailored towards RESTful APIs. Systems are registered in a central integration hub, and when something changes, all current subscribers are notified. This kind of data orchestration pattern belongs to the past, when message queues were all we had. In such an environment the producer is in charge: it sends an update and the interface engine distributes the data to all currently(!) registered subscribers.

Orchestration vs Choreography

From a business point of view the opposite is needed. Consuming systems are added and removed constantly, and they must be in control of what to read and when. Clinical systems want changed data with a latency of milliseconds; a newly connected system might request all changes created within the last six months. So it is more of a choreography, where some systems produce data and other systems consume data, but each system only cares about its own needs. There is no central authority defining the connections. Instead, every system picks and chooses freely from all the data produced. Such a choreography requires a different approach than queues: a distributed transaction log. Apache Kafka is the industry-standard backend for that.
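The difference between a queue and a log can be illustrated with a minimal in-memory sketch. It is not Kafka itself; `TopicLog` and `Consumer` are hypothetical names used only to show the idea that the log keeps all records and each consumer tracks its own read position.

```python
from dataclasses import dataclass, field


@dataclass
class TopicLog:
    """Append-only log; records are not removed when they are read."""
    records: list = field(default_factory=list)

    def append(self, record: str) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def read(self, offset: int, max_records: int = 100) -> list:
        # Reading depends only on the offset; the log holds no consumer state.
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Each consumer owns its position, so it decides what to read and when."""
    log: TopicLog
    offset: int = 0

    def poll(self) -> list:
        batch = self.log.read(self.offset)
        self.offset += len(batch)
        return batch


log = TopicLog()
for event in ["admit:1", "admit:2", "discharge:1"]:
    log.append(event)

clinical = Consumer(log, offset=len(log.records))  # only wants new changes
log.append("admit:3")
analytics = Consumer(log)  # connected later, replays the full history

clinical_batch = clinical.poll()
analytics_batch = analytics.poll()
```

Here the clinical consumer sees only the change that arrived after it connected, while the analytics consumer, although added last, still receives everything. The producer never needs to know either of them exists.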

The FHIR standard is well suited to work together with Apache Kafka. We would need a RESTful endpoint that accepts FHIR messaging data and puts the data into Kafka topics, one topic per entity type. Similarly, consumers can register themselves as interested parties for messages. But the efficiency of such a setup can be increased by 1) a more condensed payload format - Apache Avro - and 2) streaming the messages. This is where I would love to see the FHIR standard extended.
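The routing step of such an endpoint could look roughly like the sketch below. Everything in it is an assumption for illustration: the `fhir.<ResourceType>` topic naming, the use of the resource id as message key, the simplified `collection` bundle, and the `InMemoryProducer`, which merely stands in for a real Kafka producer so the example runs without a broker.

```python
import json


class InMemoryProducer:
    """Stand-in for a Kafka producer so the sketch runs without a broker."""
    def __init__(self):
        self.sent = []

    def send(self, topic, key, value):
        self.sent.append((topic, key, value))


def topic_for(resource: dict) -> str:
    # Hypothetical naming convention: one topic per FHIR resource type.
    return "fhir." + resource["resourceType"]


def publish(producer, resource: dict) -> str:
    topic = topic_for(resource)
    key = resource.get("id", "").encode("utf-8")
    value = json.dumps(resource, separators=(",", ":")).encode("utf-8")
    producer.send(topic, key, value)
    return topic


def handle_bundle(producer, bundle: dict) -> list:
    """Split an incoming FHIR Bundle and route each entry to its topic."""
    return [publish(producer, entry["resource"])
            for entry in bundle.get("entry", [])]


producer = InMemoryProducer()
bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1"}},
        {"resource": {"resourceType": "Encounter", "id": "e1",
                      "status": "in-progress"}},
    ],
}
topics = handle_bundle(producer, bundle)
```

Using the resource id as the key is a design choice rather than a requirement: with Kafka's default partitioning, records with the same key land in the same partition, which keeps the changes of one resource in order.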

Compact JSON storage

Apache Avro is essentially a binary representation of JSON data with schema validation. That matches the FHIR standard, which defines a schema and a JSON representation. There is a common misconception that FHIR's recursive data structures and extensibility concept prevent using Avro for everything, but Avro actually supports both. It is just not well documented. I created a mechanism to convert any FHIR schema into Avro during a Proof of Concept at SAP. In my opinion it would even make sense to elevate Avro to an official FHIR payload format alongside XML and JSON, to ensure all systems use the same rules for turning FHIR structures into Avro structures. Most mappings are obvious but some leave wiggle room, so it is better to define the rules now.
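To make the recursion point concrete, here is a hand-written sketch of what an Avro schema for a heavily simplified FHIR Extension might look like. It is not the mapping from the Proof of Concept and not an official one; the namespace `example.fhir`, the reduced field list, and the choice to model `value[x]` as separate optional fields are all assumptions.

```python
import json

# Hypothetical Avro schema for a simplified FHIR Extension; not an official mapping.
EXTENSION_SCHEMA = {
    "type": "record",
    "name": "Extension",
    "namespace": "example.fhir",
    "fields": [
        {"name": "url", "type": "string"},
        # value[x] modeled as optional fields, one per allowed type (two shown).
        {"name": "valueString", "type": ["null", "string"], "default": None},
        {"name": "valueBoolean", "type": ["null", "boolean"], "default": None},
        # Recursion: the array items refer back to the enclosing record by name.
        {"name": "extension",
         "type": {"type": "array", "items": "Extension"},
         "default": []},
    ],
}

PATIENT_SCHEMA = {
    "type": "record",
    "name": "Patient",
    "namespace": "example.fhir",
    "fields": [
        {"name": "id", "type": "string"},
        # Extensibility: any record can carry a list of extensions.
        {"name": "extension",
         "type": {"type": "array", "items": EXTENSION_SCHEMA},
         "default": []},
    ],
}

schema_json = json.dumps(PATIENT_SCHEMA, indent=2)
```

Whether `value[x]` becomes one union field or several optional fields is exactly the kind of wiggle room that shared rules would remove.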

Streaming vs FHIR bundles

The capability to stream FHIR messages is more important. To visualize the difference, consider which is more efficient: downloading one line of a CSV file via the browser, then requesting the next line, and the next? Or downloading the entire CSV file as one stream?

With streamed messaging the client connects to the server to download all changes and the server returns message after message. When there are no more messages, the connection is kept open(!) and the instant a new message is available, it is sent. For a consumer this looks like a rather slow file download. If the connection drops (or the server signals after a fixed amount of time or volume that this was all the data), the client immediately reconnects to the server, stating where to start (recover) from. And that is the other performance gain: the start point is sent once at the beginning and all messages are then streamed, instead of setting a start point for every single call. Not to mention the lower latency when the client is actively listening for the server to produce data.
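The reconnect-and-recover behaviour can be sketched with a small simulation. `ChangeServer`, `ConnectionDropped` and `consume_all` are hypothetical names, the network drop is simulated, and unlike a real server the sketch ends the stream when it runs out of data instead of keeping the connection open.

```python
class ConnectionDropped(Exception):
    pass


class ChangeServer:
    """Hypothetical server holding an ordered list of changes."""
    def __init__(self, messages, drop_after=None):
        self.messages = messages
        self.drop_after = drop_after  # simulate a network drop after N sends

    def stream(self, start):
        # One request carries the start position; messages then flow without
        # further requests until the data ends or the connection drops.
        sent = 0
        for offset in range(start, len(self.messages)):
            if self.drop_after is not None and sent == self.drop_after:
                self.drop_after = None  # only the first connection drops
                raise ConnectionDropped()
            yield offset, self.messages[offset]
            sent += 1


def consume_all(server, start=0):
    received, next_offset, connections = [], start, 0
    while True:
        connections += 1
        try:
            for offset, message in server.stream(next_offset):
                received.append(message)
                next_offset = offset + 1  # remember where to recover from
            return received, connections
        except ConnectionDropped:
            continue  # reconnect immediately, resuming at next_offset


server = ChangeServer(["m0", "m1", "m2", "m3", "m4"], drop_after=2)
received, connections = consume_all(server)
```

The client receives every message exactly once across two connections, and the start position is transmitted only twice, once per connection, rather than once per message.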

The only change is to return individual FHIR resources rather than a message bundle, because a bundle wraps all messages into a single array. If FHIR added the details of this to its standard as well, it would certainly help interoperability.
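One possible shape for such a response is newline-delimited JSON, one resource per line, similar in spirit to what FHIR Bulk Data export uses. The sketch below assumes that format; the function name `read_resources` and the sample payload are made up for illustration. The point is that each resource can be processed as soon as its line has arrived, with no closing bracket of an enclosing array to wait for.

```python
import io
import json


def read_resources(stream):
    """Yield one FHIR resource per line as soon as that line has arrived."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Simulated response body: newline-delimited JSON, one resource per line.
body = io.StringIO(
    '{"resourceType":"Patient","id":"p1"}\n'
    '{"resourceType":"Encounter","id":"e1"}\n'
)
ids = [resource["id"] for resource in read_resources(body)]
```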

Summary

FHIR and Kafka look as if they were meant for each other. They complement each other and provide healthcare providers with exactly the qualities needed today: low cost, high volume, flexibility and future-proofing. Consuming systems can be added or removed without impacting any other system, and consumers are in control of what to read and with what latency (milliseconds, every hour, once a day, once in a lifetime).

Example: Building a machine learning system, downloading all data as training data, configuring the ML parameters and then throwing away the model because it did not live up to expectations is the new normal.

Choreography of data instead of central orchestration, and distributed transaction logs instead of queues: that is the architectural foundation FHIR in combination with Kafka provides.


Gino Canessa

Principal Software Engineer at Microsoft

2 years ago

It sounds like Topic-Based Subscriptions may be the piece you are looking for in FHIR. The content is part of the R5 ballot (https://build.fhir.org/subscriptions.html) and there is a Backport IG for R4B (https://hl7.org/fhir/uv/subscriptions-backport/). If you have thoughts or feedback, I would be happy to discuss.
