Common Data Sync Strategies for Application Integration
Atul Gupta
CEO and Co-founder @ APPSeCONNECT | SaaSBOOMi SGx Winter 2022 | Empowering businesses accelerate growth with automated business processes | Co-Founder @ Inspiria | Making the youth employable & life-ready
Although data integration has been a common term in the industry for a considerable time, people still tend to use it interchangeably with application integration. Data integration is a technique used to synchronize information silos, while application integration is a broader term that involves many different middleware techniques. The common myth that data can be integrated easily without prior knowledge, or that integration techniques can be easily generalized, is often proved wrong over time. Data carries meaning of its own, and if that meaning is not considered properly, the result can be a poor or malfunctioning system.
In this article, we will cover how data synchronization techniques can help citizen integrators ensure that business applications exchange information properly and that data written to each application keeps the same meaning when it is read again. There are many considerations to apply when developing an integration between two or more applications. I will weigh their pros and cons to help you understand what is suitable for your case.
One-Way sync
While in most cases of application integration you might opt for a one-way sync strategy only, it is best to understand which approach to take when you create the integration. Applications differ in architecture, in their APIs, and in their data formats. Depending on the APIs available, we choose the approach that suits your business case.
For one-way sync we consider three scenarios:
- Record “Flag” and validate
- Remember “Last Modified Time”
- Capture Data
1. Record “Flag” and Validate
In this approach, records are extracted from the source application based on a “Flag” value. Once a record has synced successfully to the other application, we update the flag on the source application so that the next synchronization does not capture the same data again. A “flag” field can take many forms: it may be a simple bit field representing true/false, or the application may already provide a status field. We update this field just after the record is successfully captured. We might set the default value for a data object to “Not synced”, or define an explicit initialization value. The flag must also be reset whenever the source data changes again. Hence, if you follow this approach, you might need to develop some logic around the source application to ensure the sync works as expected.
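The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` class, the `sync_flag` field, the `update_record` helper, and the `push` callback are all hypothetical names, and a real integration would read and write the flag through the source application's own API.

```python
from dataclasses import dataclass

NOT_SYNCED = "not_synced"
SYNCED = "synced"


@dataclass
class Record:
    """Hypothetical source record carrying a sync flag."""
    id: int
    payload: str
    sync_flag: str = NOT_SYNCED  # new records default to "Not synced"


def update_record(record, payload):
    """Any change to the source data resets the flag so the record syncs again."""
    record.payload = payload
    record.sync_flag = NOT_SYNCED


def sync_flagged(source, push):
    """Push every unsynced record; flag only the ones that succeed."""
    synced_ids = []
    for record in source:
        if record.sync_flag != NOT_SYNCED:
            continue
        try:
            push(record)  # assumed to raise on failure
        except Exception:
            # Leave the flag untouched so the next run retries this record.
            continue
        record.sync_flag = SYNCED
        synced_ids.append(record.id)
    return synced_ids
```

Because the flag lives on the source record, the sync process itself keeps no state between runs, and a failed record does not block the others.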
Pros and Cons
- In this approach, records can be synced without much data dependency: if one record in a group syncs and others fail, the integration can still work partially.
- If the flag field is exposed to the user through the user interface, the user can also trigger sync just by updating the record.
- The integration remains stateless.
- Requires the source application to be updated, and hence you cannot run the same process in parallel.
- If the source application does not support field-level customizations or does not expose the field through its API layer, this approach might not be an option.
2. Remember “Last Modified Date”
If you don’t have an option to create or alter a field on the application end, consider using a timestamp to capture data changes. In this approach, the delta is captured using the last update date. The process records the timestamp of the most recent record and stores it in persistent storage so that the next run filters on the saved time. Generally, this approach is best suited when the API provides a filter criterion for record retrieval. Storing the time from the retrieved data eliminates any calculation regarding server time differences.
This approach captures a timestamp (most probably the last updated time) so that the integration works without major duplicate values. You can instead choose the current system time, your server time, or the current time in GMT, but if you can take the time from the retrieved record, data should not be missed. In this approach, the most critical thing is handling records that errored and need to be resynced.
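As a rough sketch of the timestamp approach, the Python below keeps the last modified time of the most recent successfully synced record in a small JSON file and filters on it in the next run. Everything named here is an assumption made for illustration: the `last_modified` key, the checkpoint file layout, the `push` callback, and in particular the error policy of stopping at the first failure so that the failed record is picked up again next time. That policy is one possible choice, not something the approach prescribes.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def load_checkpoint(path):
    """Return the saved timestamp, or the epoch on the very first run."""
    p = Path(path)
    if not p.exists():
        return EPOCH
    return datetime.fromisoformat(json.loads(p.read_text())["last_modified"])


def save_checkpoint(path, ts):
    """Persist the timestamp so the next run can filter on it."""
    Path(path).write_text(json.dumps({"last_modified": ts.isoformat()}))


def run_sync(records, checkpoint_path, push):
    """Push records modified after the checkpoint, oldest first."""
    checkpoint = load_checkpoint(checkpoint_path)
    changed = sorted(
        (r for r in records if r["last_modified"] > checkpoint),
        key=lambda r: r["last_modified"],
    )
    pushed = []
    for record in changed:
        try:
            push(record)  # assumed to raise on failure
        except Exception:
            # Assumed policy: stop here so this record and every later one
            # stay newer than the checkpoint and are retried next run.
            break
        # The checkpoint comes from the record itself, not the local clock.
        checkpoint = record["last_modified"]
        pushed.append(record["id"])
    save_checkpoint(checkpoint_path, checkpoint)
    return pushed
```

In a real integration the filter would normally be passed to the source API rather than applied in memory. Records that share an identical timestamp are not handled by this sketch and would need extra care, for example filtering with `>=` and de-duplicating by id.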