Serialization frameworks, simplified.
Taher Borsadwala
Blockchain & Digital Assets Platform Products at BNY | FinTech Solutions
Serialization Frameworks?!?
Serialization frameworks are translators! They enable the use of objects between and across languages.
Their basic task is conversion. They convert (or, as the name suggests, serialize) an object into raw bytes and then de-serialize those bytes back into an object. More technically, into an array of bytes. The raw byte is the common denominator: it allows a serialization framework (SF) to convert an object of any language into a byte array and then convert that byte array back into an object of any language.
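A minimal sketch of that round trip, hand-rolled in Python (the `User` type and byte layout are illustrative assumptions, not a real framework):

```python
import struct

class User:
    def __init__(self, user_id: int, name: str):
        self.user_id = user_id
        self.name = name

def serialize(user: User) -> bytes:
    # Encode the object into a raw byte array:
    # 4-byte ID, 2-byte name length, then the UTF-8 name.
    name_bytes = user.name.encode("utf-8")
    return struct.pack(">IH", user.user_id, len(name_bytes)) + name_bytes

def deserialize(data: bytes) -> User:
    # Rebuild the object from the byte array; any language that
    # knows this layout could do the same.
    user_id, name_len = struct.unpack(">IH", data[:6])
    return User(user_id, data[6:6 + name_len].decode("utf-8"))

u = deserialize(serialize(User(42, "Ada")))
print(u.user_id, u.name)  # 42 Ada
```

Because the intermediate form is just bytes with an agreed layout, the writer and the reader need not be the same program, or even the same language.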
Structure is important, and so SFs provide a schema definition language that functions as a DDL (data definition language) for defining a logical data model comprising entities (objects) and attributes. Additionally, they allow tagging objects with specific versions to support backward and forward compatibility without breaking anything.
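For a flavor of what such a schema DDL looks like, here is a small Protocol Buffers-style definition (the message and field names are made up for illustration; the numeric tags after `=` are the field identifiers the framework serializes against):

```protobuf
// Illustrative schema: one entity with two attributes.
syntax = "proto3";

message User {
  uint32 user_id = 1;
  string name    = 2;
}
```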
Tried, tested, and noteworthy serialization frameworks include Apache Avro, Apache Thrift, and Protocol Buffers. Though Java provides built-in serialization, it suffers from poor performance.
Why are Serialization Frameworks needed?
Over the years, JSON, XML, and other such formats have allowed developers to write raw data in schema-less formats. In today's Agile-centric world, it works: it allows quick and easy deliverables. But this short-term ease is what becomes a big problem in the long run.
Data models are needed to ensure that everyone talks the same language. Ironically, we run into issues even when we have data models, a common lingo. So just imagine the wild-west state of things sans an agreed-upon data structure or model.
Such data challenges lead to run-time errors that have developers and operations (or DevOps) scratching their heads trying to figure out the source of the error. Unless an error can be replicated, solving it is almost impossible. And run-time errors owing to bad data, especially in the absence of a data model? Those are the ones nightmares are made of.
The advantage of an agreed-upon and enforceable schema is not the complete elimination of errors, but it gives you sufficient detail about the error being faced, via a stack trace, a log, or some other mechanism. Another upside is that since it's a "managed" error, there is no risk of polluting the data set.
And that's another win that SFs provide. They not only allow constructing enforceable schemas easily, they also generate code in different languages for performing CRUD operations, along with validations!
Having painted a positive picture so far, it is time to highlight a limitation of SFs: they are still unable to express an exhaustively thorough schema.
Well, things change, data models evolve…
One tragically funny ask from not-so-experienced programmers that I have come across is "I will start coding once the requirements are frozen."
Maybe such programmers are either naïve or super-smart. Think about it: requirements are NEVER frozen, and so these programmers will NEVER have to code, will NEVER have to work. RESPECT, eh!
Since data models, i.e. schemas, evolve, a basic requirement is to ensure backward compatibility.
Let's explore the changes that usually come in. Attributes could be renamed. Attributes could be added. Attributes could be removed. The same applies to entities.
Now, SFs support renaming because they do not depend on the attribute name; instead, they generate the necessary attribute identifiers (IDs). Since these IDs, and not the attribute names, are used during serialization and deserialization, renaming becomes possible.
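This is why renames are safe: the wire format carries IDs, and names live only in the schema. A small Python sketch (field names, IDs, and the JSON-as-wire-format are illustrative assumptions):

```python
import json

# Two schema versions mapping field ID -> field name.
SCHEMA_V1 = {1: "user_name"}   # v1 called field 1 "user_name"
SCHEMA_V2 = {1: "full_name"}   # v2 renames it; the ID stays 1

def serialize(record: dict, schema: dict) -> bytes:
    # Store values keyed by field ID, never by name.
    by_id = {fid: record[name] for fid, name in schema.items()}
    return json.dumps(by_id).encode("utf-8")

def deserialize(data: bytes, schema: dict) -> dict:
    # Map IDs back to whatever the *current* schema calls them.
    by_id = json.loads(data.decode("utf-8"))
    return {schema[int(fid)]: value for fid, value in by_id.items()}

wire = serialize({"user_name": "Ada"}, SCHEMA_V1)  # written by old code
print(deserialize(wire, SCHEMA_V2))                # {'full_name': 'Ada'}
```

Data written under the old name deserializes cleanly under the new one, because only ID 1 ever hit the bytes.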
Attributes/fields can be removed too. And that works, the only caution being that the ID of the deleted field is off-limits: it should never be reused once the attribute has been removed, else it will lead to invalid and/or incorrect data.
Additions are fine, as new IDs will be generated. One caveat is that additions should only be optional and must not enforce the mandatory presence of values. The basic logic: existing records will not have data for a newly introduced mandatory field, and that will lead to bad data.
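Continuing the ID-based sketch above: a newly added optional field simply falls back to a default when an old record, written before the field existed, is read. Field names, IDs, and defaults here are illustrative assumptions:

```python
import json

# v2 adds an optional "email" field under a new ID (2).
# Schema maps field ID -> (name, default value).
SCHEMA_V2 = {1: ("name", None), 2: ("email", "")}

def deserialize(data: bytes, schema: dict) -> dict:
    by_id = json.loads(data.decode("utf-8"))
    # Missing IDs (old records) resolve to the default instead of failing.
    return {
        name: by_id.get(str(fid), default)
        for fid, (name, default) in schema.items()
    }

old_wire = json.dumps({1: "Ada"}).encode("utf-8")  # written before "email" existed
print(deserialize(old_wire, SCHEMA_V2))  # {'name': 'Ada', 'email': ''}
```

Had "email" been mandatory with no default, every pre-existing record would fail to deserialize, which is exactly the bad-data trap described above.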
If you have worked on databases and data modeling before, then the above builds on similar principles. SFs allow evolution just like databases do!
Are Serialization Frameworks really that perfect?
Well, yes but no.
Serialization frameworks can check for mandatory and optional requirements and ensure that a value matches its data type. Complex validations, through business rules or otherwise, are not possible. But this is not just a limitation of SFs; it is, in fact, similar to how relational database schemas work. Relational databases do not support nested objects organically, while SFs do, and hence the rules and validations become much more important.
An ideal serialization framework would provide pluggable functions in the schema definition. These functions would encapsulate all the rules and validations required for strong data quality. f(data) should ideally return true or false: if true, persist the data; else, error out.
Such an ideal tool, hopefully a language-neutral one, is a unicorn, but there are certain ground rules that help leverage existing tools themselves.
Along the lines of Data Access Objects, create an additional validation wrapper that handles confirming data values against the broader set of rules and validations. Such a layer would need to be replicated across all the languages one expects to serialize/de-serialize from/to.
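One way such a wrapper could look in Python: pluggable rule functions, each an f(data) returning True or False, checked before the record is accepted. The specific rules and field names are illustrative assumptions:

```python
from typing import Callable

# A rule is any function from record -> bool.
Rule = Callable[[dict], bool]

# Business rules the schema alone cannot express.
RULES: list[Rule] = [
    lambda r: isinstance(r.get("user_id"), int) and r["user_id"] > 0,
    lambda r: "@" in r.get("email", ""),
]

def validate(record: dict, rules: list[Rule] = RULES) -> bool:
    # f(data) -> True/False: persist only if every rule passes.
    return all(rule(record) for rule in rules)

print(validate({"user_id": 7, "email": "ada@example.com"}))  # True
print(validate({"user_id": -1, "email": "nope"}))            # False
```

The catch, as noted above, is that this exact rule set must be re-implemented faithfully in every language that touches the data.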
Akin to batch-job design, have the data flagged as valid or invalid during the run itself. By doing so, we end up with two legitimate data sets. The valid data set is, of course, clean. The invalid data set needs either manual intervention or correction through automation over time.
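A sketch of that flagging pass: split incoming records into the two sets in one run instead of aborting on the first bad record. The `validate` rule here is a stand-in for whatever rule set applies:

```python
def validate(record: dict) -> bool:
    # Stand-in rule: user_id must be an integer.
    return isinstance(record.get("user_id"), int)

def partition(records: list) -> tuple:
    # One pass over the batch: every record lands in exactly one bucket.
    valid, invalid = [], []
    for record in records:
        (valid if validate(record) else invalid).append(record)
    return valid, invalid

valid, invalid = partition([{"user_id": 1}, {"user_id": "oops"}, {"user_id": 2}])
print(len(valid), len(invalid))  # 2 1
```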
The choice of approach is based on the requirement, but if there are multiple languages participating in the object "talk", then coding the layers and checks discussed above becomes mandatory. Or maybe the SF comes with its own application-specific language. Again, it's a matter of choice: getting locked into an SF-specific custom language, or coding to support the necessary objects in multiple languages.
There will always be some limitation or other found in a serialization framework. It boils down to the requirement in the end. No SF will provide 100% requirements coverage, but understanding what's needed beforehand will at least help in finalizing the best-fit serialization framework for the ask.
Hope this helps!
Please share your experiences/viewpoints. Thank you.