Serialization frameworks, simplified.
Taher Borsadwala
Blockchain & Digital Assets Platform Products at BNY | FinTech Solutions
Serialization Frameworks?!?
Serialization frameworks are translators! They enable the use of objects between and across languages.
Their basic task is conversion. They convert (or, as the name suggests, serialize) an object into raw bytes and then de-serialize those bytes back into an object. More technically, into an array of bytes. The raw byte is the common denominator: it allows a serialization framework (SF) to convert an object of any language into a byte array and then convert that byte array back into an object of any language.
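A minimal sketch of that round trip, hand-rolled in Python (the `User` type and byte layout are illustrative assumptions, not a real framework):

```python
import struct

class User:
    def __init__(self, user_id: int, name: str):
        self.user_id = user_id
        self.name = name

def serialize(user: User) -> bytes:
    # Encode the object into a raw byte array:
    # 4-byte ID, 2-byte name length, then the UTF-8 name.
    name_bytes = user.name.encode("utf-8")
    return struct.pack(">IH", user.user_id, len(name_bytes)) + name_bytes

def deserialize(data: bytes) -> User:
    # Rebuild the object from the byte array; any language that
    # knows this layout could do the same.
    user_id, name_len = struct.unpack(">IH", data[:6])
    return User(user_id, data[6:6 + name_len].decode("utf-8"))

u = deserialize(serialize(User(42, "Ada")))
print(u.user_id, u.name)  # 42 Ada
```

Because the intermediate form is just bytes with an agreed layout, the writer and the reader need not be the same program, or even the same language.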
Structure is important, and so SFs provide a schema definition language that functions as a DDL (data definition language) for defining a logical data model comprising entities (objects) and attributes. Additionally, they allow tagging objects with specific versions to support backward and forward compatibility without breaking anything.
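For a flavor of what such a schema DDL looks like, here is a small Protocol Buffers-style definition (the message and field names are made up for illustration; the numeric tags after `=` are the field identifiers the framework serializes against):

```protobuf
// Illustrative schema: one entity with two attributes.
syntax = "proto3";

message User {
  uint32 user_id = 1;
  string name    = 2;
}
```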
Tried, tested, and noteworthy serialization frameworks include Apache Avro, Apache Thrift, and Protocol Buffers. Though Java provides built-in serialization, it suffers from poor performance.
Why are Serialization Frameworks needed?
Over the years, JSON, XML, and other such formats have allowed developers to write raw data in schema-less formats. In today's Agile-centric world, it works: it allows quick and easy deliverables. But this short-term ease is what becomes a big problem in the long run.
Data models are needed to ensure that everyone talks the same language. Ironically, we run into issues even when we have data models, a common lingo. So just imagine the wild-west state of things sans an agreed-upon data structure or model.
Such data challenges lead to run-time errors that have developers and operations (or DevOps) scratching their heads trying to figure out the source of the error. Unless an error can be replicated, solving it is almost impossible. And run-time errors owing to bad data, especially in the absence of a data model? Those are the ones nightmares are made of.
The advantage of an agreed-upon and enforceable schema is not the complete elimination of errors, but it gives you sufficient detail about the error being faced, via a stack trace, a log, or some other mechanism. Another upside is that since it's a "managed" error, there is no risk of polluting the data set.
And that's another win that SFs provide. They not only allow constructing enforceable schemas easily, they also generate code in different languages for performing CRUD operations, along with validations!
Having painted a positive picture so far, it is time to highlight a limitation of SFs: they are still unable to express an exhaustively thorough schema.
Well, things change, data models evolve…
One tragically funny ask from not-so-experienced programmers that I have come across is "I will start coding once the requirements are frozen."
Maybe such programmers are either naïve or super-smart. Think about it: requirements are NEVER frozen, and so these programmers will NEVER have to code, will NEVER have to work. RESPECT, eh!
Since data models, i.e. schemas, evolve, a basic requirement is to ensure backward compatibility.
Let's explore the changes that usually come in. Attributes could be renamed. Attributes could be added. Attributes could be removed. The same applies to entities.
Now, SFs support renaming because they do not depend on the attribute name; instead, they generate the necessary attribute identifiers (IDs). Since these IDs, and not the attribute names, are used during serialization and deserialization, renaming becomes possible.
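This is why renames are safe: the wire format carries IDs, and names live only in the schema. A small Python sketch (field names, IDs, and the JSON-as-wire-format are illustrative assumptions):

```python
import json

# Two schema versions mapping field ID -> field name.
SCHEMA_V1 = {1: "user_name"}   # v1 called field 1 "user_name"
SCHEMA_V2 = {1: "full_name"}   # v2 renames it; the ID stays 1

def serialize(record: dict, schema: dict) -> bytes:
    # Store values keyed by field ID, never by name.
    by_id = {fid: record[name] for fid, name in schema.items()}
    return json.dumps(by_id).encode("utf-8")

def deserialize(data: bytes, schema: dict) -> dict:
    # Map IDs back to whatever the *current* schema calls them.
    by_id = json.loads(data.decode("utf-8"))
    return {schema[int(fid)]: value for fid, value in by_id.items()}

wire = serialize({"user_name": "Ada"}, SCHEMA_V1)  # written by old code
print(deserialize(wire, SCHEMA_V2))                # {'full_name': 'Ada'}
```

Data written under the old name deserializes cleanly under the new one, because only ID 1 ever hit the bytes.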
Attributes/fields can be removed too. And that works, the only caution being that the ID of the deleted field is off-limits: it should never be reused once the attribute has been removed, else it will lead to invalid and/or incorrect data.
Additions are fine, as new IDs will be generated. One caveat is that additions should only be optional and must not enforce the mandatory presence of values. The basic logic: existing records will not have data for a newly introduced mandatory field, and that will lead to bad data.
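Continuing the ID-based sketch above: a newly added optional field simply falls back to a default when an old record, written before the field existed, is read. Field names, IDs, and defaults here are illustrative assumptions:

```python
import json

# v2 adds an optional "email" field under a new ID (2).
# Schema maps field ID -> (name, default value).
SCHEMA_V2 = {1: ("name", None), 2: ("email", "")}

def deserialize(data: bytes, schema: dict) -> dict:
    by_id = json.loads(data.decode("utf-8"))
    # Missing IDs (old records) resolve to the default instead of failing.
    return {
        name: by_id.get(str(fid), default)
        for fid, (name, default) in schema.items()
    }

old_wire = json.dumps({1: "Ada"}).encode("utf-8")  # written before "email" existed
print(deserialize(old_wire, SCHEMA_V2))  # {'name': 'Ada', 'email': ''}
```

Had "email" been mandatory with no default, every pre-existing record would fail to deserialize, which is exactly the bad-data trap described above.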
If you have worked on databases and data modeling before, then the above builds on similar principles. SFs allow evolution just like databases do!
Are Serialization Frameworks really that perfect?
Well, yes but no.
Serialization frameworks can check for mandatory and optional requirements and ensure that a value matches its data type. Complex validations, through business rules or otherwise, are not possible. But this is not just a limitation of SFs; it is, in fact, similar to how relational database schemas work. Relational databases do not support nested objects organically, while SFs do, and hence the rules and validations become much more important.
An ideal serialization framework would provide pluggable functions in the schema definition. These functions would encapsulate all the rules and validations required for strong data quality. f(data) should ideally return true or false: if true, persist the data; else, error out.
Such an ideal tool, hopefully a language-neutral one, is a unicorn, but there are certain ground rules that help leverage existing tools themselves.
Along the lines of Data Access Objects, create an additional validation wrapper that handles confirming data values against the broader set of rules and validations. Such a layer would need to be replicated across all the languages one expects to serialize/de-serialize from/to.
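One way such a wrapper could look in Python: pluggable rule functions, each an f(data) returning True or False, checked before the record is accepted. The specific rules and field names are illustrative assumptions:

```python
from typing import Callable

# A rule is any function from record -> bool.
Rule = Callable[[dict], bool]

# Business rules the schema alone cannot express.
RULES: list[Rule] = [
    lambda r: isinstance(r.get("user_id"), int) and r["user_id"] > 0,
    lambda r: "@" in r.get("email", ""),
]

def validate(record: dict, rules: list[Rule] = RULES) -> bool:
    # f(data) -> True/False: persist only if every rule passes.
    return all(rule(record) for rule in rules)

print(validate({"user_id": 7, "email": "ada@example.com"}))  # True
print(validate({"user_id": -1, "email": "nope"}))            # False
```

The catch, as noted above, is that this exact rule set must be re-implemented faithfully in every language that touches the data.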
Akin to batch-job design, have the data flagged as valid or invalid during the run itself. By doing so, we end up with two legitimate data sets. The valid data set is, of course, clean. The invalid data set needs either manual intervention or correction through automation over time.
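A sketch of that flagging pass: split incoming records into the two sets in one run instead of aborting on the first bad record. The `validate` rule here is a stand-in for whatever rule set applies:

```python
def validate(record: dict) -> bool:
    # Stand-in rule: user_id must be an integer.
    return isinstance(record.get("user_id"), int)

def partition(records: list) -> tuple:
    # One pass over the batch: every record lands in exactly one bucket.
    valid, invalid = [], []
    for record in records:
        (valid if validate(record) else invalid).append(record)
    return valid, invalid

valid, invalid = partition([{"user_id": 1}, {"user_id": "oops"}, {"user_id": 2}])
print(len(valid), len(invalid))  # 2 1
```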
The choice of approach is based on the requirement, but if there are multiple languages participating in the object "talk", then coding the layers and checks discussed above becomes mandatory. Or maybe the SF comes with its own application-specific language. Again, it's a matter of choice: getting locked into an SF-specific custom language, or coding to support the necessary objects in multiple languages.
There will always be some limitation or other found in a serialization framework. It boils down to the requirement in the end. No SF will provide 100% requirements coverage, but understanding what's needed beforehand will at least help in finalizing the best-fit serialization framework for the ask.
Hope this helps!
Please share your experiences/viewpoints. Thank you.