
Reality Distortion Field by tech blogs. Monolith/Microservices, Monorepo/Polyrepo, CAP Theorem and more.

Steve Jobs mastered the Reality Distortion Field (RDF), and he did so without social reach. Marketing, charisma, bravado, hyperbole, appeasement and persistence all went into distorting reality to motivate and align teams.


The reach of tech blogs via social networks amplifies groupthink bias through social contagion. Today, to create an RDF you don't need hard skills like marketing, charisma, bravado, hyperbole, appeasement and persistence; all you need is brand and reach, and a global Reality Distortion Field gets created.


Jobs employed the RDF for two purposes: first, to motivate teams to achieve the impossible; second, to maintain his authority and brand (Steve Jobs) by employing the RDF for 'plagiarism':

'Jobs could also use the reality distortion field to appropriate others' ideas as his own, sometimes proposing an idea back to its originator, only a week after dismissing it'. (https://en.wikipedia.org/wiki/Reality_distortion_field)

An RDF can have both positive and negative impacts.


Monolith Vs Microservices

The PrimeVideoTech blog is getting attention for all the wrong reasons rather than for its main purpose of scaling and reducing cost, or rather, reducing unwanted cost.

There is a hidden RDF in the cloud computing world that many organisations suffer from: goals are created to reduce the hyped/inflated cost caused by poor design decisions, and meeting them is then touted as a problem solved, rather than inspiring teams never to make the poor design decisions that inflate cost unnecessarily in the first place.

The blog describes a move from a microservices architecture to a monolithic application to save cost. There are multiple misinterpretations of the blog and its architecture on the Internet.

1) It is not that two different business domains (entities) are brought together to create a monolith; rather, components within the same service (the VQA microservice) are put into a single container operating on the same entities (Video Payload, Audio Payload, Quality, metrics, streamId, FrameId).

How small your microservice should be has always been a discussion point; nothing new here!

2) The solution simply reduces the S3 cost and brings the video converter (computationally expensive) and the different types of detectors into a single ECS task or pod/process.

3) The service is then cloned and parameterised to support a subset of the detectors. Each cloned/parameterised service would have to run either as a separate ECS task or as a separate process in a combined container.

Running them as separate ECS tasks gives no benefit compared to the previous architecture of having separate compute for each detector type.

Running a specific set of detectors in a combined container will either waste resources, since audio detectors may not need the same CPU as video detectors, or add deployment complexity, since containers of the same service have to run in the cluster with different resource profiles.

4) The solution that saves the S3 cost actually adds more computation cost, as each ECS task now needs its own video converter, instead of the initial design where a single converter shared data with multiple detectors (see the sketch below).
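To make the consolidated layout concrete, here is a minimal sketch; this is not the actual Prime Video code, and the converter, the detector names and the frame format are placeholders I am assuming. It shows a single parameterised task that decodes once and hands frames to its enabled detectors in memory, which is also why every clone has to carry its own converter.

# Hypothetical sketch of one parameterised ECS task / process.
# Frames never leave process memory; there is no S3 upload/download step.
from typing import Callable, Dict, Iterator, List

Frame = bytes   # placeholder for a decoded/serialised video frame

def convert_stream(stream_id: str) -> Iterator[Frame]:
    # Placeholder for the computationally expensive video converter;
    # a real converter decodes the live stream continuously.
    for _ in range(3):
        yield b"<decoded 1080p frame>"

# Each clone enables only a subset of these detectors, but every clone
# still has to run its own converter.
DETECTORS: Dict[str, Callable[[Frame], dict]] = {
    "block_corruption": lambda frame: {"defect": False},   # placeholder logic
    "audio_video_sync": lambda frame: {"defect": False},   # placeholder logic
}

def run_task(stream_id: str, enabled_detectors: List[str]) -> None:
    for frame in convert_stream(stream_id):
        for name in enabled_detectors:
            result = DETECTORS[name](frame)   # in-memory hand-off, no S3
            print(stream_id, name, result)    # stand-in for publishing metrics

run_task("stream-42", ["block_corruption"])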

Clearly, the blog claims cost savings from removing S3 and AWS Step Functions, neither of which is actual compute cost.

This cost-saving claim is a Reality Distortion Field, because the entire idea of storing temporary frames in S3 is itself poor architectural design, for the following reasons:

1) S3 is persistent storage and thus incurs unnecessary disk I/O.

2) The images are stored only temporarily, yet each one still incurs network I/O for upload and download.

3) At 30 FPS with 1,000 live streams at 1080p resolution, we are looking at roughly 30K images uploaded per second, needing a minimum of about 5 Gbps of upload bandwidth, multiplied by the number of detectors for download (see the rough calculation after this list).

4) Also, these are live streams with a latency target of roughly 2 to 6 seconds, and most of that budget would be lost in the upload/download round trip itself.
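A rough back-of-envelope for point 3, assuming a compressed 1080p frame of about 20 KB and three detectors (both figures are my assumptions, not numbers from the blog):

# Back-of-envelope: frame traffic if every frame goes through S3.
streams = 1000            # concurrent live streams
fps = 30                  # frames per second per stream
frame_kb = 20             # assumed size of one compressed 1080p frame, in KB
detectors = 3             # assumed number of detectors re-downloading each frame

frames_per_sec = streams * fps                        # 30,000 images per second
upload_gbps = frames_per_sec * frame_kb * 8 / 1e6     # kilobits/s -> Gbps
download_gbps = upload_gbps * detectors               # each detector fetches its own copy

print(f"{frames_per_sec:,} uploads/s, "
      f"~{upload_gbps:.1f} Gbps up, ~{download_gbps:.1f} Gbps down")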

Why not use a simple in-memory distributed data store to hold the serialised frames?
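As one illustration of that alternative, here is a minimal sketch using Redis as the in-memory store; Redis itself, the host name, the key scheme and the TTL are my assumptions, not something the blog proposes. Frames expire after a few seconds instead of being written to durable object storage.

# Sketch: share serialised frames through an in-memory store instead of S3.
# Assumes a reachable Redis instance; host, port and TTL are illustrative.
from typing import Optional
import redis

r = redis.Redis(host="frame-cache.internal", port=6379)

def publish_frame(stream_id: str, frame_id: int, frame: bytes) -> None:
    # The frame is temporary data, so keep it only for a few seconds
    # rather than paying for durable object storage and its I/O.
    r.set(f"frame:{stream_id}:{frame_id}", frame, ex=10)

def fetch_frame(stream_id: str, frame_id: int) -> Optional[bytes]:
    return r.get(f"frame:{stream_id}:{frame_id}")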




The cost of storage has seen a dramatic downward trend compared to the cost of compute, so reducing compute cost should always be the priority, irrespective of storage cost optimisation.

The attention this blog is getting is creating yet another global Reality Distortion Field, purely through the brand and reach of Amazon. Multiple blogs and videos are being created to support monoliths against microservices, and a social echo chamber has formed around the idea.

Of course, the author does not intend to choose one over the other except for this specific scenario, but an 'argumentum ad populum' argument against the popular belief, coming from an authority (Amazon), is making techies interpret it in different ways.

While researching the topic, I came across another 'argumentum ad populum' from Amazon Prime, which touts UDP for low latency. This again creates a Reality Distortion, because UDP/RTP has been used for low-latency video for the past 20 years or more; it is how real-time video conferencing works and has always worked. An RDF was first created here by Apple (coincidentally, Steve Jobs' company) by making HLS the global protocol for streaming, and Amazon is leveraging that distortion to create yet another RDF.
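For a sense of where the latency gap comes from, here is a rough comparison using typical values I am assuming (classic 6-second HLS segments with players buffering about three of them, versus RTP pushing each frame as soon as it is encoded); these are not figures from either blog.

# Rough glass-to-glass latency comparison (assumed typical values).
hls_segment_sec = 6            # classic HLS segment duration
hls_buffered_segments = 3      # players commonly buffer ~3 segments
hls_latency = hls_segment_sec * hls_buffered_segments        # ~18 s

rtp_frame_interval = 1 / 30    # one frame sent as soon as it is encoded (30 FPS)
rtp_network_and_jitter = 0.150 # assumed network delay + jitter buffer, in seconds
rtp_latency = rtp_frame_interval + rtp_network_and_jitter    # well under 1 s

print(f"HLS ~{hls_latency:.0f} s vs UDP/RTP ~{rtp_latency:.2f} s glass-to-glass")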


Monorepo vs Polyrepo

Tech blogs have, intentionally or unintentionally, created another RDF in the monorepo vs polyrepo discussion, which, based on Google's brand value and reach, has turned following the monorepo into something of a cult. Meanwhile, Google itself has considered moving away from the monorepo multiple times, and Android and Chrome are spread across hundreds of repositories.

Google's choice of a monorepo is legacy, not state of the art; it is, in turn, supported by state-of-the-art tools that Google created.

  • Google chose the monolithic-source management strategy in 1999 when the existing Google codebase was migrated from CVS to Perforce.
  • Early Google engineers maintained that a single repository was strictly better than splitting up the codebase, though at the time they did not anticipate the future scale of the codebase and all the supporting tooling that would be built to make the scaling feasible.
  • At Google, with some investment, the monolithic model of source management can scale successfully to a codebase with more than one billion files, 35 million commits, and thousands of users around the globe.

The trade-offs that Google made in favour of the monorepo are as follows:

  • Tooling investments for both development and execution;
  • Codebase complexity, including unnecessary dependencies and difficulties with code discovery; and
  • Effort invested in code health.


At a very high level, Google has developed the following tools to support the monorepo; most organisations at a smaller scale would fail to support such a legacy.

  • Piper: stores a single large repository and is implemented on top of standard Google infrastructure, originally Bigtable, now Spanner. Piper is distributed over 10 Google data centers around the world, relying on the Paxos algorithm to guarantee consistency across replicas.

Google was not able to find a commercially available source-control system that could do the same.

  • CitC: consists of a cloud-based storage backend and a Linux-only FUSE file system. Developers see their workspaces as directories in the file system, including their changes overlaid on top of the full Piper repository. CitC supports code browsing and normal Unix tools with no need to clone or sync state locally. Developers can browse and edit files anywhere across the Piper repository, and only modified files are stored in their workspace.
  • Presubmit: infrastructure that provides automated testing and analysis of changes before they are added to the codebase. A set of global presubmit analyses are run for all changes, and code owners can create custom analyses that run only on directories within the codebase they specify. A small set of very low-level core libraries uses a mechanism similar to a development branch to enforce additional testing before new versions are exposed to client code.
  • Tricorder: Google's home-built static code analyzer. It provides data on code quality, test coverage, and test results automatically in the Google code-review tool. These computationally intensive checks are triggered periodically, as well as when a code change is sent for review. Tricorder also provides suggested fixes with one-click code editing for many errors. These systems provide important data to increase the effectiveness of code reviews and keep the Google codebase healthy.
  • Critique: allows the reviewer to view the evolution of the code and comment on any line of the change. It encourages further revisions and a conversation leading to a final “Looks Good To Me” from the reviewer, indicating the review is complete.


Tech blogs create a Reality Distortion by advocating the benefits of a monorepo without appreciating the challenges of supporting one and the scale of the tooling that has to be developed.

It is a legacy that Google can afford by investing in tooling, but it is definitely not the only way to manage multiple projects, and definitely not the state of the art.


CP vs AP vs CA

Yet another tech-blog-created RDF is about distributed systems: most tech blogs will lead you to believe it is always a choice of one of the three two-property combinations. That is not true: until a network partition occurs, a distributed system can have all three of C, A and P. The whole premise of the CAP theorem rests on network partitioning.
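To make that premise concrete, here is a toy sketch in Python (entirely hypothetical, not how any real database is built): a two-replica store only has to choose between consistency and availability while a partition is actually in effect; with no partition it provides all three.

# Toy two-replica register: the C-vs-A choice only appears during a partition.
class Replica:
    def __init__(self):
        self.value = None

class ToyStore:
    def __init__(self, mode: str):
        self.mode = mode              # "CP" or "AP" -- only matters when partitioned
        self.partitioned = False
        self.a, self.b = Replica(), Replica()

    def write(self, value):
        if not self.partitioned:
            # No partition: the write is both consistent and available.
            self.a.value = self.b.value = value
        elif self.mode == "CP":
            # Stay consistent by refusing the write: availability is sacrificed.
            raise RuntimeError("partition in progress, write rejected")
        else:
            # Stay available by writing to one side: the replicas now diverge.
            self.a.value = value

store = ToyStore(mode="CP")
store.write("v1")                     # no partition: C, A and P all hold
store.partitioned = True
try:
    store.write("v2")
except RuntimeError as err:
    print("CP mode sacrificed availability:", err)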

In fact, Eric Brewer (the originator of the CAP theorem), in his blog post on Cloud Spanner, notes:

"For distributed systems over a “wide area”, it is generally viewed that partitions are inevitable, although not necessarily common [BK14]. Once you believe that partitions are inevitable, any distributed system must be prepared to forfeit either consistency (AP) or availability (CP), which is not a choice anyone wants to make. In fact, the original point of the CAP theorem was to get designers to take this tradeoff seriously.

But there are two important caveats: first, you only need forfeit something during an actual partition, and even then there are many mitigations. Second, the actual theorem is about 100% availability, while the interesting discussion here is about the tradeoffs involved for realistic high availability.

Despite being a global distributed system, Spanner claims to be consistent and highly available, which implies there are no partitions and thus many are skeptical. Does this mean that Spanner is a CA system as defined by CAP? The short answer is 'no' technically, but 'yes' in effect and its users can and do assume CA."

It’s the network

First, Google runs its own private global network. Spanner is not running over the public Internet — in fact, every Spanner packet flows only over Google-controlled routers and links (excluding any edge links to remote clients). Furthermore, each data center typically has at least three independent fibers connecting it to the private global network, thus ensuring path diversity for every pair of data centers. Similarly, there is redundancy of equipment and paths within a datacenter. Thus normally catastrophic events, such as cut fiber lines, do not lead to partitions or to outages.

Thus, contrary to popular belief, if you can control the network, via a private network or a single VPC, then as a developer you can assume all of C, A and P, not just two of them. That is the premise on which Spanner, the world's only globally distributed, scalable and ACID-compliant database, works. 'World's only' because who else owns a fibre network across continents? But if you are on a single VPC, you, like Google, can afford all of C, A and P.

Part of the tech blogs' Reality Distortion exists because software development has become a commodity, with the big tech giants solving most problems of scale. The fundamentals of systems are never questioned; instead, like Steve Jobs' charisma, the brands with their blogs have the potential to distort reality and keep you unaware of it.

When you work on a problem at scale, a distortion field cannot survive. Always evaluate a tech stack or architecture for scale at the bits/bytes and packet level to overcome the distortion.
