Disciplined system design for modern applications

I am often asked what the best way to design a system is. My point of view is below:

  1. Executive sponsor: Know who the executive sponsor is and get their blessing for the initiative. The sponsor can be a client executive, your line-of-business executive, an internal executive from a consuming organization, etc. Without this sponsorship, everything else will fail.
  2. Collect your use cases. At IBM we have a use-case library across industries to prevent reinventing the wheel. The use cases must come from users at all strata of the organization. Be exhaustive: collect the full universe of possibilities.
  3. Separate the wheat from the chaff by winnowing the use cases down to those that truly align with the organization's position and aspirations in the marketplace. Arrive at a smaller, distilled set of use cases and put them in the product backlog.
  4. Almost all solutions are distributed these days. Re-learn the CAP theorem and understand the performance goals for the selected use cases. Since network partitions will occur in any real environment, during a partition we can have consistency or availability, but not both.
  5. List the non-functional requirements:

  • Performance
  • Peak load scalability
  • Rate limiting on APIs
  • Third party API integration
  • Security - Zero trust or otherwise, real time monitoring, security operations center
  • Resilience
  • Authentication
  • Logging
  • Monitoring
  • Tracing
  • Elasticity
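
One of the non-functional requirements above, rate limiting on APIs, is commonly implemented with a token bucket. Below is a minimal sketch in Python; the rate and capacity values are illustrative, not a recommendation:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: allows `rate` requests/second on average,
    with bursts of up to `capacity` requests."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)
# A burst of 10 requests passes; the 11th immediate request is rejected.
results = [bucket.allow() for _ in range(11)]
print(results.count(True))   # 10
```

The same idea scales to a distributed setting by keeping the token count in a shared store such as Redis, one bucket per API key.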

6. Define major system constraints and boundaries: define the traffic and peak load that your packaged or custom software will be handling. Are these constraints on performance, usability, scalability, or something else?

7. Define the POC: define the high-level abstract design, the system layers such as UI, application, data, and backup, and the high-level communication protocols between these layers.

8. Identify performance bottlenecks, data bottlenecks, and dependencies to arrive at a list of scalable design components, such as:

  • Number and type of web servers that must be connected to the internet
  • Load balancer/proxy between the UI and application layers for even load distribution across application servers. AWS offers both a Layer 4 Network Load Balancer and a Layer 7 Application Load Balancer
  • Horizontal scale-out based on the defined performance and usability baseline
  • A backup layer for databases to support high availability
  • Distributed databases / database partitioning and sharding
  • Internal load balancer between the app layer and the DB layer
  • Caching at the UI layer, caching of common queries and common data, read-ahead for common queries, and cache invalidation based on least frequently used (LFU) or least recently used (LRU) policies
  • DB selection, e.g., the data-structure-friendly Redis for in-memory caching and MongoDB for persistence
  • List of third-party API integrations
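
The LRU invalidation policy mentioned in the caching bullet above can be sketched in a few lines. This is a minimal in-memory illustration of the eviction behavior, not a production cache:

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used cache: once `capacity` is exceeded,
    the entry that was accessed longest ago is evicted."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)   # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # "a" is now most recently used
cache.put("c", 3)       # capacity exceeded: "b" is evicted
print(cache.get("b"))   # None
print(cache.get("a"))   # 1
```

In practice you would rely on the cache built into your datastore or framework; the point is only to make the eviction policy concrete.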

9. Select the platform given IT policy, budget constraints, and cloud preference

  • On-prem vs. Cloud vs. Hybrid cloud, vs. in-situ processing of data
  • Use of IBM Cloud Paks for rapid deployment of sets of functionality (such as MQ, API Connect, and App Connect) to create an integration platform with a single install of the Cloud Pak

10. Design:

  • Wireframes for the UI with the IBM Carbon design system, Vue, and Figma. [1]
  • Global Server Load Balancing (GSLB) - geography-based load balancing directs the client to the optimal data center location. [2]
  • Failover load balancing will send all requests to the first host listed until the load balancer determines that particular host is no longer available. It will then direct traffic to the next node in the list in the order specified. [2]
  • Create an internal load balancer and register the database servers with it; the database servers then receive requests through this internal load balancer rather than directly from the app servers.
  • IBM Cloud offers classic application and network load balancers. For VPC infrastructure, there are two varieties of load balancers: application load balancers for VPC and network load balancers for VPC. For classic infrastructure, IBM Cloud offers several options including IBM Cloud Load Balancer and Citrix NetScaler appliances. [1]
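
The failover policy described above (send everything to the first listed host until it becomes unavailable, then move to the next in order) can be sketched as follows; host names and the health-check mechanism are illustrative:

```python
class FailoverBalancer:
    """Failover load balancing: always route to the first host in the
    registered order that is currently marked healthy."""
    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.healthy = {h: True for h in self.hosts}

    def mark_down(self, host):
        # In a real balancer this would be driven by health checks.
        self.healthy[host] = False

    def mark_up(self, host):
        self.healthy[host] = True

    def pick(self):
        for host in self.hosts:          # order specified at registration
            if self.healthy[host]:
                return host
        raise RuntimeError("no healthy hosts available")

lb = FailoverBalancer(["app-1", "app-2", "app-3"])
print(lb.pick())        # app-1
lb.mark_down("app-1")
print(lb.pick())        # traffic fails over to app-2
```

Note how this differs from round-robin: the standby hosts receive no traffic at all until the primary fails.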

11. Caching: there are four areas where caching helps

  • Performance. The primary requirement for any caching solution is to improve performance, even under high load. Ideally, it should increase throughput and reduce latency. [3]
  • Scalability. A system must respond to load changes promptly. For a retailer, for example, sudden increases in demand might occur during sales promotions or at specific times of the year. Scaling should be automatic and occur without downtime. [3]
  • Availability. Any caching solution must be highly available. This helps ensure that your apps can deliver at peak performance, even if component failures occur.[3]
  • Support for geographic distribution. It's essential that a caching solution provides the same performance and scaling benefits everywhere in the world. This can be challenging if your data is geographically dispersed.[3]
  • RDB, or Redis Database Backup, creates point-in-time snapshots of Redis data
  • RediSearch provides a powerful indexing and querying engine with full-text search
  • RedisBloom provides support for probabilistic data structures
  • RedisTimeSeries enables you to ingest and query large quantities of data with very high performance
  • Many open-source tools have caching built in, e.g., Solr's solr.search.LRUCache, solr.search.FastLRUCache, and solr.search.LFUCache
  • Use a distributed cache to manage spikes in traffic, serve commonly accessed data to users, reduce compute load on your databases, locate content geographically closer to users, and provide output caching.

An example of cache management is in the Sterling OMS tool: "The Sterling Order Management reference data caching is implemented by a local, simple, lazy-loading, asynchronous-refresh cache manager.
The cache manager is a lazy-loader in the sense that it does not read in the cacheable reference tables at start up but would instead only cache records as they are being read. The benefit of the lazy-loading strategy is that data is only cached where it is needed.
The cache manager implements a simple cache management policy. Data that is cached remains in the cache until the cache manager is instructed to flush the cache. This could happen because the cache has reached a certain size limit or a reference data record was changed from a standard Sterling Order Management API. The cache manager does not implement cache management policies, such as record flushing using a least recently used algorithm, in order to avoid cache management overheads. In our controlled test, this simple cache manager provides significant performance benefits with little management overhead.
In keeping with the simple cache strategy, when a reference data record is changed by a Sterling Order Management API, the local cache manager notifies all the other cache managers to flush the reference data table. There is a small time lag between when the reference data is changed and when the last cache manager is notified.
When the cache managers receive the change notification, they flush all the cached entries for the affected table. As a result, you should cache only tables that are infrequently changed."

  • Use a session store to facilitate e-commerce shopping carts, store user cookies, maintain user login and session-state data, and enable IoT telemetry.
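
The lazy-loading, flush-on-change policy described in the Sterling quote above can be made concrete with a short sketch. All names here are illustrative, not the Sterling OMS API:

```python
class LazyReferenceCache:
    """Lazy-loading reference-data cache with flush-on-change:
    records are cached only when first read, and a change to a
    reference table flushes every cached entry for that table."""
    def __init__(self, loader):
        self.loader = loader     # loader(table, key) fetches from the database
        self.tables = {}         # table name -> {key: record}
        self.loads = 0           # count of actual database reads

    def get(self, table, key):
        cache = self.tables.setdefault(table, {})
        if key not in cache:     # lazy-load: fetch only on first read
            cache[key] = self.loader(table, key)
            self.loads += 1
        return cache[key]

    def flush(self, table):
        # Called when an API changes the table: drop all cached entries,
        # rather than tracking per-record staleness (the "simple" policy).
        self.tables.pop(table, None)

def db_loader(table, key):
    return f"{table}:{key}"      # stand-in for a real database read

cache = LazyReferenceCache(db_loader)
cache.get("ITEM_TABLE", 1)
cache.get("ITEM_TABLE", 1)       # second read served from cache
print(cache.loads)               # 1
cache.flush("ITEM_TABLE")        # a record changed: flush the whole table
cache.get("ITEM_TABLE", 1)       # reloaded lazily after the flush
print(cache.loads)               # 2
```

The whole-table flush is what makes this policy cheap to manage, and it is also why the quoted advice holds: cache only tables that change infrequently.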

12. Design alternatives considered, with pros/cons and costing: a table of alternatives with their pros, cons, and costs is required to make an unbiased decision.

13. If you build it, you run it.

In the new paradigm of cloud-based development, the design paradigm shifts to design, build, run, and maintain. Here are the key areas to be addressed, following the twelve-factor methodology. [5]

  1. Codebase - use version control: one codebase tracked in revision control, many deploys. [5]
  2. Dependencies - use a package manager and don't commit dependencies to the codebase repository.
  3. Config - store config in environment variables; if you have to repackage your application per environment, you're doing it wrong.
  4. Backing services - a deploy of the twelve-factor app should be able to swap out a local MySQL database for one managed by a third party (such as Amazon RDS) without any changes to the app's code.
  5. Build, release, run - the twelve-factor app uses strict separation between the build, release, and run stages. Every release should have a unique release ID, and releases should allow rollback.
  6. Processes - execute the app as one or more stateless processes; twelve-factor processes are stateless and share-nothing.
  7. Port binding - export services via port binding; the twelve-factor app is completely self-contained.
  8. Concurrency - scale out via the process model. Each process type can be scaled individually, and with factor 6 (stateless processes) it is easy to scale the services.
  9. Disposability - maximize robustness with fast startup and graceful shutdown; containers help achieve this.
  10. Dev/prod parity - keep development, staging, and production as similar as possible; the twelve-factor app is designed for continuous deployment by keeping the gap between development and production small.
  11. Logs - treat logs as event streams; a twelve-factor app never concerns itself with routing or storage of its output stream.
  12. Admin processes - run admin/management tasks as one-off processes.
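
Factor 3 (Config) from the list above can be illustrated with a short sketch: the same build artifact reads its deploy-specific values from the environment, so nothing needs repackaging between dev, staging, and production. The variable names and defaults here are illustrative:

```python
import os

# Provide a dev-friendly default so the sketch runs standalone;
# a real deploy would set DATABASE_URL in the environment instead.
os.environ.setdefault("DATABASE_URL", "postgres://localhost:5432/dev")

def load_config():
    """Assemble configuration purely from environment variables."""
    return {
        "database_url": os.environ["DATABASE_URL"],        # required
        "log_level": os.environ.get("LOG_LEVEL", "INFO"),  # optional, with default
        "port": int(os.environ.get("PORT", "8080")),       # env vars are strings
    }

config = load_config()
print(config["database_url"])
```

Swapping the local database for a managed one (factor 4) then becomes a one-variable change in the deploy environment, with no code change.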

Conclusion:

System design for modern applications has more moving parts than ever before. A diligent approach will yield positive results: a system that scales, is fault tolerant, and performs to users' expectations despite individual node failures and peak loads such as the holiday season or a catastrophe.

References:

  1. https://pages.github.ibm.com/w3ds/w3ds/?path=/story/vue_navigation-top--standard
  2. https://www.ibm.com/cloud/load-balancer
  3. https://learn.microsoft.com/en-us/training/modules/intro-to-azure-cache-for-redis/2-what-is-azure-cache-for-redis
  4. https://developers.redhat.com/blog/2018/06/28/why-kubernetes-is-the-new-application-server#empowering_your_application
  5. https://developers.redhat.com/blog/2017/06/22/12-factors-to-cloud-success
  6. https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#what-is-the-horizontal-pod-autoscaler
