How and when to introduce architectural changes amid urgent development and production issues?
[Cover image: tracing a call across different architectural components]

Most companies start with a minimum viable product to get the first clients, secure investments, and build the team and the necessary processes. But once this first phase is over, it turns out that:

  • The product does not perform well under load or is simply slow.
  • The product does not scale when new clients start using it simultaneously.
  • Parts of the product keep failing, causing downtimes, production issues, and clients' frustrations.
  • New, more prominent, and established clients demand high availability and performance SLAs.
  • More sophisticated architectures uncover knowledge and skills gaps that are hard to fill within the existing team.

This article focuses on workable approaches to finding the right time and resources to move the product forward without creating unnecessary risks.

Usual development and operational processes

Most companies I have worked in or with have some development plan focusing on functional features. Sometimes, it even includes adding a new architectural component, be it security-, networking-, database-, or DevOps-oriented.

The plan includes functional testing if the company has a QA team. Sometimes, the client also wants integration or user acceptance tests to follow. The missing parts are almost always the most critical architectural qualities:

  • Performance - speed of responses or processing within defined service level expectations and agreements.
  • High Availability - the ability to continue providing the services when critical application and infrastructure pieces fail.
  • Reliability - the ability to provide trustworthy service even when some internal components fail or produce errors.

Use Case - SaaS service for Order Processing

To illustrate, let's take a practical use case - a software-as-a-service product built to automate order processing.

Expected functionality:

  • authorized clients management
  • external logistics integrations
  • actual order processing
  • reporting

Development teams would start building the functionality. For the MVP, a decision might be made to host the service on AWS and build it with Java for the backend and Angular for the front end.

Architectural decisions would include a single relational database, such as self-managed PostgreSQL or the cloud-managed AWS RDS for PostgreSQL.

To move faster with the MVP, the backend might be developed as a monolithic REST API service, and the front end as a single monolithic Angular project.

Issues are handled with some logging, with each developer deciding what, when, and where to log, if at all. Perhaps some primitive form of system health monitoring is introduced to detect whether a service is down or still running.
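
For contrast, here is a minimal sketch of what a more consistent approach could look like, assuming SLF4J with MDC is available on the classpath; the class, method, and field names are hypothetical.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

// Hypothetical order handler showing consistent, contextual logging
// instead of ad-hoc, per-developer log statements.
public class OrderProcessingLogExample {

    private static final Logger log = LoggerFactory.getLogger(OrderProcessingLogExample.class);

    public void processOrder(String orderId, String clientId) {
        // Attach identifiers to the logging context so every line
        // written during this call can be correlated later.
        MDC.put("orderId", orderId);
        MDC.put("clientId", clientId);
        try {
            log.info("Order processing started");
            // ... actual processing steps would go here ...
            log.info("Order processing finished");
        } catch (RuntimeException e) {
            // Log failures with the same context and the stack trace.
            log.error("Order processing failed", e);
            throw e;
        } finally {
            MDC.clear(); // avoid leaking context to the next request on this thread
        }
    }
}
```

With identifiers in the logging context, every log line produced during a request can be correlated later, regardless of which developer wrote the statement.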

All seems to be fine until 1) production issues start coming in, taking a heavy toll on the dev team to identify the root cause, fix it, and release the fix back into the wild; 2) the team grows, and it becomes harder and harder to manage releases of the new software; and 3) the system becomes painfully slow, especially when a larger batch of orders comes in, external systems slow down, or heavy reports are run during order processing.

At this moment, everyone starts to understand that something must be done, and soon. But when do we start, how do we know we're doing the right thing, how do we prioritize the changes, and who can make all these decisions?

Does this sound all too familiar? I've encountered it in multiple projects and companies, and it is always challenging, complicated, and involved.

Let there be light - bringing order into the chaos

[Diagram: a modern complex system architecture]

Brainstorming session

Again, from experience, the number one action a company can take is to sit down with the relevant group of knowledge holders and stakeholders and discuss the current problems the product is facing. The three main categories are outlined below.

System slowness identified in development, testing, and production environments. This can include a) interactive actions, b) API invocations, c) event processing, d) background processes, and e) related monitoring and alerting on such conditions.

Potential sources of slowness:

  • Insufficient resources, especially CPU, allocated to the virtual machines or Kubernetes pods where application services run, as well as to platform components such as databases, distributed caches, messaging services, service registries, security-related services, and so on.
  • The application is built without scaling in mind. For instance, it runs in one or a few threads and cannot scale out to additional machines.
  • Kubernetes scaling and auto-scaling are not applied, resulting in the system being unable to withstand the load or bursts.
  • Database queries are not optimized, or there are not enough indices to speed up the queries.
  • Queries try to process large data volumes without sharding or other big-data processing strategies.
  • Network latency between multiple services or data stores slows down the processing.
  • Message queues in event-driven systems are not consumed fast enough because consumers are slow or do not scale well.
  • As part of processing, external services are called without taking network latency into account (see the sketch after this list).
  • While critical processes run, the same application and platform resources are consumed by heavy interactive actions, like large report generation. A typical example is a heavy report running on the same database that many users rely on for the interactive application.
  • The product could be built with too many small microservices talking to each other over secure JSON/HTTP interfaces without considering the overhead such conversations create when not done correctly.
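
To make a couple of the bullets above concrete (single-threaded processing and external calls without timeouts), here is a minimal sketch; the LogisticsClient interface and all names in it are hypothetical assumptions for illustration, not part of any specific product.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class OrderBatchProcessor {

    // Hypothetical client for an external logistics service.
    interface LogisticsClient {
        String reserveShipment(String orderId);
    }

    // Bounded pool: lets one instance use several cores without
    // spawning an unbounded number of threads.
    private final ExecutorService pool = Executors.newFixedThreadPool(8);
    private final LogisticsClient logistics;

    public OrderBatchProcessor(LogisticsClient logistics) {
        this.logistics = logistics;
    }

    public void processBatch(List<String> orderIds) {
        List<CompletableFuture<Void>> tasks = orderIds.stream()
                .map(orderId -> CompletableFuture
                        // Run each order on the pool instead of the caller thread.
                        .supplyAsync(() -> logistics.reserveShipment(orderId), pool)
                        // Do not let one slow external call stall the whole batch.
                        .orTimeout(2, TimeUnit.SECONDS)
                        .thenAccept(confirmation -> persistConfirmation(orderId, confirmation))
                        // Record the failure and keep processing the other orders.
                        .exceptionally(ex -> { recordFailure(orderId, ex); return null; }))
                .toList();

        // Wait for the whole batch to settle.
        CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0])).join();
    }

    private void persistConfirmation(String orderId, String confirmation) { /* store the result */ }

    private void recordFailure(String orderId, Throwable cause) { /* log and mark for retry */ }
}
```

The pool size and timeout values here are arbitrary; the point is that concurrency is bounded and external latency is capped, which removes two common sources of slowness inside a single service.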

Availability (up/down) of the product and its components. Here, reported issues from all environments must be collected, counted, and prioritized to find out when the product as a whole, or its parts, was identified as being down, non-responsive, timing out, etc. This can also include platform and infrastructure components.

Here, we pay attention to application, platform, or infrastructure components becoming unavailable or going down (a minimal reachability probe is sketched after this list). These include:

  • application crashes due to bugs or missing validations
  • databases going down or becoming extremely slow due to irresponsible queries
  • not using platform high availability clusters
  • network loss or timeouts
  • security restrictions due to misconfiguration
  • external services becoming unavailable due to networking or reasons not controlled by the company
  • whole data centers going down in planned or unplanned fashion
  • etc.
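
As a starting point for attributing outages to a specific component, here is a minimal reachability probe; the host names and ports are hypothetical placeholders, and a real product would more likely rely on the health-check facilities of its platform or framework.

```java
import java.net.InetSocketAddress;
import java.net.Socket;
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal availability probe: checks whether critical dependencies are
// reachable and reports per-component status, so an outage can be attributed
// to a specific component rather than just "the product is down".
public class DependencyHealthCheck {

    // Hypothetical endpoints; in a real system these come from configuration.
    private static final String DB_HOST = "orders-db.internal";
    private static final int DB_PORT = 5432;
    private static final String LOGISTICS_HOST = "logistics-partner.example.com";
    private static final int LOGISTICS_PORT = 443;

    public Map<String, String> check() {
        Map<String, String> status = new LinkedHashMap<>();
        status.put("checkedAt", Instant.now().toString());
        status.put("database", probe(DB_HOST, DB_PORT));
        status.put("logistics", probe(LOGISTICS_HOST, LOGISTICS_PORT));
        return status;
    }

    // TCP-level reachability only; it will not catch "up but failing" cases,
    // which is exactly why reliability is analyzed separately below.
    private String probe(String host, int port) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 1000);
            return "UP";
        } catch (Exception e) {
            return "DOWN: " + e.getMessage();
        }
    }
}
```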

Reliability of the system (it is up but keeps failing), which differs from general unavailability in that nothing seems to be down, yet the service keeps failing, returning errors, being very slow, timing out, and so on.

Here, it is essential to analyze which flows can fail and which must succeed, which can be safely retried, and which must be attempted only once. This is followed by thinking about possible policies for automatic idempotent retries at different levels, clear logging, intelligent state management, etc. Notably, completing this analysis may require additional knowledge of what can be done to manage the chaos.
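
To illustrate the retry and idempotency idea, here is a minimal sketch; the PaymentGateway interface, the idempotency-key scheme, and the in-memory store are hypothetical assumptions for illustration only, not a production-ready implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of an idempotent, retryable call: the same business operation may be
// attempted several times, but its side effect happens at most once, because
// the external call carries a stable idempotency key.
public class IdempotentRetryExample {

    // Hypothetical external dependency that accepts an idempotency key.
    interface PaymentGateway {
        String charge(String idempotencyKey, String orderId, long amountCents);
    }

    private final PaymentGateway gateway;
    // In-memory store for the sketch; a real system would persist this state.
    private final Map<String, String> completed = new ConcurrentHashMap<>();

    public IdempotentRetryExample(PaymentGateway gateway) {
        this.gateway = gateway;
    }

    public String chargeWithRetry(String orderId, long amountCents, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        String key = "charge-" + orderId;      // stable key: same order => same key
        String existing = completed.get(key);
        if (existing != null) {
            return existing;                   // already done, do not charge twice
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                String receipt = gateway.charge(key, orderId, amountCents);
                completed.put(key, receipt);   // remember success for future calls
                return receipt;
            } catch (RuntimeException e) {
                last = e;                      // retry only what is safe to retry
                sleepBeforeRetry(attempt);
            }
        }
        throw last;                            // all attempts failed; surface the error
    }

    private void sleepBeforeRetry(int attempt) {
        try {
            Thread.sleep(200L * attempt);      // simple linear backoff for the sketch
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Which calls get this treatment, and which must be attempted only once, is exactly what the flow-by-flow analysis above is meant to decide.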

As a crucial part of this analysis, each bullet above can be examined and brainstormed to see whether it was easy to identify the issue, perform root cause analysis, outline how to reproduce it, etc., together with proposing and documenting potential improvements.

Observability tooling - the missing ingredient

I must warn that only if the company is already well armed with a good set of observability tools and well-tracked bugs and production issues will it have objective facts and numbers to support the brainstorming sessions above.

If not, the discussion will rely mostly on opinions and largely unsubstantiated claims ("in my experience," "I think that," "it is apparent that," "everyone knows that," etc.), resulting in a biased plan of action that does not necessarily focus on the most pressing and solvable issues.

What tools are needed? At a minimum, tooling that provides objective metrics, logs, and traces.

In general, the concept of observability is quite broad and critical to understand.

Only with such tools, providing objective metrics, logs, and traces, can the team identify the exact sources of a problem and create an actionable plan to improve or eliminate them.
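
As one concrete example of the tracing part, the sketch below instruments a single order-processing step with the OpenTelemetry Java API; it assumes the OpenTelemetry SDK and an exporter are configured elsewhere, and the span and attribute names are illustrative.

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

// Wraps one business step in a trace span so the call can be followed
// across components and correlated with logs and metrics.
public class TracedOrderStep {

    private final Tracer tracer = GlobalOpenTelemetry.getTracer("order-processing");

    public void reserveShipment(String orderId) {
        Span span = tracer.spanBuilder("reserveShipment").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            span.setAttribute("order.id", orderId);    // illustrative attribute name
            callLogisticsProvider(orderId);            // the actual work being traced
        } catch (RuntimeException e) {
            span.recordException(e);                   // keep the error visible in the trace
            span.setStatus(StatusCode.ERROR, "reserveShipment failed");
            throw e;
        } finally {
            span.end();                                // always close the span
        }
    }

    private void callLogisticsProvider(String orderId) { /* external call goes here */ }
}
```

With such spans exported from every component, tracing a call across the different architectural components becomes possible end to end.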

Putting together a plan

At this point, and I can't stress it enough, it is super critical to put the necessary tasks around the prioritized architectural improvements into the plan, with their assigned owners, resources, and timeline.

The task list can also include more fluid items, like:

  • learning and research;
  • consultations with internal or external knowledge holders;
  • proofs of concepts;
  • installation of the necessary tooling;
  • provisioning of environments;
  • running tests before and after;
  • and so on.

Critical warning

A bit more advice and a warning: often, such architectural improvement efforts are undertaken sporadically; some work is done, and then everyone is moved back to feature development or dealing with production issues, forgetting or postponing everything until better times.

This leads to lost effort, frustration, the product not evolving, production issues piling up, good people deciding to leave, and more...

It is vital to realize that architectural improvement is an iterative process, where research, proofs of concepts, developments, testing, and delivery are done time after time, cycle after cycle. And they should be planned this way. Iteratively, from start to finish, in small doses, but constantly.

This is the only viable way to keep evolving the product, reduce technical (specifically architectural) debt, take on bigger and better clients, attract better talent, minimize frustration, and build pride in the software the company delivers.

That's why I suggest appointing a person who is responsible and has enough authority to drive architectural tasks, making room for them in the company plans, involving the relevant internal and external specialists, and periodically presenting results to the stakeholders.

This can be done by:

  • the CTO himself (responsible for the company technology)
  • Enterprise Architect (connecting software and the business)
  • Solution Architect (dealing with software serving clients needs)
  • Software Architect (architecting a software product) or
  • Tech Lead (responsible for the service delivered by his or her team).


About the Author

Alexander Stern has more than 25 years of software engineering experience, working closely with C-level executives to ensure adherence to business needs, vision, and strategy.

He is available for short-term Architect-as-a-Service consultations to help businesses make their next evolutionary leap, avoiding pitfalls and taking deliberate, precise steps.

He can be contacted via email at [email protected] or by text message (WhatsApp/Telegram) at +372 56815512

