How Do We Keep an Airplane in the Air 24/7 While We Continue to Upgrade?

Virendra Parmar

Secure peace by designing your personalized 360° goal-based financial plan

Published: June 11, 2024

While developing a SaaS service, I realised how closely the work relates to my early experience in telecom, where all our systems required 99.999% availability. In layperson's terms, the requirement is this: the aeroplane (our service) must stay in the air 24/7, 365 days a year, even during updates or upgrades. This aeroplane carries millions of users, and any crash is unacceptable because it affects all of them.
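To put the 99.999% figure in perspective, here is a minimal sketch that converts an availability target into a yearly downtime budget. The function name and the choice of a 365-day year are assumptions made for illustration; real service-level agreements may define the measurement window differently.

```python
# Minutes in a non-leap year, matching the "365 days a year" framing above.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the maximum downtime per year, in minutes, for a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # Compare three common availability targets.
    for target in (99.9, 99.99, 99.999):
        minutes = downtime_budget_minutes(target)
        print(f"{target}% -> {minutes:.2f} minutes per year")
```

At five nines the budget works out to roughly 5.26 minutes per year, which leaves essentially no room for a maintenance window during an upgrade.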

Illustration with Z****** Platform

To illustrate, let me share three defects I encountered over three months with the Z****** platform. While I use Z****** as an example, many organisations faced similar issues.

Rebalancing Baskets: I could not rebalance my Baskets for nearly six weeks due to ongoing issues. After extensive troubleshooting with Z******, it became apparent that the interaction between their software, K*** (sister company), and CDSL had a functional defect. Eventually, I was offered a workaround: authenticate with CDSL from K***, log out, and rebalance. It worked.
CDSL Platform Downtime: In another incident, the CDSL platform went down, and the workaround offered was to skip the authentication process.
R******* Transaction Delays: On June 4th and 5th, R******* did not report transactions on time, causing the software to fail in processing buy requests.

Each issue was met with casual responses at first and eventually led to apologies, but customers had no recourse unless they persisted. I estimate I lost over ₹25,000, and I'm sure many others experienced similar losses.

Common Responses to Customer Issues

Having worked in similar roles for nearly three decades, I understand the typical responses from development teams:

"It happens in your environment only."
"It is a random issue; please try again later."
"Shut down and restart."
"Can you reproduce the issue and share logs and traces?"
"It is tough to resolve since we can't reproduce it."
"It happens under extreme load or once in a blue moon."
"It is an act of God."

Many can relate to these responses, which are frustrating and often unhelpful.

Recommendations for Organizations

While compensation might be too much to ask in a country where justice is often delayed, I recommend that organisations take the moral high ground and become more transparent. They should share details immediately, including:

What happened?
How long has the problem persisted?
Root cause analysis using the 5 "whys."
How many customers were impacted, and what was the likely amount of loss?
When was the issue discovered?
How were the impacted customers informed proactively?
What is the workaround until it is fixed?
What corrective actions are taken?
What preventive actions are planned?
Whether a fix is required, the timeline, and recommended actions until the issue is resolved.
Communicate with all customers about the recommended actions.
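The checklist above could be captured as a simple incident report template, so that no section is silently skipped before publishing. This is only a sketch: the class name, field names, and the `missing_sections` helper are hypothetical, chosen to mirror the list items, and are not taken from any particular incident management tool.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class IncidentReport:
    """Hypothetical template; one field per item in the checklist above."""
    what_happened: str = ""
    duration: str = ""
    root_cause_five_whys: List[str] = field(default_factory=list)
    customers_impacted: str = ""
    estimated_loss: str = ""
    discovered_at: str = ""
    customer_notification: str = ""
    workaround: str = ""
    corrective_actions: str = ""
    preventive_actions: str = ""
    fix_timeline: str = ""
    recommended_actions: str = ""


def missing_sections(report: IncidentReport) -> List[str]:
    """Return the names of sections that are still empty."""
    return [f.name for f in fields(report) if not getattr(report, f.name)]


if __name__ == "__main__":
    # Illustrative placeholder text only, not a real incident record.
    draft = IncidentReport(
        what_happened="Buy requests failed to process.",
        workaround="Retry after the upstream reports arrive.",
    )
    print("Sections still to fill in:", missing_sections(draft))
```

A report would be shared with customers only once `missing_sections` returns an empty list, which makes the transparency expectation checkable rather than aspirational.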

The same problem recurred the next day, and I am still trying to find an acceptable workaround.

Lesson in Accountability

This is my first public commentary on such issues on social networks. It aims to raise awareness in the software community about the importance of robust software design and engineering.

The biggest lesson I tried to impart to my child was this pattern for handling mistakes:

Accept.
Acknowledge.
Apologise.
Inform.
Correct.
Prevent.
Avoid repeating the same mistake.

While the first three steps are usually completed, sometimes under pressure, the rest are frequently neglected because the blame is placed on a third party.

Conclusion

I often find defects and have several real-life examples across many organisations. Over time, I have moved from a combative approach to a more empathetic approach toward those working in software companies, recognising their constraints.

This article highlights the critical need for better software development and customer support practices, ensuring that "aeroplanes" remain in the air without compromising user experience.

All SaaS service providers must follow the same approach as the “Air Crash Investigation” series, as their services are equally critical for the public.

PS: This issue occurred on many platforms on the 4th and 5th of June. In my view, it is likely to happen again, since I don't believe the problem has been solved to the extent required.
