
What’s the Difference Between Fault Tolerance and High-Availability

Joshua Cefai-Cox

Team Lead/Senior Account Executive, Mid-Market Sales at Equinix

Published: 8 February 2021

Fault tolerance and high-availability

Fault tolerance and high-availability are two terms that are often used interchangeably in IT circles. In practice, though, there are several important distinctions between a fault-tolerant system and a high-availability system. If you are considering upgrading to either approach, it's important to understand the advantages each one offers.

What is Fault Tolerance?

Like high-availability, fault tolerance is designed to minimise downtime. However, a fault-tolerant system uses different methods to achieve this than a high-availability system does. Its goal is to let the system continue operating even when one of its components fails.

There are several methods of fault tolerance worth knowing about, including:

Triple Modular Redundancy

In a triple modular redundancy (TMR) system, redundancy is achieved by having three separate modules perform the same process. Their results are passed to a majority voter, which produces a single output. If one of the three modules fails, the voter can still produce the correct output, because the other two modules continue to agree on it.
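The majority-voting step can be sketched in a few lines of Python. This is an illustration under simple assumptions, not a hardware TMR design: each "module" is a hypothetical function, a module that raises an exception is treated as having failed, and outputs must be hashable so the voter can count them.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Sentinel marking a module that crashed instead of returning a value.
FAILED = object()


def tmr_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Majority voter: return the value reported by at least two of three modules."""
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three module outputs")
    counts = Counter(o for o in outputs if o is not FAILED)
    for value, count in counts.items():
        if count >= 2:
            return value
    # Two or more modules failed or disagree: a single voter cannot mask this.
    raise RuntimeError("no majority among module outputs")


def run_tmr(modules: Sequence[Callable[[int], Hashable]], x: int) -> Hashable:
    """Run the same computation on three redundant modules and vote on the results."""
    outputs = []
    for module in modules:
        try:
            outputs.append(module(x))
        except Exception:
            outputs.append(FAILED)  # a crashed module simply loses its vote
    return tmr_vote(outputs)


# Hypothetical modules: one healthy implementation and one faulty one.
def square(x: int) -> int:
    return x * x


def stuck_at_zero(x: int) -> int:
    return 0  # simulates a module that silently produces a wrong answer


print(run_tmr([square, stuck_at_zero, square], 4))  # 16
```

The voter masks exactly one faulty module; if two modules fail or disagree, it raises rather than guessing, which mirrors the limit of TMR described above.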

Forward Error Correction

Forward error correction (FEC) adds redundancy to the message that a system sends, rather than to the system itself. Because the message carries this extra information, the receiver can detect and correct certain errors caused by unreliable or noisy channels without asking for the data to be resent.
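A classic, concrete example of forward error correction is the Hamming(7,4) code, which adds three parity bits to every four data bits so the receiver can locate and flip any single corrupted bit. The Python sketch below is illustrative only: production systems typically use stronger codes such as Reed-Solomon or LDPC, and Hamming(7,4) will miscorrect if two bits in the same 7-bit block are flipped.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as 7 bits laid out as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(code: list[int]) -> list[int]:
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # re-check positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # re-check positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # re-check positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the corrupted bit back
    return [c[2], c[4], c[5], c[6]]


# Usage: corrupt one bit "in transit" and recover the original data.
sent = hamming74_encode([1, 0, 1, 1])
received = sent.copy()
received[4] ^= 1  # noise flips codeword position 5
print(hamming74_decode(received))  # [1, 0, 1, 1]
```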

Checkpointing

Checkpointing is one of the most common methods of fault tolerance and is used in everyday applications such as word processors. It involves automatically saving the system's state at regular intervals, so that after a crash the system can be restarted from its last saved state rather than from scratch. While checkpointing may seem simple enough, it becomes considerably more complicated when you need to capture the state of an entire distributed system. However, tools such as Distributed MultiThreaded CheckPointing (DMTCP) simplify the process by checkpointing the state of multiple distributed processes.
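A minimal Python sketch of application-level checkpointing, assuming the job's state fits in a small JSON-serialisable dict. The helper names (`save_checkpoint`, `load_checkpoint`, `run_job`) and the `crash_at` parameter used to simulate a failure are hypothetical. Each checkpoint is written to a temporary file and then swapped in with `os.replace`, so a crash during the write leaves the previous checkpoint intact.

```python
import json
import os
import tempfile


def save_checkpoint(state: dict, path: str) -> None:
    """Persist state so a crash mid-write never corrupts the last good checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # ensure the bytes reach disk before the swap
    os.replace(tmp_path, path)  # atomic rename on POSIX filesystems


def load_checkpoint(path: str, default: dict) -> dict:
    """Return the last saved state, or a copy of `default` if none exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return dict(default)


def run_job(path: str, total_steps: int, checkpoint_every: int = 10, crash_at=None) -> dict:
    """Hypothetical long-running job: sums 0..total_steps-1, checkpointing periodically."""
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        if state["step"] == crash_at:
            raise RuntimeError(f"simulated crash at step {crash_at}")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    save_checkpoint(state, path)  # record the final state
    return state
```

If the job crashes at step 37 with a checkpoint every 10 steps, restarting resumes from step 30: only the work since the last checkpoint is repeated. The checkpoint interval is the trade-off between that repeated work and the cost of saving more often.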

Byzantine Fault-Tolerance

Byzantine fault tolerance builds on the redundancy ideas above but targets a harder class of failure. Instead of simply stopping, a faulty component may behave arbitrarily: it might return wrong results or send conflicting information to different parts of the system, so the remaining components cannot easily agree on what a given output should be. A Byzantine fault-tolerant system is designed to let its correct components still reach consensus in that situation, which generally requires at least 3f + 1 replicas to tolerate f faulty ones. There are numerous protocols that Byzantine fault tolerance relies on in order to address this problem. For now, though, suffice it to say that Byzantine fault tolerance handles the broadest class of failures of any approach available to you when building a fault-tolerant system.
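Byzantine fault-tolerant protocols such as PBFT (Practical Byzantine Fault Tolerance) are usually sized with the rule n ≥ 3f + 1, where n replicas can tolerate f arbitrarily faulty ones. The Python sketch below captures only the arithmetic behind such a protocol, not a consensus protocol itself; the function names are hypothetical. It assumes a quorum must be large enough that any two quorums share at least one correct replica, and that a client trusts a result once f + 1 replicas return the same value.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def max_byzantine_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Smallest quorum such that any two quorums overlap in at least f + 1 replicas,
    so at least one correct replica is in both. Equals 2f + 1 when n = 3f + 1."""
    f = max_byzantine_faults(n)
    return (n + f) // 2 + 1


def accept_reply(replies: Sequence[Hashable], f: int) -> Optional[Hashable]:
    """Client-side rule: trust a result once f + 1 replicas report the same value,
    because at most f of those replies can come from faulty replicas."""
    for value, count in Counter(replies).items():
        if count >= f + 1:
            return value
    return None  # not enough matching replies yet; keep waiting


for n in (4, 7, 10):
    print(n, "replicas tolerate", max_byzantine_faults(n), "faults; quorum =", quorum_size(n))
```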

What’s the Difference Between Fault Tolerance and High-Availability?

While high-availability systems and fault-tolerant systems share essentially the same objective, there are a number of important distinctions between the two approaches. One key difference is that high-availability systems are designed both to limit downtime and to keep the system's performance from being degraded. A fault-tolerant system also limits downtime, but maintaining performance isn't as much of a priority.

While this makes it sound as if high-availability systems have a clear advantage, fault tolerance offers an important benefit that must be taken into account as well. If an error occurs while an operation is in progress in a fault-tolerant system, that operation still completes with the correct result. This is not the case with a high-availability system.

For example, if a user submits a request to your website hosted on a high-availability platform and the node handling it crashes, the user will receive an error such as an HTTP 500 response. The system as a whole, however, remains operational and will be able to respond to new requests. With a fault-tolerant system, the failure is worked around and the user still receives a valid response, though it might be delayed. This is the most important distinction between high-availability and fault tolerance to keep in mind when deciding which approach is best for your organisation.
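The user-visible difference in this example can be illustrated with a toy Python model. It is a simplification under stated assumptions: `Node`, `HACluster` and `FTCluster` are hypothetical classes, and the fault-tolerant behaviour is approximated by transparently completing the same request on a replica. Real fault-tolerant platforms, such as VMware Fault Tolerance, which keeps a secondary VM continuously synchronised with the primary, achieve this below the application rather than in request-handling code.

```python
from dataclasses import dataclass


class NodeCrashed(Exception):
    """Raised when a node fails while handling a request."""


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise NodeCrashed(self.name)
        return f"{self.name} served {request}"


class HACluster:
    """High availability: fail over for future requests; the in-flight one is lost."""

    def __init__(self, nodes: list):
        self.nodes = list(nodes)

    def request(self, req: str) -> tuple:
        if not self.nodes:
            return 503, "Service Unavailable"
        node = self.nodes[0]
        try:
            return 200, node.handle(req)
        except NodeCrashed:
            self.nodes.remove(node)  # take the failed node out of rotation
            return 500, "Internal Server Error"


class FTCluster:
    """Fault tolerance (approximated): mask the failure by completing the request on a replica."""

    def __init__(self, nodes: list):
        self.nodes = list(nodes)

    def request(self, req: str) -> tuple:
        for node in list(self.nodes):
            try:
                return 200, node.handle(req)
            except NodeCrashed:
                self.nodes.remove(node)  # drop it, then retry on the next replica
        return 503, "Service Unavailable"


ha = HACluster([Node("a", healthy=False), Node("b")])
print(ha.request("GET /"))  # (500, 'Internal Server Error')
print(ha.request("GET /"))  # (200, 'b served GET /')

ft = FTCluster([Node("a", healthy=False), Node("b")])
print(ft.request("GET /"))  # (200, 'b served GET /')
```

In both models the service stays up; the difference is whether the request that was in flight when the node failed is lost (high-availability) or completed (fault tolerance).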

Conclusion

Both high-availability systems and fault-tolerant systems excel at preventing downtime and ensuring that single failures don’t crash the entire system. In the end, whether high-availability or fault tolerance is the right choice for your organisation comes down to your specific priorities and requirements.

If you would like to learn more about creating a fault-tolerant or high-availability system, we invite you to contact us today. At Servers Australia, we are dedicated to helping organisations of all sizes eliminate downtime through effective technological solutions such as fault tolerance and high-availability, and we would be happy to work with you to develop the right approach for your organisation. Servers Australia is an Enterprise Partner with VMware, which has allowed us to deliver industry-leading Fault Tolerance and High Availability solutions throughout our Data Centres.


