Maximizing Your System’s Reliability with the Maintenance Metrics: Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR).

Aging buildings, new technologies, and evolving user needs are driving unprecedented change in the real estate industry. Today’s Facility Managers face increasingly complex problems, and with them comes a wave of new challenges. Cities are growing faster than ever, with more pressure to reduce operational costs and improve environmental performance while maintaining services. When it comes to systems or equipment, failure is almost inevitable. Every process has its ups and downs, and at some point it will break down or stall for one reason or another. The difficulty with most failures is that they tend to lead to more significant effects down the line.

It is not uncommon for even the best-run facilities to experience failures, even when they operate under ‘fail-safe’ conditions. FM professionals are often operating at the edge of their capacity, which means the risk is high. This article introduces the principles of failure-proofing and two important metrics, mean time between failures and mean time to repair, which help teams track and minimize downtime and protect productivity.

What are Maintenance Metrics?

Maintenance metrics are a set of measures that track the performance of maintenance activities. By looking at past data and monitoring statistics from the process, teams can identify potential issues and make the corrections needed to keep the process running effectively and efficiently.

A small failure might trigger a cascade of other problems that ultimately leads to a much bigger issue further down the road. Fortunately, there are ways to combat this natural tendency toward cascading failures within the process and keep things running smoothly.

Understanding the value of Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR).

MTBF measures the average operating time between one failure of a system or component and the next. It is an arithmetic average: the total operating time divided by the number of failures recorded in that time.

MTBF provides a general indication of how often an asset is likely to fail and can be used to compare the reliability of different assets or systems.

MTTR is a critical measure of the effectiveness of a facility’s maintenance activities. It calculates the average time taken to rectify a failure once it has occurred. Tracking it helps gauge the efficacy of maintenance operations, identify trends in maintenance effectiveness, compare different maintenance operations, and see how quickly systems recover from failures. The lower the MTTR, the less downtime and lost productivity businesses experience.

MTBF and MTTR are derived from the same failure events and together provide insight into the reliability of a system: MTBF shows how often failures occur, and MTTR shows how long recovery takes. The two metrics show how the system is performing and where improvements and corrections are needed.
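One common way to combine the two metrics, which the article itself does not state, is steady-state availability: the fraction of time a system is expected to be up. The formula below is a sketch that assumes MTBF counts only operating time between failures and MTTR covers the full time from failure to restored service.

```latex
A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}
```

As a hypothetical illustration, an asset with an MTBF of 990 hours and an MTTR of 10 hours would have an availability of 990 / (990 + 10) = 0.99, or 99%.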

MTBF and MTTR can vary depending on the following factors:

  • The design of the system or component. Poorly designed components or systems may be prone to failure or require frequent maintenance and repair.
  • The environment in which the system or component functions, such as temperature, humidity, and vibration.
  • Frequency of use. Systems and components that are used more frequently are more likely to experience failures.

How to calculate MTBF and MTTR

I. MTBF:

  • The most common method is to divide the facility’s total number of operating hours by the number of failures that occur during that time, giving an average number of hours between failures.
  • For example, if a machine that runs continuously had 4 breakdowns in a year, its MTBF would be approximately 90 days, meaning that on average you can expect a failure about every 3 months.
  • MTBF can also be estimated with a predictive model, which uses past data to predict future failures.
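The basic MTBF calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `mtbf` is hypothetical, and the example assumes the machine with 4 breakdowns in a year runs continuously (24 hours a day, 365 days a year), which the article does not state.

```python
def mtbf(total_operating_hours: float, failure_count: int) -> float:
    """Mean time between failures: operating hours divided by failures."""
    if failure_count <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return total_operating_hours / failure_count


# Assumption: continuous operation for one full year.
hours_per_year = 365 * 24

mtbf_hours = mtbf(hours_per_year, 4)  # 4 breakdowns in the year
mtbf_days = mtbf_hours / 24

print(f"MTBF: {mtbf_hours:.0f} hours (about {mtbf_days:.0f} days)")
# prints "MTBF: 2190 hours (about 91 days)"
```

If the machine only runs part of the day, the operating hours (and therefore the MTBF) would be lower than in this sketch.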

II. MTTR:

One of the most critical performance metrics for any organization is Mean Time to Repair or MTTR.

  • This metric measures the average time it takes to fix a problem once detected.
  • To calculate MTTR, take the sum of the repair times for all failures that have occurred and divide it by the total number of failures, giving an average repair time for each failure.
  • In practice, first determine the total number of failures during a given period and the total time taken to repair them, then divide the total repair time by the number of failures. For example, let’s say there are 10 failures in a year, and the total time taken to repair all failures was 24 hours. The calculation would look like this: 24 hours / 10 failures = 2.4 hours. On average, it would take 2.4 hours to fix a problem once detected.
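The worked example above (10 failures in a year, 24 hours of total repair time) can be sketched in Python as follows. The function name `mttr` is hypothetical and chosen only for illustration.

```python
def mttr(total_repair_hours: float, failure_count: int) -> float:
    """Mean time to repair: total repair time divided by number of failures."""
    if failure_count <= 0:
        raise ValueError("MTTR is undefined without at least one failure")
    return total_repair_hours / failure_count


mttr_hours = mttr(total_repair_hours=24, failure_count=10)
print(f"MTTR: {mttr_hours:.1f} hours")  # prints "MTTR: 2.4 hours"
```

Note the order of the division: total repair time is the numerator and the failure count is the denominator, so the result is expressed in hours per failure.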

Keep in mind that MTTR is just an average, so some problems are fixed much faster while others take longer. But overall, this metric can give you a good idea of how quickly your team can resolve issues.

It’s important to note that MTBF is a predictive metric, not a diagnostic one. That is, it indicates how likely an asset is to fail in the future but not why it failed in the past. To identify the root cause of past failures, look at other data sources, such as maintenance records and incident reports.

Still, MTBF can be a valuable tool for deciding when to schedule maintenance, when to replace equipment, and how to allocate resources for reliability improvement efforts.

Monitoring and analyzing data to improve maintenance metrics

Data monitoring and analysis are essential to improving the maintenance process. Proactive monitoring and analysis of data reveal weak points, trends, and correlations. By gathering data on the performance of systems, machines, and components, FM teams can identify areas for improvement and track their success over time.

  • The first step in monitoring and analyzing data for improvement is to define and gather the data points, including the frequency of system failures, the causes of failures, and the time it takes to repair them.
  • Other data points may include the number of parts and components replaced, the speed of the machines, and any other relevant metrics.
  • Once these data points are defined, teams can start collecting them and analyzing the correlations between them, spotting trends over time and identifying weak points in the system.
  • Through this process, one can pinpoint the areas that need improvement and develop strategies to reduce the frequency of system failures.
  • Data analysis can also compare the performance of different systems or components to determine which are more reliable and which need improvement for systems to run as efficiently as possible.
  • If the data shows a particular component is failing more frequently than expected, investigate and identify the root cause, which can range from a simple design flaw to a lack of maintenance or improper use. By addressing the root cause, the FM team can increase the mean time between failures.
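The steps above (collect failure data, then compare components to find weak points) can be sketched as a small Python script. Everything here is hypothetical: the failure log, the component names, and the assumption that every component was in service for the full 8,760 hours of the year are made up for illustration, and a real facility would pull these records from its maintenance system.

```python
from collections import defaultdict

# Hypothetical failure log for one year: (component, repair_hours).
failure_log = [
    ("chiller", 3.0),
    ("elevator", 1.5),
    ("chiller", 5.0),
    ("pump", 2.0),
    ("chiller", 4.0),
]

# Assumption: each component operated continuously for the whole year.
OBSERVED_OPERATING_HOURS = 8760


def summarize(log, operating_hours):
    """Group failures by component and compute count, MTBF, and MTTR."""
    repairs = defaultdict(list)
    for component, repair_hours in log:
        repairs[component].append(repair_hours)

    summary = {}
    for component, hours in repairs.items():
        summary[component] = {
            "failures": len(hours),
            "mtbf_hours": operating_hours / len(hours),
            "mttr_hours": sum(hours) / len(hours),
        }
    return summary


summary = summarize(failure_log, OBSERVED_OPERATING_HOURS)

# List components from most to fewest failures to surface weak points.
ranked = sorted(summary.items(), key=lambda kv: kv[1]["failures"], reverse=True)
for component, stats in ranked:
    print(
        f"{component}: {stats['failures']} failures, "
        f"MTBF {stats['mtbf_hours']:.0f} h, MTTR {stats['mttr_hours']:.1f} h"
    )
```

In this made-up data set the chiller stands out with the most failures and the lowest MTBF, which would make it the first candidate for root-cause investigation.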

Best practices to maximize uptime

The best practices to maximize uptime include using high-quality components, maintaining them properly, and following testing procedures.

  • Use the highest quality components in the system. Top-tier components are the best way to ensure the system will operate reliably and with minimal downtime. Use high-quality, tested parts certified by an independent organization.
  • Condition the environment in which the system operates. The environment should be free from dust and other contaminants that could damage the system’s components. Keep the temperature and humidity within the acceptable range. Additionally, ventilate the environment and keep direct heat sources away from the system to avoid overheating.
  • Schedule maintenance by regularly inspecting, cleaning, and maintaining the system: check for worn or damaged parts and clear the system of dust and debris. Scheduled maintenance also includes replacing filters, lubricating moving parts, and inspecting wiring.
  • Ensure that the system is adequately powered. The power supply should be stable and free from any spikes or fluctuations that could affect the system’s performance. The cables and connectors should also be checked regularly for any damage or faults and replaced if necessary.
  • Implement a preventive maintenance plan. The preventive approach involves proactive maintenance and planned downtime to identify and fix any potential issues before they cause a failure. Include regular inspections, testing, cleaning, and maintenance of the system. The plan should also include a schedule for replacing worn or damaged components and a method for tracking the system’s performance, such as monitoring temperature, vibration, and other performance metrics.
  • Use diagnostic testing to detect potential problems with a system or component and identify any issues that may be present, allowing for quick and effective repairs before the system or component fails.
  • Use system-level monitoring and alerting software. This software can help detect potential problems before they become an issue and help identify areas that need improvement.
  • Use an effective system or component design with an appropriate level of redundancy. Redundancy helps ensure that the system will continue operating even if one component fails, allowing it to remain operational for a longer time.
  • Include an incident response plan. Have a well-developed incident response plan that outlines the steps to mitigate the failure and restore normal operations. The plan should include details such as whom to contact, what resources will be needed, and how to handle the situation. Additionally, it is essential to have a clear procedure for reporting, assessing, and documenting any issues that arise, to help prevent future failures.
  • Implement training, skill development, and a proactive approach. A robust spare parts system and a proactive approach to training and staff development can reduce downtime and improve the system’s overall reliability.

Technology solutions to maximize maintenance metrics

To succeed in this environment, we need a clear strategy and actionable advice on what to do when things go wrong. Digital transformation has significantly changed how people interact with businesses and services, forever transforming industries and business practices.

With demand for digitally enabled services growing at an unprecedented rate, more businesses are investing in new technologies to drive operational efficiency and reduce operational costs. These digital transformations require that organizations focus on reducing operating costs while increasing uptime, productivity, and customer satisfaction.

The need for data and transparency has never been greater.

  • Predictive Maintenance: Predictive maintenance monitors a system or component to predict when it will fail and raises an alert before a system failure occurs, reducing the likelihood of downtime.
  • Condition Monitoring: Condition monitoring is a technology that uses sensors to monitor the condition of a system or component to identify potential problems and address them before they cause a system failure.
  • Failure Analysis: Failure analysis uses data analysis to identify and fix potential problems before they cause a system failure, maximizing MTBF and reducing the likelihood of downtime.
  • Several tools are available for monitoring and assessing MTBF and MTTR, including calculators and software. The calculators allow you to calculate the expected MTBF and MTTR of a system or component based on its design and environment. The software monitors and analyzes the performance of a system or component in real time.
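As a rough illustration of the condition monitoring idea above, the sketch below flags a sensor when the rolling average of its recent readings exceeds a limit. It is a deliberately simple stand-in, not how any particular monitoring product works: the function name `find_alerts`, the vibration readings, the 4.0 mm/s limit, and the 3-reading window are all assumptions made up for this example.

```python
from statistics import mean


def find_alerts(readings, limit, window=3):
    """Return the indexes where the rolling mean of the last `window`
    readings exceeds `limit`."""
    alerts = []
    for i in range(window - 1, len(readings)):
        recent = readings[i - window + 1 : i + 1]
        if mean(recent) > limit:
            alerts.append(i)
    return alerts


# Hypothetical vibration readings (mm/s) from a pump, oldest first.
vibration_mm_s = [2.1, 2.0, 2.3, 2.2, 3.9, 4.4, 4.8, 5.1]

alerts = find_alerts(vibration_mm_s, limit=4.0)
print(alerts)  # prints [6, 7]
```

Averaging over a short window rather than alerting on a single reading is one way to avoid reacting to a momentary spike; the trade-off is that the alert arrives slightly later.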

In conclusion:

Monitor and test the system to ensure that it is functioning optimally. MTBF and MTTR are essential metrics to track and offer insight into the effectiveness of your maintenance operations. Tracking these metrics allows you to identify areas where your maintenance operations can improve. By taking the time to evaluate the systems and their interactions thoroughly, it is possible to identify areas of improvement. Additionally, conduct a failure mode and effects analysis (FMEA) and implement a test plan.

Human error may contribute to potential failure points. If the system or product is in use by multiple people, design the interface to minimize the risk of error. Technology can be an invaluable tool when it comes to uptime. By using a modern Integrated FM tool, organizations can reduce the likelihood of breakdowns and improve the reliability of their systems. However, this requires knowledge of the system and its components and the ability to identify potential problems before they occur. Provide training to ensure that staff understand how to properly use the system or product, reducing the downtime associated with maintenance and repairs and further improving reliability, durability, and safety.

To learn more about Denali Assets Real Estate Consulting, Integrated Facility and Property Management, and Technology Solutions for SINGLE-POINT tech-integration, please write to us at [email protected] or visit us at www.denaliassets.co.in

Kevin Brain Ostos Julca

MSc. Eng. Asset Management | Asset Management and Operational Reliability | Strategic Maintenance Management | Lean Six Sigma Black Belt | IAM | Project Management | Data Science

1 year ago

This article by Denali Assets provides a comprehensive and insightful overview of the crucial maintenance metrics, Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR), in the context of the real estate industry. In a world where infrastructure and technology are continually evolving, understanding these metrics is essential for Facility Managers facing the challenges of aging buildings and increasing user demands. The article rightly points out that failure is nearly inevitable in any system or equipment, and it often has cascading effects if not addressed promptly. The explanation of MTBF and MTTR is particularly valuable. MTBF, as the average time between failures, offers a practical measure of an asset's reliability. It enables comparisons between different assets or systems, helping Facility Managers make informed decisions about maintenance and resource allocation. MTTR, on the other hand, emphasizes the importance of swiftly addressing failures. As the average time taken to repair a failure, it directly impacts downtime and productivity. The article's emphasis on faster MTTR leading to less business disruption underscores its significance.
