SIF MISSION TIME IS UP! NOW WHAT?
Shaun Williamson P.L. Eng., CFSE, PMP
Director of Engineering - Supporting our clients with HAZOP/LOPA, SIL/SIS consulting, Fire & Gas Engineering, Alarm Management, Bowtie.
Many Safety Instrumented Functions (SIFs) installed early in the adoption of IEC 61511 have either reached their mission time or are soon to expire. Companies now face decisions on how to handle this. The primary question is often, "Do we need to replace the SIF hardware, and what is the risk if we don't?" This article delves into why mission time is crucial for achieving high reliability and the consequences of ignoring expired SIF mission time.
Importance of SIF Mission Time
SIF Mission Time is vital when designing a Safety Instrumented Function. Failure to bring SIF elements back to 'as new' condition when their mission time expires can leave the SIF unable to meet its design integrity. Where a specific Safety Integrity Level (SIL) is required for regulatory reasons, this means regulatory non-compliance; where the SIF is meant to mitigate a specific risk, it means operating outside the acceptable risk threshold. Mission time is therefore a key factor in achieving the target reliability.
The Relationship Between Mission Time and Useful Life
Mission Time refers to the period between when the SIF (or device) is put into service and when it is replaced or refurbished to "as-new" condition. When selecting a Mission Time for SIF elements as part of reliability modeling, it is essential to consider its relationship to Useful Life.
Useful Life and the Bathtub Curve
The following image of a bathtub curve is used to illustrate the importance of considering useful life. This graph and others like it below show the calculated failure rates over time for typical equipment.
The bathtub curve illustrates what the typical Useful Life curve looks like for most equipment. Hardware components typically have two periods of high failure rates: at the start of their life (infant mortality) and in the wear-out phase. Effective quality control, such as burn-in tests and commissioning checks, can address infant mortality. Wear-out failures can be managed by removing hardware from service before they occur. The relatively flat failure rate in the middle of the curve represents useful life, the period during which the constant failure rates used in reliability calculations are valid. For accurate SIF modeling, SIF hardware must be kept within this useful life. Beyond useful life, the equipment enters the wear-out phase, where failure rates can increase exponentially. Therefore, mission time must not exceed useful life for IEC 61511 compliance. Note also that useful life can be application-specific: process and atmospheric conditions matter, and vendor literature may be based on ideal conditions rather than harsh environments.
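The bathtub shape can be sketched by adding three Weibull hazard functions: one with shape β < 1 (a falling infant-mortality rate), one with β = 1 (a constant random-failure rate, the useful-life region) and one with β > 1 (a rising wear-out rate). This is a common illustrative model rather than the author's method, and every β and η value below is a hypothetical placeholder, not vendor or field data.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull hazard (instantaneous failure rate) h(t) = (beta/eta) * (t/eta)**(beta - 1).

    beta < 1 gives a falling rate, beta == 1 a constant rate (1/eta),
    beta > 1 a rising rate. t and eta share the same time unit (years here).
    """
    if t <= 0:
        raise ValueError("t must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


def bathtub_hazard(t_years: float) -> float:
    """Illustrative bathtub curve built from three competing failure modes.

    All parameters are HYPOTHETICAL, chosen only to reproduce the shape.
    """
    infant = weibull_hazard(t_years, beta=0.5, eta=200.0)   # infant mortality, decreasing
    constant = weibull_hazard(t_years, beta=1.0, eta=50.0)  # random failures, 0.02 per year
    wear_out = weibull_hazard(t_years, beta=5.0, eta=25.0)  # wear-out, increasing
    return infant + constant + wear_out


if __name__ == "__main__":
    for year in (0.1, 1, 5, 10, 15, 20, 25, 30):
        print(f"year {year:>4}: {bathtub_hazard(year):.4f} failures per year")
```

With these placeholder values the printed rate falls, stays roughly flat for several years and then climbs steeply. In this toy model the flat stretch plays the role of useful life, and only there is a single constant failure rate a reasonable input to reliability calculations.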
Mission Time Considerations
Mission Time is a calculated value that determines when SIF hardware should be restored to 'as new' condition. It can be shortened to achieve a higher SIL, because a shorter mission time lowers the average probability of failure on demand (PFDavg). Useful Life is the maximum value that can be used for Mission Time.
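Because mission time is a calculated value capped by useful life, one way to compute it is to invert a PFDavg model. The sketch below assumes the widely used simplified low-demand approximation for a single (1oo1) element with imperfect proof testing, PFDavg ≈ C·λDU·TI/2 + (1 − C)·λDU·MT/2, where λDU is the dangerous undetected failure rate, TI the proof test interval, C the proof test coverage and MT the mission time. The function name and all example values are hypothetical.

```python
from typing import Optional

HOURS_PER_YEAR = 8760.0


def max_mission_time(
    lambda_du: float,      # dangerous undetected failure rate, per hour
    test_interval: float,  # proof test interval TI, hours
    coverage: float,       # proof test coverage C, between 0 and 1
    pfd_target: float,     # PFDavg the element must stay below
    useful_life: float,    # application-specific useful life, hours
) -> Optional[float]:
    """Longest mission time (hours) that keeps PFDavg at or below pfd_target.

    Uses PFDavg ~= C*lambda*TI/2 + (1 - C)*lambda*MT/2 (1oo1, low demand,
    lambda*MT << 1) and never returns more than useful_life.
    Returns None when the proof-tested term alone already reaches the target.
    """
    tested_term = coverage * lambda_du * test_interval / 2
    if tested_term >= pfd_target:
        return None  # no mission time can fix this; revisit TI or coverage
    if coverage >= 1.0:
        return useful_life  # perfect proof test: MT drops out, useful life governs
    mt = 2 * (pfd_target - tested_term) / ((1 - coverage) * lambda_du)
    return min(mt, useful_life)


if __name__ == "__main__":
    # Hypothetical element: lambda_DU = 5e-7 /h, yearly test, 70 % coverage,
    # PFDavg must stay below 1e-2 (upper bound of the SIL 2 band)
    for life_years in (15, 10):
        mt = max_mission_time(5e-7, HOURS_PER_YEAR, 0.7, 1e-2, life_years * HOURS_PER_YEAR)
        print(f"useful life {life_years} y -> mission time {mt / HOURS_PER_YEAR:.1f} y")
```

With these numbers the PFD model alone would allow roughly 12.9 years, so a 15-year useful life does not bind, while a 10-year useful life caps the mission time at 10 years.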
Probability of Failure Over Time
Although graphs of PFD and PFDavg over time are useful for visualizing how reliability degrades, you never know the actual PFD, or likelihood of failure, of a specific SIF element. The first time you learn that it's not working correctly is either a failed proof test (and proof testing never catches 100% of failure modes) or its failure to prevent an incident. If your equipment is past its Mission Time, you have no idea what risk you're operating at, but it's certainly higher than you planned for!
The image above represents equipment probability of failure over time without SIF proof testing, which can reveal hidden failures before a demand occurs, and without an established Mission Time, which allows the PFD to reset to zero. The graph shows that the probability of failure increases over time and that, given enough time, all equipment will eventually fail. The red line represents the probability of failure on demand (PFD), while the dashed black line represents the average probability of failure on demand (PFDavg). A lower PFDavg represents higher overall reliability.
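The rising curve described above can be reproduced with the standard constant-failure-rate model. With no proof testing and no restoration, the probability that an element has failed dangerously and undetected by time t is PFD(t) = 1 − e^(−λDU·t), and its time average over a period T is 1 − (1 − e^(−λDU·T))/(λDU·T), which is close to λDU·T/2 when λDU·T is small. The sketch assumes a constant λDU (the flat part of the bathtub curve); the rate used in the example is hypothetical.

```python
import math


def pfd_instant(lambda_du: float, t: float) -> float:
    """Probability of a dangerous undetected failure by time t (no testing, no restoration)."""
    return -math.expm1(-lambda_du * t)  # equals 1 - exp(-lambda*t), accurate for small values


def pfd_avg_untested(lambda_du: float, period: float) -> float:
    """Exact time average of pfd_instant over [0, period]."""
    x = lambda_du * period
    if x <= 0:
        raise ValueError("lambda_du and period must be positive")
    return 1.0 + math.expm1(-x) / x  # 1 - (1 - exp(-x)) / x


if __name__ == "__main__":
    lam = 1e-6  # hypothetical dangerous undetected failure rate, per hour
    for years in (1, 5, 10, 20, 40):
        hours = years * 8760
        print(f"{years:>2} y: PFD = {pfd_instant(lam, hours):.4f}, "
              f"PFDavg = {pfd_avg_untested(lam, hours):.4f}, "
              f"approx lambda*T/2 = {lam * hours / 2:.4f}")
```

For long periods the exact average falls slightly below λDU·T/2, but both keep growing with time, which is the behaviour the graph shows.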
Benefit of Regular Proof Testing
In contrast to the image above, the following image shows the benefit of regular proof testing: a significantly lower average Probability of Failure on Demand (PFDavg). The sawtooth pattern shows the failure probability dropping at each test, and how much it drops is a function of test coverage. Since no test is perfect, a portion of potential failures will remain undetected. Once the test is complete, the failure probability continues to climb until the next test. This pattern continues until the SIF is removed from service, or is replaced or repaired to 'as new' condition at the end of its mission time. Regular proof testing keeps PFDavg low, and more frequent testing or higher test coverage can reduce it further, enhancing SIF reliability.
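The sawtooth behaviour can be checked numerically. The sketch below assumes the linear (λDU·t ≪ 1) approximation for a single element: the fraction C of dangerous undetected failures that the proof test can find is cleared at every test, while the remaining fraction 1 − C keeps accumulating until the element is restored at the end of its mission time. Averaging that curve reproduces the simplified formula PFDavg ≈ C·λDU·TI/2 + (1 − C)·λDU·MT/2. The function names and numbers are illustrative assumptions.

```python
def pfd_sawtooth(t: float, lambda_du: float, test_interval: float, coverage: float) -> float:
    """Instantaneous PFD at time t (hours since 'as new'), linear approximation."""
    since_test = t % test_interval                   # reset by each proof test
    detectable = coverage * lambda_du * since_test   # cleared at every test
    undetectable = (1 - coverage) * lambda_du * t    # only cleared at end of mission time
    return detectable + undetectable


def pfd_avg_numeric(lambda_du: float, test_interval: float, coverage: float,
                    mission_time: float, steps: int = 10_000) -> float:
    """Midpoint-rule average of pfd_sawtooth over one mission time."""
    dt = mission_time / steps
    total = sum(pfd_sawtooth((i + 0.5) * dt, lambda_du, test_interval, coverage)
                for i in range(steps))
    return total / steps


def pfd_avg_formula(lambda_du: float, test_interval: float, coverage: float,
                    mission_time: float) -> float:
    """Simplified closed form for the same model."""
    return (coverage * lambda_du * test_interval / 2
            + (1 - coverage) * lambda_du * mission_time / 2)


if __name__ == "__main__":
    lam, ti, mt = 5e-7, 8760.0, 10 * 8760.0  # hypothetical: yearly test, 10-year mission time
    for c in (1.0, 0.9, 0.7, 0.5):
        print(f"coverage {c:.0%}: numeric {pfd_avg_numeric(lam, ti, c, mt):.5f}, "
              f"formula {pfd_avg_formula(lam, ti, c, mt):.5f}")
```

Lower coverage shifts more of PFDavg into the mission-time term, which is why mission time matters most when proof tests only partly restore the element.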
Reducing Mission Time for Better SIL Performance
The image below highlights the benefit of a reduced mission time in lowering PFDavg for a higher achieved SIL. The intent is to bring SIF components to 'as new' condition. For non-repairable equipment, this means replacement; for repairable equipment, it requires rebuilding all wearable components. The remaining components must also be in 'as new' condition. If this is achieved, the PFD resets to zero, starting a new cycle. This approach can reduce proof test requirements or avoid the need for additional redundant hardware.
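The effect of shortening mission time on the achieved SIL can be illustrated by evaluating the simplified single-element model PFDavg ≈ C·λDU·TI/2 + (1 − C)·λDU·MT/2 for several mission times and mapping each result onto the low-demand PFDavg bands of IEC 61508/IEC 61511 (SIL 1: 10⁻² to 10⁻¹, SIL 2: 10⁻³ to 10⁻², SIL 3: 10⁻⁴ to 10⁻³, SIL 4: 10⁻⁵ to 10⁻⁴). The sketch considers PFDavg only; a real SIL verification must also satisfy architectural constraints and systematic capability, which are ignored here. The failure data are hypothetical.

```python
HOURS_PER_YEAR = 8760.0


def pfd_avg(lambda_du: float, test_interval: float, coverage: float, mission_time: float) -> float:
    """Simplified 1oo1 low-demand PFDavg with imperfect proof testing (times in hours)."""
    return (coverage * lambda_du * test_interval / 2
            + (1 - coverage) * lambda_du * mission_time / 2)


def achieved_sil(pfd: float) -> int:
    """SIL band implied by PFDavg alone (low demand); 0 means no SIL is met.

    Values below 1e-5 are still reported as SIL 4, the highest level defined.
    """
    for sil, upper_bound in ((4, 1e-4), (3, 1e-3), (2, 1e-2), (1, 1e-1)):
        if pfd < upper_bound:
            return sil
    return 0


if __name__ == "__main__":
    # Hypothetical element: lambda_DU = 5e-7 /h, yearly proof test with 70 % coverage
    for mt_years in (20, 15, 10, 5):
        p = pfd_avg(5e-7, HOURS_PER_YEAR, 0.7, mt_years * HOURS_PER_YEAR)
        print(f"mission time {mt_years:>2} y: PFDavg = {p:.4f} -> SIL {achieved_sil(p)}")
```

With these placeholder numbers, cutting mission time from 15 to 10 years moves the element from the SIL 1 band into the SIL 2 band without changing the test regime or adding redundancy.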
Consequences of Exceeding Mission Time and Useful Life
If SIF hardware is not restored to 'as new' condition when its mission time expires, its reliability degrades, and it fails to meet the SIL target. This non-compliance with IEC 61511 poses a regulatory risk, especially if the SIF was implemented for regulatory reasons. As time passes and the useful life is exceeded, the risk of failure increases exponentially. Safety Instrumented Functions are typically implemented to prevent severe consequences, such as toxic releases or explosions. Failure to maintain the SIL increases the risk of these events occurring without the intended protection.
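To put a number on how reliability keeps degrading when hardware is not restored, one can estimate the PFDavg over the next proof test interval for an element that has been in service for a given age since it was last 'as new'. Under the same simplified linear model, the proof-testable part averages C·λDU·TI/2, while the untestable part has been accumulating since installation and averages (1 − C)·λDU·(age + TI/2) over the coming interval. The sketch assumes a constant λDU, so beyond useful life, where wear-out raises the real failure rate, it understates the risk and should be read as a lower bound. All names and values are hypothetical.

```python
HOURS_PER_YEAR = 8760.0


def pfd_avg_next_interval(lambda_du: float, test_interval: float,
                          coverage: float, age: float) -> float:
    """PFDavg over the next proof test interval for an element `age` hours after
    its last restoration to 'as new'. Constant failure rate assumed, so the
    result is optimistic once the element is in its wear-out phase."""
    tested = coverage * lambda_du * test_interval / 2                   # cleared at each test
    untested = (1 - coverage) * lambda_du * (age + test_interval / 2)  # never cleared by testing
    return tested + untested


if __name__ == "__main__":
    lam, ti, cov = 5e-7, HOURS_PER_YEAR, 0.7  # hypothetical element data
    target = 1e-2                             # hypothetical PFDavg required by the risk assessment
    for age_years in (0, 5, 10, 15, 20, 25):
        p = pfd_avg_next_interval(lam, ti, cov, age_years * HOURS_PER_YEAR)
        status = "meets" if p < target else "exceeds"
        print(f"age {age_years:>2} y: next-year PFDavg = {p:.4f} ({status} target)")
```

Each extra year in service without restoration adds the same increment, (1 − C)·λDU × 8760 h, to this estimate, and any wear-out effect comes on top of it.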
Proactive Replacement of SIF Hardware
Replacing SIF hardware before the end of useful life avoids running the SIF in the wear-out phase. Safety Instrumented Functions are not intended to use a run-to-failure philosophy, unlike normal process control loops. Safety functions must work when all other protection layers have failed, making their availability crucial. Therefore, bringing SIF hardware to 'as new' condition at mission time expiry is necessary to ensure reliability.
Examples from Everyday Safety Equipment
Other critical safety equipment, like smoke detectors and fire extinguishers, are replaced before their expiry to ensure functionality. Smoke detectors, for instance, are replaced every 10 years to guarantee reliability. Fire extinguishers are also replaced upon expiry to ensure effectiveness. Similarly, Safety Instrumented Systems, designed to protect people and the public, must adhere to established mission times to provide the intended protection.
Managing Delayed Upgrades
While immediate rebuild or replacement is ideal, practical constraints such as budget and resource availability may delay this process. In such cases, prioritizing which SIFs to address first based on risk is essential. SIF modeling can be used to help evaluate the status of all existing SIFs, prioritizing upgrades based on the highest risk. This approach helps in staging upgrades and managing budgets effectively.
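A simple way to turn SIF modeling into a priority list is to rank each SIF by how far its current estimated PFDavg exceeds the PFDavg its SIL target allows. The sketch below uses the simplified single-element estimate for the next proof test interval, C·λDU·TI/2 + (1 − C)·λDU·(age + TI/2), and ranks by the ratio of that estimate to the target. The record layout, tags and data are hypothetical, and a real program would also weigh consequence severity, demand rate and the full voting architecture of each SIF.

```python
from dataclasses import dataclass
from typing import List

HOURS_PER_YEAR = 8760.0


@dataclass
class SifRecord:
    tag: str               # hypothetical SIF identifier
    pfd_target: float      # PFDavg required by the LOPA / SIL assessment
    lambda_du: float       # dangerous undetected failure rate, per hour
    test_interval: float   # proof test interval, hours
    coverage: float        # proof test coverage, 0..1
    age: float             # hours since last restoration to 'as new'

    def current_pfd_avg(self) -> float:
        """Estimated PFDavg over the next proof test interval (constant failure rate)."""
        tested = self.coverage * self.lambda_du * self.test_interval / 2
        untested = (1 - self.coverage) * self.lambda_du * (self.age + self.test_interval / 2)
        return tested + untested

    def shortfall(self) -> float:
        """How many times the estimate exceeds the target (> 1 means the target is not met)."""
        return self.current_pfd_avg() / self.pfd_target


def prioritize(sifs: List[SifRecord]) -> List[SifRecord]:
    """Order SIFs from largest to smallest shortfall."""
    return sorted(sifs, key=lambda s: s.shortfall(), reverse=True)


if __name__ == "__main__":
    fleet = [
        SifRecord("SIF-101", 1e-2, 5e-7, HOURS_PER_YEAR, 0.7, 20 * HOURS_PER_YEAR),
        SifRecord("SIF-102", 1e-3, 2e-7, HOURS_PER_YEAR, 0.9, 12 * HOURS_PER_YEAR),
        SifRecord("SIF-103", 1e-1, 1e-6, HOURS_PER_YEAR, 0.6, 8 * HOURS_PER_YEAR),
    ]
    for sif in prioritize(fleet):
        print(f"{sif.tag}: PFDavg {sif.current_pfd_avg():.4f}, "
              f"{sif.shortfall():.2f} x target")
```

With these placeholder figures SIF-102 ranks first even though it is younger than SIF-101, because its tighter SIL 3 target leaves less margin.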
For SIF hardware that is not prioritized for immediate upgrade, several interim measures can be taken to maximize reliability and mitigate risk:
These actions aim to maintain reliability as close to the needed SIL as possible until full upgrades can be accomplished. Utilizing SIL modeling can also help communicate the impact of delaying replacements, providing clarity to management on how delays affect reliability.
Summary
Adhering to mission time requirements is essential for maintaining the reliability and safety of Safety Instrumented Functions. Failing to address these requirements can lead to regulatory non-compliance and increased risks, undermining the safety measures intended to protect people and processes. A staged approach based on a risk assessment is a good way to prioritize the work, and alternate measures can be tested using SIF modeling to maximize reliability while in transition.
Check out our website for more content or to reach out for support: https://watchmenise.com/articles
Chemical engineer, Process engineer, Functional Safety Engineer with passion for improvement and safety.
2 months ago: Sriram Ramalingam, CFSP, PMP, I think part of the issue is that many companies mix up the definitions of the types of test. A full function test is sometimes a comprehensive test, but the term is sometimes used for an overhaul test, which makes the device as good as new. If a full function test achieves more than 95 to 97% coverage, mission time might be negligible (this simply depends on the ratio of mission time to the proof test interval). If full function test coverage is lower, mission time dominates the calculation. I've seen mission time neglected when an overly optimistic full function test coverage was used, which is not good practice. * I couldn't reply under your comment.
Lead Instrument & Control Engineer, WHP/ CPP/ FPSO for Oil & Gas industry
4 months ago: Very informative, thank you Author!
Experienced Instrumentation & Control Specialist with proven expertise in Design, Engineering, Testing, Commissioning, Start-up, Operation and Maintenance of I&C assets in process industry.
5 months ago: When a 'full functional test' is assumed, only its frequency (the test interval) is considered in the PFDavg calculation, and mission time is no longer a factor. For partial proof tests, however, test coverage and undetected failures, along with mission time, become important. I have come across SIL verification reports that never consider mission time (i.e., it is treated as 'never'). I know of general guidelines that apply a severity factor to failure rates, effectively de-rating them for the respective service conditions. I would like to know whether any such recommendations exist for mission time as well, given that, as you mentioned, it is application-specific.