Operating Redundant Systems

Introduction:

There are two schools of thought (scenarios) when it comes to the operation of redundant systems:

1) Operate Asset A as much as possible; if and when Asset A fails, switch to Asset B and schedule a replacement for Asset A.

2) Cycle operation as evenly as possible between Asset A and Asset B then replace them both at the end of life (typically in a planned shutdown).

TLDR: Scenario 1 is preferred in most cases (See exceptions).

Assumptions / Setup:

For the purposes of this article we're going to assume a dual-redundant system where Asset A is identical to Asset B.

Figure: Modelling 2 Assets in Parallel

Most assets have a combination of failure modes tied to either calendar time (ageing) or operating time (wear out). Since the failure modes of Asset A and Asset B are identical and changes in operation only affect the modes tied to operational time, we will only compare the operational time modes.

We're also going to assume that if both assets exist in a failed state, then this is a catastrophic event.

Reasoning:

For Scenario 1:

Figure: Reliability of 2 Parallel Assets

If the life estimates are conservative, there are no premature replacements, because Asset A runs until it fails. This maximises "useful life" and decreases the cost / life ratio of each asset. If the estimates are inaccurate, then when Asset A fails, the chance that the redundant asset is also in poor condition is low. This means there is a low risk of a catastrophic event.

For Scenario 2, the rate of wear on each asset is halved, since operation is shared between both assets as evenly as possible. In this scenario there are 2 asset failures over the same period as Scenario 1; however, both are expected to occur at a similar time.
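The effect of sharing duty on wear-out timing can be illustrated with a small calculation. This is only a sketch: the rated life of 20,000 operating hours and the function name are hypothetical, and it considers operating-time failure modes only, as assumed above.

```python
def calendar_hours_to_wear_out(rated_operating_hours: float, duty_share: float) -> float:
    """Calendar hours needed for an asset to accumulate its rated operating hours.

    duty_share is the fraction of calendar time the asset is running
    (1.0 = always on duty, 0.5 = duty shared evenly with its twin).
    """
    if not 0.0 < duty_share <= 1.0:
        raise ValueError("duty_share must be in the interval (0, 1]")
    return rated_operating_hours / duty_share


RATED_LIFE = 20_000.0  # hypothetical rated life in operating hours

# Scenario 1: Asset A carries all of the duty until it wears out.
scenario_1_asset_a = calendar_hours_to_wear_out(RATED_LIFE, duty_share=1.0)

# Scenario 2: each asset carries half of the duty.
scenario_2_each_asset = calendar_hours_to_wear_out(RATED_LIFE, duty_share=0.5)

print(f"Scenario 1, Asset A wears out after {scenario_1_asset_a:.0f} calendar hours")
print(f"Scenario 2, both assets wear out after {scenario_2_each_asset:.0f} calendar hours")
```

Under these assumptions, both scenarios consume two asset lives over the same calendar period, but Scenario 1 staggers the two wear-out points while Scenario 2 brings them together.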

Figure: Scenario 2 redundant systems

The idea is that you can proactively replace both assets at the "end of life" during a shutdown. If the life estimates are conservative, the premature replacement is likely to waste "useful life" and increase the cost / life ratio of each asset. If the estimates are inaccurate, then when Asset A fails, the chance that the redundant asset is also in poor condition is high, because both have accumulated similar operating time. This means there is a high risk of a catastrophic event.
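To put rough numbers on "poor condition", one option is to assume the operating-time failure mode follows a Weibull wear-out distribution (an assumption made here for illustration, not something the article specifies). The sketch below compares the chance that the surviving asset fails during the repair window when it is unused versus when it is already heavily worn. The shape, scale, window and age values are all hypothetical.

```python
import math


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """Probability that an asset survives beyond t operating hours."""
    return math.exp(-((t / scale) ** shape))


def prob_fail_in_window(age: float, window: float, shape: float, scale: float) -> float:
    """Probability of failing within `window` further operating hours,
    given the asset has already survived `age` operating hours."""
    survive_now = weibull_reliability(age, shape, scale)
    survive_later = weibull_reliability(age + window, shape, scale)
    return 1.0 - survive_later / survive_now


# Hypothetical parameters: wear-out behaviour (shape > 1), 500 h repair window.
SHAPE, SCALE, REPAIR_WINDOW = 3.0, 20_000.0, 500.0

# Scenario 1: the standby has no operating hours when Asset A fails.
fresh_standby = prob_fail_in_window(0.0, REPAIR_WINDOW, SHAPE, SCALE)

# Scenario 2: the survivor has roughly the same operating age as the failed asset.
worn_standby = prob_fail_in_window(18_000.0, REPAIR_WINDOW, SHAPE, SCALE)

print(f"Unused standby fails in window: {fresh_standby:.6f}")
print(f"Worn standby fails in window:   {worn_standby:.6f}")
```

The gap between the two figures depends on the shape parameter: with a shape of 1 (purely random failures) age makes no difference, so the argument only applies to genuine wear-out modes.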

Conclusion:

Both scenarios have the same effective cost (assuming the assets in Scenario 2 are replaced at the optimal time), but Scenario 1 reduces the risk of a catastrophic event because the redundant asset is in the best possible state at the time of the first failure.
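The conclusion can also be checked with a rough Monte Carlo sketch. It assumes Weibull-distributed operating lives, a fixed repair window measured in operating hours of the surviving asset, and only the first failure cycle; none of these parameters come from the article, and the function name is hypothetical.

```python
import random


def simulate(n_trials: int, shape: float, scale: float,
             repair_window: float, seed: int = 0) -> tuple[float, float]:
    """Estimate the chance of a double failure for Scenario 1 and Scenario 2.

    Returns (p_scenario_1, p_scenario_2): the fraction of trials in which the
    surviving asset fails before the first failed asset is back in service.
    """
    rng = random.Random(seed)
    double_failures_1 = 0
    double_failures_2 = 0
    for _ in range(n_trials):
        # Operating-hour lives of two identical assets (assumed Weibull).
        life_a = rng.weibullvariate(scale, shape)
        life_b = rng.weibullvariate(scale, shape)

        # Scenario 1: B has zero operating hours when A fails, so B must
        # survive the whole repair window starting from new.
        if life_b < repair_window:
            double_failures_1 += 1

        # Scenario 2: both assets age together, so the first failure comes at
        # operating age min(life_a, life_b) and the survivor has only the
        # difference between the two lives left to give.
        if abs(life_a - life_b) < repair_window:
            double_failures_2 += 1

    return double_failures_1 / n_trials, double_failures_2 / n_trials


if __name__ == "__main__":
    p1, p2 = simulate(n_trials=20_000, shape=3.0, scale=20_000.0, repair_window=500.0)
    print(f"Scenario 1 double-failure estimate: {p1:.4f}")
    print(f"Scenario 2 double-failure estimate: {p2:.4f}")
```

The sketch ignores calendar-time modes and dormant-standby failures, which is where the exceptions below come in.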

Exceptions to this recommendation:

1) If lack of operation increases the likelihood of other failure modes, e.g. lack of rotation causes static corrosion due to uneven distribution of grease. Note: The presence of a mode like this does not invalidate the above conclusion, and the goal should still be to minimise the cost / life ratio and risk. This means that periodic rotation, or another mitigating task for this mode, may be effective whilst still adhering to Scenario 1's operation.

2) If unplanned switching losses are disproportionately high.

Maxwell Kazuva

Helping out @ Carrapateena

4 years ago

Scenario 1 is usually better until Asset B fails before a replacement for Asset A is ready for deployment because of external factors such as Covid 19.

Mike Hobbs MIAM, SaRS

Reliability Leadership - Asset Management, RAM(S) Engineering, Maintenance, RCM / FMECA, ERP/EAM, Reliability, FTA, RCFA. KTP Supervision.

4 years ago

Both options assume that no defects are introduced during the passive standby phase, such as stiction, binding, false brinelling (as noted), lube oil settle-out etc. As mentioned above, the loss of the dormant standby is a hidden functional failure, hence the need to test a start on demand; that operation also provides a useful maintenance role in getting everything moving again and redistributing lubricants. That said, a while back I came across a case where the unofficial strategy was as follows: upon failure of the duty unit the standby unit was operated, which they believed gave them plenty of time to investigate the repair, strip down, discover difficulty in part identification, find parts were not spared, difficult to order, slow through the system and then slow to be installed. This meant that for significant periods they were operating 1oo1. This creates a significant window of opportunity for concurrent failure with ensuing production loss. The point is: whatever strategy you adopt, when one unit fails get it serviceable again as quickly as possible, whether it's returned to duty or becomes the standby. Get it fixed; having the standby doesn't give you unlimited time to repair at your leisure.

Farhat Khan CRE, CMRP, CSSGB, CRL, RBI

Maintenance Reliability Engineer

4 years ago

The swapping strategy for the systems with redundant equipment should be carefully selected as it directly impacts the availability. The consequences due to failure of any of those assets should be kept in mind for this decision. If it's a critical system, keeping the redundant asset B idle till the asset A fails may not be the best choice as some hidden failure modes like false brinelling on the bearings, rotor bend, corrosion inside the casing etc could lead to asset B's failure when needed on demand and the consequences would be huge downtime. On the other hand, too many starts and stops of both the assets in 50:50 philosophy could lead to unnecessary stresses on the components causing the premature failure. The swapping philosophy and the swapping cycle duration should be aligned with the predictive/preventative maintenance plans so no PPMs are missed due to equipment not ready for it.

Alex Pavlickovski

Strategy, Asset Management, Digital Engineering, IoT, Industry 4.0

4 years ago

Or Three, operate Asset A at 30% more than Asset B and stagger your replacement costs and allow for both assets to be replaced in a planned manner, with Asset A being your leading indicator for asset performance.

The purpose of redundancy is to achieve the highest possible availability of the system, but the threat of common cause failures always prevails alongside redundancy. Here your assumptions miss the possibility of a hidden failure of Asset B. You can detect a hidden failure by switching operation to the redundant equipment to see whether it is able to operate on demand. The problem is that switching itself puts a heavy load on the equipment, reducing its life, so once you have switched it is advisable to operate for a long time rather than turning it off again. Your assumption in case B is right that we cannot operate equally, as it would lead to simultaneous failure (not due to common cause). So the best strategy is to operate both 30/70 so that we optimise operation under both concerns.

