Reliability Centred Maintenance
VINOTH KUMAR SUBRAMANI
| Reliability experts | Asset Management Strategist | Champion of Operational Efficiency, Maintenance & System Integrity | Solutions for Asset Performance | Predictive Analytics |
Reliability centred maintenance (RCM) is a method to identify and select failure management policies to efficiently and effectively achieve the required safety, availability and economy of operation. Failure management policies can include maintenance activities, operational changes, design modifications or other actions in order to mitigate the consequences of failure.
RCM was initially developed for the commercial aviation industry in the late 1960s, ultimately resulting in the publication of ATA-MSG-3. RCM is now a proven and accepted methodology used in a wide range of industries.
RCM provides a decision process to identify applicable and effective preventive maintenance requirements, or management actions, for equipment in accordance with the safety, operational and economic consequences of identifiable failures, and the degradation mechanism responsible for those failures. The end result of working through the process is a judgement as to the necessity of performing a maintenance task, design change or other alternatives to effect improvements.
The basic steps of an RCM programme are as follows:
a) initiation and planning;
b) functional failure analysis;
c) task selection;
d) implementation;
e) continuous improvement.
All tasks are based on safety in respect of personnel and environment, and on operational or economic concerns. However, it should be noted that the criteria considered will depend on the nature of the product and its application. For example, a production process will be required to be economically viable, and may be sensitive to strict environmental considerations, whereas an item of defence equipment should be operationally successful, but may have less stringent safety, economic and environmental criteria.
Maximum benefit can be obtained from an RCM analysis if it is conducted at the design stage, so that feedback from the analysis can influence design. However, RCM is also worthwhile during the operation and maintenance phase, where it can be used to improve existing maintenance tasks, make necessary modifications or identify other alternatives.
Successful application of RCM requires a good understanding of the equipment and structure, as well as the operational environment, operating context and the associated systems, together with the possible failures and their consequences. Greatest benefit can be achieved through targeting of the analysis to where failures would have serious safety, environmental, economic or operational effects.
RELIABILITY-CENTERED MAINTENANCE (RCM)
Any RCM process shall ensure that all of the following steps are performed in the sequence shown:
Determine the operational context and the functions and associated desired standards of performance of the asset (operational context and functions).
Determine how an asset can fail to fulfill its functions (functional failures).
Determine the causes of each functional failure (failure modes).
Determine what happens when each failure occurs (failure effects).
Classify the consequences of failure (failure consequences).
Determine what should be performed to predict or prevent each failure (tasks and task intervals).
Determine if other failure management strategies may be more effective (one-time changes).
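One way to keep the seven steps in sequence is to record each line of the analysis as a single worksheet row whose fields follow the step order. The Python sketch below is only an illustration and is not taken from any RCM standard: the class name, the field names, the helper `first_incomplete_step` and the example entries marked "assumed" are all hypothetical. Only the failure mode text comes from the motor example in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RcmWorksheetRow:
    """One line of a hypothetical RCM worksheet; fields follow the step order."""
    operating_context: str                 # step 1
    function: str                          # step 1
    functional_failure: str                # step 2
    failure_mode: str                      # step 3
    failure_effect: str                    # step 4
    failure_consequence: str               # step 5
    task: Optional[str] = None             # step 6 (may legitimately stay empty)
    task_interval: Optional[str] = None    # step 6
    one_time_change: Optional[str] = None  # step 7


# Steps 1 to 5 must be answered, in this order, before a policy is chosen.
MANDATORY_FIELDS = [
    "operating_context",
    "function",
    "functional_failure",
    "failure_mode",
    "failure_effect",
    "failure_consequence",
]


def first_incomplete_step(row: RcmWorksheetRow) -> Optional[str]:
    """Return the first mandatory field that is still blank, or None."""
    for name in MANDATORY_FIELDS:
        if not getattr(row, name).strip():
            return name
    return None


row = RcmWorksheetRow(
    operating_context="Continuous process, no standby motor (assumed)",
    function="To drive the pump at rated speed (assumed)",
    functional_failure="Motor does not run at all (assumed)",
    failure_mode="Motor stator winding insulation failure due to phase unbalance",
    failure_effect="",        # not yet analysed
    failure_consequence="",   # not yet analysed
)
print(first_incomplete_step(row))
```

Here the helper reports `failure_effect`, which signals that the analysis of this row stopped before step 4 and that no task should be selected yet.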
Operating Context
“The operating context of the asset shall be defined.”
The functions, failure modes, failure consequences, and failure management policies that will be applied to any asset will depend not only on what the asset is, but also on the exact circumstances under which it is to be used. As a result, these circumstances need to be clearly defined before attempting to determine the asset's functions and performance standards. An operating context statement for a physical asset typically includes a brief overall description of how it is to be used, where it is to be used, and the overall performance criteria governing issues such as output, throughput, safety, environmental integrity, and so on. Specific issues that should be documented in the operating context statement include:
Batch versus flow processes: whether the asset is operating in a batch (or intermittent) process or a flow (or continuous) process.
Quality standards: overall quality or customer service expectations, in terms of issues such as overall scrap rates, customer satisfaction measurements (such as on-time performance expectations in transportation systems, or rates of warranty claims for manufactured goods), or military preparedness.
Environmental standards: what organizational, regional, national, and international environmental standards (if any) apply to the asset.
Safety standards: whether any predetermined safety expectations (in terms of overall injury and/or fatality rates) apply to the asset.
Theater of operations: characteristics of the location in which equipment is to be operated (arctic versus tropical, desert versus jungle, onshore versus offshore, proximity to sources of supply of parts and/or labor, etc.).
Intensity of operations: in the case of manufacturing and mining, whether the process of which the equipment forms a part is to operate 24 hours per day, seven days per week, or at lower intensity. In the case of utilities, whether the equipment operates under peak load or base load conditions. In the case of military equipment, whether the failure management policies are designed for peacetime or wartime operations.
Redundancy: whether any redundant or standby capability exists, and if so what form it takes.
Work-in-process: the extent to which work-in-process stocks (if any) allow the equipment to stop without affecting total output or throughput.
Spares: whether any decisions have been made about the stocking of key spares that might impinge on the subsequent selection of failure management policies.
Market demand/raw material supply: whether cyclic fluctuations in market demand and/or the supply of raw materials are likely to impinge on the subsequent selection of failure management policies. (Such fluctuations may occur over the course of a day in the case of an urban transport business, or over the course of a year in the case of a power station, an amusement park, or a food processing business.)
In the case of very large or very complex systems, it might be sensible to structure the operating context in a hierarchical fashion, if necessary starting with the mission statement of the entire organization that is using the asset.
Functions can be divided into two categories: primary and secondary functions.
Primary Functions
The reason why any organization acquires any asset or system is to fulfill a specific function or functions. These are known as primary functions of the asset. For instance, the main reason why someone acquires a car may be “to transport up to five people at speeds up to 90 km an hour along suitable roads.”
Secondary Functions
Most assets are expected to perform other functions, in addition to the primary functions. These are known as their secondary functions. Secondary functions are usually less obvious than primary functions. But the loss of a secondary function can still have serious consequences, sometimes more serious than the loss of a primary function. As a result, secondary functions often need as much if not more attention than primary functions, so they too must be clearly identified.
When identifying secondary functions, care should be taken not to overlook the following:
Environmental integrity
Safety/structural integrity
Control/containment/comfort
Appearance
Protective devices and systems
Economy/efficiency
Superfluous
Failure Modes
Identifying Failure Modes
“All failure modes reasonably likely to cause each functional failure shall be identified.”
Prominent failure modes of a motor:
Motor stator winding insulation failure due to phase unbalance
Motor stator winding short circuit due to insulation failure
Motor failure due to rotor and bearing failure
FAILURE EFFECTS
“Failure effects shall describe what would happen if no specific task is done to anticipate, prevent or detect the failure.”
FAILURE CONSEQUENCE CATEGORIES
“The consequences of every failure mode shall be formally categorized...”
After each reasonably likely failure mode and its effects have been identified at an appropriate level of detail, the next step in the RCM process is to assess the consequences of each failure mode. The primary source of information used to assess failure consequences is the description of the failure effects.
Some failure modes affect output, product quality or customer service. Others threaten safety or the environment. Some increase operating costs, for instance by increasing energy consumption, while a few have an impact in four, five or even all six of these areas. Still others may appear to have no effect at all if they occur on their own, but may expose the organization to the risk of much more serious failure modes.
If any of these failure modes are not anticipated or prevented, the time and effort that need to be spent correcting them also affect the organization, because repairing them consumes resources that might be better used elsewhere.
The nature and severity of these effects govern the way in which each failure mode is viewed by the organization. The precise impact in each case—in other words, the extent to which each failure mode matters—depends on the operating context of the asset, the performance standards that apply to each function, and the physical effects of each failure mode.
This combination of context, standards and effects means that every failure mode has a specific set of consequences associated with it. If the consequences are very serious, then considerable efforts will be made to prevent the failure mode, or at least to anticipate it in time to reduce or eliminate the consequences. On the other hand, if the failure mode only has minor consequences, it is possible that no proactive action will be taken and the failure mode will simply be corrected each time it occurs.
This means that the consequences of failure modes are more important than their technical characteristics. It also suggests that the whole idea of failure management is not so much about anticipating or preventing failure modes as it is about avoiding or reducing their consequences.
The remainder of this section considers the criteria used to evaluate the consequences of failure modes, and hence to decide whether any form of failure management is worth doing. These consequences are divided into four categories in two stages. The first stage separates hidden failures from evident failures.
Hidden and Evident Failures
Some failure modes occur in such a way that nobody knows that the item is in a failed state unless, or until, some other failure (or abnormal event) also occurs. These are known as hidden failures. A hidden failure is a failure mode whose effects do not become apparent to the operating crew under normal circumstances if the failure mode occurs on its own.
Conversely, an evident failure is a failure mode whose effects become apparent to the operating crew under normal circumstances if the failure mode occurs on its own.
The RCM approach to the evaluation of failure consequences begins by separating hidden failures from evident failures. Hidden failures can account for up to half the failure modes that could affect modern, complex equipment, so they need to be handled with special care. The following paragraphs explain the relationship between hidden failures and protection, and introduce the concept of a “multiple failure.”
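The first-stage split can be expressed as a single yes/no question per failure mode: do its effects become apparent to the operating crew under normal circumstances when it occurs on its own? The Python sketch below is a minimal illustration of that question only. The names `Visibility` and `classify_visibility` are hypothetical, and the yes/no answers in the example are assumptions rather than results of a real analysis; the second entry is an invented failure mode added to show the hidden case.

```python
from enum import Enum


class Visibility(Enum):
    HIDDEN = "hidden"
    EVIDENT = "evident"


def classify_visibility(apparent_on_its_own: bool) -> Visibility:
    """First-stage split between hidden and evident failure modes.

    apparent_on_its_own: True if the effects become apparent to the
    operating crew under normal circumstances when the failure mode
    occurs on its own.
    """
    return Visibility.EVIDENT if apparent_on_its_own else Visibility.HIDDEN


# Assumed answers, for illustration only.
answers = {
    "Motor stator winding short circuit due to insulation failure": True,
    "Standby protective device unable to operate (hypothetical)": False,
}

classified = {mode: classify_visibility(seen) for mode, seen in answers.items()}
hidden_modes = [mode for mode, v in classified.items() if v is Visibility.HIDDEN]
print(hidden_modes)
```

Keeping the answer as an explicit boolean per failure mode makes the judgement auditable, since the reviewer can see which question was answered and challenge it.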
FAILURE MANAGEMENT POLICIES—SCHEDULED TASKS
The next level within the RCM decision process assesses the characteristics of each failure mode to determine the most appropriate failure management policy. A number of options are available, namely:
a) Condition monitoring
Condition monitoring is a continuous or periodic task to evaluate the condition of an item in operation against pre-set parameters in order to monitor its deterioration. It may consist of inspection tasks, which are an examination of an item against a specific standard.
b) Scheduled restoration
Restoration is the work necessary to return the item to a specific standard. Since restoration may vary from cleaning to the replacement of multiple parts, the scope of each assigned restoration task has to be specified.
c) Scheduled replacement
Scheduled replacement is the removal from service of an item at a specified life limit and replacement by an item meeting all the required performance standards. Scheduled replacement tasks are normally applied to so-called “single-cell parts” such as cartridges, canisters, cylinders, turbine disks, safe-life structural members, etc.
d) Failure-finding
A failure-finding task is a task to determine whether or not an item is able to fulfill its intended function. It is solely intended to reveal hidden failures. A failure-finding task may vary from a visual check to a quantitative evaluation against a specific performance standard. Some applications restrict the ability to conduct a complete functional test. In such cases, a partial functional test may be applicable.
e) No preventive maintenance
It may be that no task is required in some situations, depending on the effect of failure. The result of this failure management policy is corrective maintenance or no maintenance at all, following a failure.
f) Alternative actions
Alternative actions can result from the application of the RCM decision process, including:
i) redesign;
ii) modifications to existing equipment, such as more reliable components;
iii) operating procedure changes/restrictions;
iv) maintenance procedure changes;
v) pre-use or after-use checks;
vi) modification of the spare supply strategy;
vii) additional operator or maintainer training.
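The policy options listed above can be held as an enumeration so that each failure mode is assigned exactly one of them. The Python sketch below is an assumption-laden illustration, not the RCM decision logic itself: the names `Policy` and `candidate_policies` are hypothetical, and the only rule encoded is the one stated in the text, that failure-finding is solely intended to reveal hidden failures. Choosing among the remaining candidates still requires the full decision process.

```python
from enum import Enum
from typing import List


class Policy(Enum):
    CONDITION_MONITORING = "Condition monitoring"
    SCHEDULED_RESTORATION = "Scheduled restoration"
    SCHEDULED_REPLACEMENT = "Scheduled replacement"
    FAILURE_FINDING = "Failure-finding"
    NO_PREVENTIVE_MAINTENANCE = "No preventive maintenance"
    ALTERNATIVE_ACTIONS = "Alternative actions"


def candidate_policies(is_hidden: bool) -> List[Policy]:
    """Return the policies that remain candidates for a failure mode.

    Failure-finding is kept only for hidden failures; every other
    option stays on the list for further evaluation.
    """
    return [p for p in Policy if is_hidden or p is not Policy.FAILURE_FINDING]


evident_options = candidate_policies(is_hidden=False)
hidden_options = candidate_policies(is_hidden=True)
print([p.value for p in evident_options])
print([p.value for p in hidden_options])
```

For an evident failure mode the list has five entries, and for a hidden one it has all six.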
To be continued in Part 2.
Mega projects HSSE/PSM SME and Process Engineer/ex-PETRONAS/PDO (TA-2) SHELL/KOC/SABIC/Exxon Chemicals/petrochemicals/oil and gas/ammonia-urea/combustible dust/HAZOP/Bowties/SCE Performance Standards/OSHA/NFPA/API/EPA
(5 years ago) Many organizations use RCM as a tool to do as little as possible, rather than more, in order to achieve the perceived reliability assurance. As a result, the scope of RCM is entirely subjective and totally dependent on the expertise, outlook and motivation of the personnel who carry out the RCM risk assessment, which by definition is also subjective. As a result, events such as the elbow failure due to non-inspection that led to the earlier Philadelphia Energy Solutions refining complex explosion take place. I personally investigated a refinery fire caused by the failure of a misplaced carbon steel piping spool in an alloy piping system of a sulfadation process, which failed after 3 years because it was not inspected, since the alloy piping had an inspection frequency of 10 years.
Partnering with customers in their Digital Transformation & Sustainability Journey
(5 years ago) Really informative, Vinoth! Looking forward to learning more about RCM.
Support solutions engineer and guide
(5 years ago) All good stuff, Vinoth. ‘Most’ programmes (I put most in inverted commas because anybody that has a point to make thinks that their point causes most problems) fall down because that continuous improvement (step 5 in your diagram) is reduced to continual or not at all. As an example, most of the RCM activity that I have taken part in has been on aircraft platforms in the UK Defence sector. UK Defence (Air) has a policy of using RCM methodologies and, I’m paraphrasing but the intent is along the lines of, ‘the maintenance schedule should be reviewed periodically. The period should be no greater than 5 years’. The fact that there is a policy and that that policy has a time limit means two things. 1. RCM is now mandatory, which means that any leadership and good intent is diluted down to being a tick box exercise because it’s something that an organisation has to do rather than wants to do. 2. There is a time limit, a target, and any hard-pressed-for-cash organisation will aim to get as close to that target as possible without action, so as to maximise the benefit of the cost. In practice what that creates is a culture where nobody buys in to reliability-centred maintenance; the engineers don’t create quality data because they don’t believe the organisation will look at it. The organisation wants to look at it but, by the time they’ve reached their time limit, the job is simply too expensive to do properly, so they can’t look at it. It’s a downward spiral from there and a self-fulfilling prophecy of failure. Actually, in this 21st century with the computing power and the excitement of maintenance 4.0 there is hope. But now we’re trying to get organisations that failed to invest in their culture (and therefore their people’s role in RCM) to invest in more equipment that should improve and provide, if applied appropriately, the feedback data that the continuous improvement process is screaming out for.
Asset Performance Management | CMRP | Program Manager - Asia Pacific
(5 years ago) I have often found "defining failure modes" to be the most crucial step in this. While it's very easy to say "dominant failure modes", we tend to over-analyse them, adding secondary failures into the picture.
well said