7 blogs, 7 days - Apply a risk based approach to justify upgrades

Not if, but when.

The most obvious reason for a safety system upgrade would be if the plant was experiencing multiple unexpected outages or trips in a year; then the justification would be easy! However, just because it hasn’t failed doesn’t mean that it won’t fail. It’s all a question of probability / likelihood, or “When, not If”. If you wait until something fails, then it becomes urgent, and vital resources need to be diverted from existing projects or budgets to resolve the issue. This often results in higher overall costs from a compressed schedule, paying a premium to expedite spares / repairs, and getting the knowledge and expertise required at a moment’s notice.

The biggest impact of any unforced outage is business interruption (i.e. the consequence). The entire production planning and scheduling can be thrown off balance, supply contract commitments can be placed in jeopardy, off-specification product may be produced that has to be written off, and production targets are missed; the list continues.

Let the risk matrix be your friend

I have always struggled to find a simple visual for the many factors associated with modernizing a safety system. One method that I have often used with good success is the Risk Matrix. It is a common denominator across all operating companies: many use different sizes (4x4, 5x5, 8x8), but they all know and understand the matrix. So, if you ignore the math behind the matrix (it’s hard to calculate the Likelihood, but the consequences remain the same!), the risk matrix provides a simple, easy-to-understand, common view around which to have the discussion.
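
As a rough illustration of how a matrix turns two ratings into a single band, here is a minimal Python sketch. The 5x5 size, the score thresholds and the function name `risk_band` are assumptions made for this example; every operating company defines its own matrix.

```python
# Hypothetical 5x5 risk matrix: likelihood and consequence are each rated
# from 1 (lowest) to 5 (highest). The thresholds are illustrative only.

def risk_band(likelihood: int, consequence: int) -> str:
    """Return a qualitative risk band for one cell of the matrix."""
    for name, value in (("likelihood", likelihood), ("consequence", consequence)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    score = likelihood * consequence
    if score >= 15:
        return "High"
    if score >= 6:
        return "Medium"
    return "Low"

# The consequence stays the same; only the likelihood moves.
before = risk_band(likelihood=4, consequence=5)  # ageing system
after = risk_band(likelihood=1, consequence=5)   # after the upgrade
print(before, "->", after)  # prints "High -> Low"
```

Plotting the "before" and "after" cells on the company's own matrix gives the discussion a shared starting point without arguing over the underlying math.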

One other approach is to justify the upgrade in terms of Risk Reduction achieved per $1 spent.

In other words, if I compare competing project funding requests, I am more likely to be successful if I can demonstrate that the upgrade project provides greater risk reduction to the business for each dollar spent than the competing projects.
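
The comparison can be sketched in a few lines of Python. The project names and dollar figures below are invented purely for illustration, and expressing risk as an annualised dollar exposure is an assumption; use whatever risk measure your organization already accepts.

```python
from dataclasses import dataclass

@dataclass
class Project:
    name: str
    risk_before: float  # assumed annualised risk exposure before the project ($)
    risk_after: float   # assumed annualised risk exposure after the project ($)
    cost: float         # project cost ($)

    @property
    def reduction_per_dollar(self) -> float:
        """Risk reduction achieved for each $1 spent."""
        return (self.risk_before - self.risk_after) / self.cost

# Hypothetical funding requests competing for the same budget.
projects = [
    Project("Competing project A", 800_000, 600_000, 400_000),
    Project("Safety system upgrade", 2_000_000, 500_000, 1_000_000),
]

ranked = sorted(projects, key=lambda p: p.reduction_per_dollar, reverse=True)
for p in ranked:
    print(f"{p.name}: {p.reduction_per_dollar:.2f} risk reduction per $1 spent")
```

With these made-up numbers the upgrade ranks first (1.50 versus 0.50), which is the kind of side-by-side view a funding committee can act on.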

TIP: It may be worth factoring in “What if” scenarios into the ROI calculation to demonstrate the business consequence

If such an event has actually happened, then real numbers can be ascertained and used to support the ROI. If not, then some “what if” scenarios will prove useful in raising awareness and showing the cascade effect on the business. For most organizations, customer satisfaction is a key metric that has management’s attention. Anything that has the potential to impact that metric often gets attention!

Without proper planning, you may miss the opportunity to take advantage of improved functionality in the new safety system, and that missed opportunity may lead to a higher ongoing cost of ownership.

Parts Availability or obsolescence

Obsolescence is not a reason to upgrade a safety system, but the consequence is! Product obsolescence is a fact of life for any electronics manufacturer. Equipment, controllers, communications, power supplies, I/O cards and workstations get old, certain components are no longer available, and it becomes impossible to repair or replace items.

There are several ways in which this situation can be managed:

  • Hold adequate spares and replacements locally on site
  • Work with the automation vendor to provide a central “bonded” stock for the sole use of the operating company
  • Scour the commercial market for spares
  • Plan an upgrade path

Whilst holding inventory locally on site is often preferred, it does add inventory cost to the books of the business, often means duplication of inventory across multiple units, sites or assets, and needs careful management of the various hardware and firmware revisions. Holding a central stock can also create additional issues when operating across multiple countries, regions or geographies, such as taxes and duties, import / export regulations, and the time to get the stock from the central location to the operating site.

CAUTION: In recent years there has been a steady rise in providers offering used / spare / refurbished modules. This source should be treated with extreme caution. Craigslist and eBay are unreliable suppliers for manufacturing facilities that operate 24 hours a day, 7 days a week.

For any electronic or programmable system, the devil is often in the detail. The specific compatibility of hardware, software and firmware revisions is critical to the integrity and operation of the safety system. At the end of the day, would you trust the safety of your people, production and profit to an “internet purchase” from an unknown source of supply?

The preferred option is to plan an upgrade path to prolong the operating life of the safety system. Upgrades are often “gradual”, with parts of the system upgraded as and when the time or opportunity presents itself. Key to the success of this approach is ensuring the interoperability of the different versions of systems. As systems are upgraded, this approach creates “spares” that can be used to support the other legacy systems until they can be upgraded.

Once you start mixing components of different versions, it generally becomes more complex to manage and maintain the various systems, so maintenance costs may increase. The key is to get all the systems to a common revision level or, even better, into a position whereby they can be upgraded online without halting operations.

TIP: When considering the ROI calculation, it is worth including any rising costs of obsolete spares and rising support costs.  

If there are any planned expansions, then it may be worth including the delta between the increased cost of adding points to the existing safety systems and the cost of using a new system. Ensure that the cost per I/O point includes the additional cost of the specialist knowledge and expertise required to modify an aging system.
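
A minimal sketch of that delta is shown below. All figures, the cost breakdown and the function names are hypothetical; substitute quotes from your own vendors and engineering contractors.

```python
def cost_per_point(hardware: float, engineering: float,
                   specialist_premium: float = 0.0) -> float:
    """Fully loaded cost of adding one I/O point ($)."""
    return hardware + engineering + specialist_premium

def expansion_delta(points: int, legacy_cost: float, new_cost: float) -> float:
    """Extra cost of expanding the legacy system rather than a new one ($)."""
    return points * (legacy_cost - new_cost)

# Hypothetical numbers: the legacy system carries a premium for scarce expertise.
legacy = cost_per_point(hardware=400, engineering=300, specialist_premium=250)
new = cost_per_point(hardware=450, engineering=200)

delta = expansion_delta(points=200, legacy_cost=legacy, new_cost=new)
print(f"Extra cost of expanding the legacy system: ${delta:,.0f}")
```

A positive delta is a cost that the new system avoids, so it can be counted on the benefit side of the ROI.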

TIP: Availability of parts is not the only potential concern; don’t overlook the lead time for getting critical / urgent spares.

Every day or week that it takes to deliver a spare part can have a significant consequence on the overall schedule or budget, often many times greater than the actual cost of the replacement part itself.
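
To make the point concrete, the short Python sketch below compares the price of a part with the production lost while waiting for it. The lead time, daily loss and part price are made-up values, and the function name `spare_delay_cost` is just for this example.

```python
def spare_delay_cost(lead_time_days: int, loss_per_day: int, part_cost: int) -> dict:
    """Compare the price of a spare with the loss incurred while waiting for it."""
    downtime_loss = lead_time_days * loss_per_day
    return {
        "part_cost": part_cost,
        "downtime_loss": downtime_loss,
        "total": part_cost + downtime_loss,
        "ratio": downtime_loss / part_cost,  # loss relative to the part price
    }

# Hypothetical case: a $5,000 module with a 10-day lead time,
# on a unit that loses $50,000 for every day it is down.
result = spare_delay_cost(lead_time_days=10, loss_per_day=50_000, part_cost=5_000)
print(f"Downtime loss is {result['ratio']:.0f}x the cost of the part")
```

In this invented case the waiting time costs 100 times more than the module itself, which is why lead time belongs in the justification alongside part availability.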

However, none of these options resolves the fact that the equipment has reached the end of its useful lifetime and presents a business risk that should be planned for accordingly.

Protect against emerging cybersecurity threats

Any modernization plans must include a solid security program. Not only is this now an integral part of the latest industry standards (e.g. IEC 61511 Edition 2 now mandates cyber security risk assessments), but many organizations now also have company standards to mitigate / manage the risk of cyber-attack.

Many older machines run operating systems that are no longer supported. They present a security risk because they are susceptible to viruses and cannot be made secure. Losing these machines can be critical if they are not available when called upon, leading to potential downtime. As part of the business justification, it is worth capturing the risk and cost of potential downtime due to the inability to access the relevant engineering and maintenance machine(s).

Beware the ‘Custom Special’

In legacy safety systems, the programming tools available at the time often didn’t support templates or comprehensive function block libraries, which made it difficult to implement standardization across multiple systems, sites or applications. This led to complex code, understood by very few people, that was often unsupportable and unmaintainable. Many of the latest programming tools support standardization, allowing knowledge and best practice to be captured and encapsulated. This reduces customization and simplifies troubleshooting for both engineers and maintenance technicians, which can reduce downtime.

This is difficult to quantify for an ROI but may be worth including realistic examples or case studies applicable to your operation as part of the supporting documentation.

Rip or Replace?

Very often the first question I get asked is “Can we upgrade what we already have, or do we need to rip and replace?” For me, the biggest risk is the physical space, especially the equipment cabinets: decommissioning and removing the existing cabinets, then replacing them with new ones.

Start with the footprint and space available as this often dictates if rip and replace is even possible.


At some point during a rip and replace you will reach the point of no return, after which there will be no turning back! Don’t forget the impact on people, e.g. the new skills and competencies required, additional training, any new training systems or simulators etc. And finally, don’t forget to include the impact on existing support contract(s).

If you decide to upgrade the existing system, make sure that you remember to update your spares holding to ensure compatibility with the old / new versions of the upgraded system.

When comparing the two choices, it is a good idea to estimate the activity required for each: (Time + Resources + Cost) * Risk.
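
One way to apply the (Time + Resources + Cost) * Risk estimate is to rate each factor on the same relative scale, so that the terms can be added. The 1 to 5 scale and every score below are assumptions for illustration, not measured values.

```python
def activity_score(time: int, resources: int, cost: int, risk: int) -> int:
    """(Time + Resources + Cost) * Risk, with each factor rated 1 (low) to 5 (high)."""
    return (time + resources + cost) * risk

# Hypothetical ratings for the two choices.
options = {
    "Upgrade existing": activity_score(time=2, resources=2, cost=3, risk=2),
    "Rip and replace": activity_score(time=4, resources=4, cost=5, risk=3),
}

# Lowest score first: less activity and risk for the same end goal.
for name, score in sorted(options.items(), key=lambda item: item[1]):
    print(f"{name}: {score}")
```

Here the scores come out as 14 and 39. The absolute numbers mean little; the value is in forcing both options through the same set of questions.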

Consider the system architecture

If you do decide to go down the route of replacing old with new, take a moment to stop and consider what your future needs are, and what you need from your safety system. The good news is that there are now more choices available on the market; the bad news is that there are more choices available and you need to decide what is best for you. For example, new network architectures and communications protocols mean you can architect your safety system by functional plant unit, put I/O in the field in predesigned field enclosures, use Universal I/O to reduce the quantity of spares required, accommodate late changes, install early and then configure later etc.

In general, ensure that you fully understand what you are buying, and what your obligation is for the operating life of the asset (the lowest CAPEX doesn’t necessarily mean the lowest OPEX). For example:

  • Ensure you understand the Total Cost of Ownership for the remaining operating life of the plant
  • Understand how often you must proof test the system, calibrate the system, etc.
  • Understand what diagnostics are automatic / built into the system versus what must be configured in application logic
  • Understand how the system redundancy works (voted / adaptive fault management versus failover redundancy)
  • Can online changes or modifications be made to the safety system without halting operations? Are any manual precautions required when downloading changes?
  • Understand failure modes e.g. what happens to DCS communications during high CPU loading / extended scan time?
  • How many faults can the system tolerate before shutting down?
  • Is any application logic required to replace I/O modules?
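
The CAPEX versus OPEX point can be checked with simple arithmetic. The sketch below uses invented figures and a deliberately simple model (no discounting or inflation), so treat it as a starting point rather than a full Total Cost of Ownership calculation.

```python
def total_cost_of_ownership(capex: float, annual_opex: float, years: int) -> float:
    """Purchase cost plus operating cost over the remaining life of the plant ($)."""
    return capex + annual_opex * years

REMAINING_LIFE_YEARS = 15  # assumed remaining operating life of the plant

# Hypothetical offers: A is cheaper to buy, B is cheaper to run
# (e.g. fewer proof tests, more built-in diagnostics).
offer_a = total_cost_of_ownership(1_000_000, 200_000, REMAINING_LIFE_YEARS)
offer_b = total_cost_of_ownership(1_400_000, 120_000, REMAINING_LIFE_YEARS)

print(f"Offer A: ${offer_a:,.0f}")
print(f"Offer B: ${offer_b:,.0f}")
```

With these numbers the offer with the lower purchase price ends up $800,000 more expensive over 15 years, which is why the proof test, calibration and maintenance questions above matter before signing.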

Summary

In the next blog we will take a closer look at ……


