Soft-Fail versus Conventional Redundancy

Redundancy, in the context of earth station electronics, refers to subsystems that are designed to limit service interruption to a matter of milliseconds (the time it takes for a switch to move from position one to position two) following the catastrophic failure of a major component (a 'component' being any significant device, such as a power amplifier, RF converter, LNA/B, modem, etc.). These subsystems come in many forms that are highly application-dependent.

The need for redundancy is a given for (dare I say it) 'mission critical applications' (one of the most overused terms in our industry). There are certainly instances where an interruption of service can cause great harm, like a life-threatening surgery being performed remotely via a satellite link or, more importantly, the seamless transmission of a major sporting event.

Redundancy subsystems consist of a controller that monitors the health of the components in that subsystem and, upon detecting the failure of an online component, sends commands to a switching system that reroutes the signal to an identical standby that is sitting in a quiescent state, waiting to be called into service.

This architecture is what I refer to as 'conventional redundancy'. It's a concept that has been in play since the beginning of the industry for instances where a high level of availability is desired. It is a very simple approach, though not necessarily the best approach for all cases. But we'll get more into that later.

When it comes to earth station components commonly used to transmit and receive satcom links, we typically think in terms of 'single pol' and 'dual pol', which refer to the two orthogonally separated polarization fields (except for circularly polarized satellites, but let's not go there) that exist between the ground station and the satellite. If the antenna feed is configured for dual pol access (full frequency reuse), four discrete ports are available - two transmit and two receive.

Now for some simple definitions. One online component with a dedicated backup is referred to as a 'one-for-one' system (expressed as 1:1). A dual pol system whereby one backup is shared between the two online components is a 'one-for-two' system (expressed as 1:2).
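To make the 1:1 and 1:2 definitions concrete, here is a minimal sketch of the controller logic described earlier: poll the online components and, on a failure, route the affected path to the shared backup. The class names, path labels and unit names are hypothetical, and a real controller would also drive a waveguide or coax switch and handle alarms, none of which is modeled here.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Unit:
    """A redundant component (e.g. an HPA). Hypothetical model, not a real API."""
    name: str
    healthy: bool = True


class OneForNController:
    """One shared backup protecting N online paths (1:1 when N=1, 1:2 when N=2)."""

    def __init__(self, online: Dict[str, Unit], backup: Unit):
        self.paths: Dict[str, Unit] = dict(online)  # path label -> unit carrying it
        self.backup: Optional[Unit] = backup        # None once the backup is in use

    def poll(self) -> Dict[str, str]:
        """Check each path; move the first failed one onto the backup if available."""
        for path, unit in list(self.paths.items()):
            if not unit.healthy and self.backup is not None and self.backup.healthy:
                self.paths[path] = self.backup  # the "switch" moves to the standby
                self.backup = None              # a second failure is now unprotected
        return {path: unit.name for path, unit in self.paths.items()}


# A 1:2 system: two polarizations sharing one backup amplifier.
hpa_1, hpa_2, hpa_spare = Unit("HPA-1"), Unit("HPA-2"), Unit("HPA-3")
controller = OneForNController({"pol-A": hpa_1, "pol-B": hpa_2}, hpa_spare)
print(controller.poll())   # both paths on their own amplifiers
hpa_2.healthy = False
print(controller.poll())   # pol-B now carried by the shared backup
```

Note what the sketch makes obvious about 1:2: once the backup has been called into service, a failure on the other path has nowhere to go until the failed unit is repaired.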

Conventional redundancy has a few downsides. The dedicated backup can't be used to carry additional services and still serve as a backup (without the addition of some priority switching logic - and that can get messy). A dedicated backup is okay if the components are relatively inexpensive (cheap LNBs or low-power amplifiers). But if you're talking about more expensive products, like very high-power HPAs, you inherently tie up a lot of capital. And if the backup component sits in hot-standby (usually the case), it's aging along with the online component, while providing no benefit until there's a failure.

That pretty well covers the main elements of conventional redundancy, but the conversation wouldn't be complete without mentioning 'phase-combining' and its impact on redundancy architecture. It should be noted that solid state power amplifiers (the crème de la crème of power amplification) depend on Field-Effect Transistors (FETs) to generate RF power.

FETs come in all shapes and sizes with power levels that range from a few watts to somewhere around 130 watts at some frequencies. In order to generate respectable power levels, they must be cascaded, or phase-combined such that their individual merits can be summed. Phase-combining can be performed inside the amplifier with a break point of around 1kW or so at some frequencies. Beyond that, physical size and weight become prohibitively impractical.
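A rough way to picture why phase matters when summing devices: in an idealized, lossless N-way combiner, the output voltages add as vectors and the result is divided across N ports. The sketch below assumes that ideal model; the function name and the 125 W module figure are illustrative assumptions, not values from any particular product.

```python
import cmath
import math
from typing import List, Optional


def combined_power_w(module_powers_w: List[float],
                     phases_deg: Optional[List[float]] = None) -> float:
    """Output of an ideal, lossless N-way combiner (assumed model).

    Each module contributes a voltage proportional to sqrt(power) at its phase;
    the combined power is |sum of voltages|^2 / N.
    """
    n = len(module_powers_w)
    if phases_deg is None:
        phases_deg = [0.0] * n  # perfectly phase-aligned modules
    total_voltage = sum(
        math.sqrt(p) * cmath.exp(1j * math.radians(ph))
        for p, ph in zip(module_powers_w, phases_deg)
    )
    return abs(total_voltage) ** 2 / n


# Eight assumed 125 W modules, all in phase, sum to roughly 1 kW.
print(round(combined_power_w([125.0] * 8), 1))

# Two equal modules 180 degrees apart cancel almost completely.
print(combined_power_w([125.0, 125.0], [0.0, 180.0]))
```

Real combiners have insertion loss and imperfect phase and amplitude balance, so actual figures will come in somewhat below this ideal.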

For higher power systems, one can choose to either externally phase combine these larger amplifiers, which carries with it high service-return freight costs due to size, and the high capital costs of backup and spare amplifiers - or one can choose to distribute the load over a larger number of smaller amplifiers. When it comes to redundancy in high-power systems, an alternative approach to consider is a modular system based on 'Soft-Fail Redundancy'.

Operationally, a soft-fail system isn't so different from a system that uses conventional redundancy. But behind the curtain, soft-fail systems are considerably more sophisticated and carry a host of additional benefits and cost savings that might not be readily apparent at a casual glance. But we'll get into the intimate details later.

When considering the purchase of high-power amplifier systems, two important metrics are 'Mean-Time-Between-Failures' (MTBF) and 'Mean-Time-To-Repair' (MTTR). They are important because together they determine the 'Availability' of the system - the share of its projected lifespan during which it will actually be usable - or in simpler terms, ROI.

Where MTBF is more or less a reflection of a product's design quality, like how well it's able to extract heat from the transistors under high ambient conditions, MTTR is more of a reflection of how quickly and efficiently a failed component can be removed from the system, fixed and reinstalled. In other words, how long will the system be down if a component fails?

The steps include removal, packing, freight-time back to the factory, Customs-clearance, repair-time, test-time, repacking, freight-time back to the site, Customs-clearance and re-installation. For systems that employ conventional redundancy, the MTTR can be greatly reduced if there is a spare component at the site that can be placed into service while the failed component is off being repaired.
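To see how much the repair path matters, the usual steady-state relationship is availability = MTBF / (MTBF + MTTR). The sketch below applies it to a single component chain with and without an on-site spare. Every number in it (the step durations, the MTBF, the swap time) is an assumption chosen purely for illustration, not data from any manufacturer.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is usable."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Assumed durations, in hours, for one factory repair cycle (illustrative only).
FACTORY_REPAIR_HOURS = {
    "removal and packing": 8,
    "freight to factory": 120,
    "customs clearance (outbound)": 48,
    "repair and test": 240,
    "repacking and freight to site": 128,
    "customs clearance (return)": 48,
    "re-installation": 8,
}

MTBF_HOURS = 50_000                                  # assumed component MTBF
MTTR_NO_SPARE = sum(FACTORY_REPAIR_HOURS.values())   # full round trip to the factory
MTTR_WITH_SPARE = 4                                  # assumed time to fit an on-site spare

for label, mttr in (("no spare", MTTR_NO_SPARE), ("on-site spare", MTTR_WITH_SPARE)):
    print(f"{label}: MTTR = {mttr} h, availability = {availability(MTBF_HOURS, mttr):.5f}")
```

The MTBF is identical in both cases; only the time to restore changes, which is why sparing strategy has such a direct effect on availability.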

In this case, a lower MTTR can come at a significant expense, particularly for high-power systems. You now have two high-dollar components sitting in stasis - the offline backup plus the shelf spare. This could equate to hundreds of thousands of dollars in exchange for peace of mind (or job security). Another option is to employ soft-fail redundancy.

Back in the early 2000s, Maxtech, a manufacturer of solid state satcom power amplifiers, introduced a radical new concept in high power amplifier (HPA) redundancy, along with two new terms - soft-fail and hot-swap. The product was badged 'Modumax' and consisted of a rack mount chassis with eight power modules phase-combined to produce 1kW of Psat power in C-band.

It was eventually expanded into other frequency bands and power levels, but the cool thing about it was that the failure of a single module resulted in a maximum power loss of 1.2dB. And if sufficient power was held in back-off, the output of the remaining seven modules would automatically be increased to compensate for that loss (no mechanical switching required) - so the total RF output of the amplifier would remain constant (soft-fail).
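The 1.2dB figure can be reproduced with a little arithmetic. Assuming an ideal N-way combiner with isolated ports (so a dead module's share ends up in the isolation loads), losing k of N equal modules drops the output by 20·log10((N-k)/N) dB. The same number is also the extra drive each surviving module needs to restore the original output, which is why it has to be held in back-off. The function names below are my own, and the model ignores combiner losses and compression.

```python
import math


def soft_fail_loss_db(n_modules: int, n_failed: int = 1) -> float:
    """Output power drop (in dB, positive) when n_failed of n_modules stop working.

    Assumes an ideal isolated N-way combiner with equal, phase-aligned modules.
    """
    if not 0 <= n_failed < n_modules:
        raise ValueError("n_failed must be between 0 and n_modules - 1")
    return -20.0 * math.log10((n_modules - n_failed) / n_modules)


def can_hold_output(backoff_db: float, n_modules: int, n_failed: int = 1) -> bool:
    """True if the back-off held in reserve covers the soft-fail loss."""
    return backoff_db >= soft_fail_loss_db(n_modules, n_failed)


for n in (8, 16):
    print(f"{n} modules, one failed: {soft_fail_loss_db(n):.2f} dB")

# With 2 dB of back-off in reserve, an eight-module system can ride through one failure.
print(can_hold_output(backoff_db=2.0, n_modules=8))
```

For eight modules the loss works out to just under 1.2dB, and for sixteen modules to just under 0.6dB, so the more modules share the load, the less any single failure costs.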

And the beauty was that the failed module could be removed and replaced while the amplifier was in service (hot swap). As a result, the expense of sparing was reduced to the cost of a single module (and perhaps a power supply module, which was also hot swap capable). In that scenario, MTTR went down from weeks or months to minutes or hours.

I was responsible for sales at the systems integration facility of VertexRSI at the time Modumax was introduced, and market acceptance was tepid at best. The comfort zone of the industry was centered around legacy, conventional redundancy (welcome to satcom). It took a while, but following a few successes, soft-fail became a staple of the industry for high power applications.

Modumax was only available in a rack mount configuration at a time when high power amplifier systems were moving out to the antenna to manage the RF insertion loss associated with long waveguide runs, reduce utility costs and eliminate the need for RF equipment shelters. Regardless, Modumax was a very successful product and remains so to this day.

A decade would pass before competing soft-fail systems would come to market when Paradise Datacom introduced 'PowerMAX' and Advantech launched 'Summit'. In both cases, the RF modules were complete amplifiers that had the capacity to generate significantly higher levels of RF output power and virtually eliminated single points of failure.

When Advantech Wireless brought out the second and third generations of Summit (Summit II in 2019 and Summit III in 2022), great care was taken to exploit the benefits of soft-fail system architecture, including individual amplifier modules, each capable of generating up to 1kW of RF power, and the introduction of CAN-Bus as an operating platform, due to its high processing speed and component-level diagnostics capability.

[Images: 16 x 1kW Summit II; 8 x 250W Summit III]

When CAN-Bus became integral to each amplifier in the system, the need for outboard controllers was eliminated, thus allowing any amplifier in the system to take over as the master in the event of a module failure.
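One simple way to picture 'any amplifier can take over as master' is a rule that every node can evaluate on its own from the health reports it sees on the bus. The sketch below uses 'lowest healthy node ID wins', which is purely an assumption for illustration; I'm not describing how Summit or any other product actually arbitrates, and no real CAN library is used here.

```python
from typing import Dict, Optional


def elect_master(node_health: Dict[int, bool]) -> Optional[int]:
    """Pick the master as the lowest-numbered healthy node (hypothetical rule).

    node_health maps a node ID to True (healthy) or False (failed).
    Returns None if no healthy node remains.
    """
    healthy_ids = [node_id for node_id, ok in node_health.items() if ok]
    return min(healthy_ids) if healthy_ids else None


# Eight amplifiers, all healthy: node 1 acts as master.
health = {node_id: True for node_id in range(1, 9)}
print(elect_master(health))

# The master itself fails: the next healthy node takes over, no outboard controller needed.
health[1] = False
print(elect_master(health))
```

Because every node applies the same rule to the same shared health data, they all reach the same answer without a separate controller becoming a single point of failure.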

In soft-fail, unlike in a conventional redundancy system, all of the amplifiers are in service and sharing the load, with the health of each (down to the device level) being constantly monitored and reported to the master in real time.

Since the total system output power can be distributed over a larger number of smaller amplifiers, the loss from a single amplifier failure is reduced (0.6dB for a 16 amplifier system). The amplifiers are smaller and easier to handle, are less expensive to spare and the return freight costs for service are much lower.

Back to the MTBF and MTTR metrics. In soft-fail systems, switching is not required to facilitate redundancy, which increases the MTBF, and at no point is the signal severed during the backup process.

Based on Telcordia standards, an eight or sixteen module soft-fail system can deliver close to a million hours of availability. Advantech has earlier Summit systems that have delivered 100% availability for over twelve consecutive years.

As is always the case, one size doesn't fit all. Both conventional and soft-fail redundancy platforms have their respective 'perfect fit' scenarios, but it's great to know that a few new options are available for operators around the planet to ponder.



Gary Springer

Retired in the Mountains

1 year ago

I think we were pretty successful at Scientific-Atlanta designing redundant systems. I remember Von Braun's statement "our goal is to make the target more dangerous than the launch area". I remember thinking that our goal was to make the redundant system more reliable than the non-redundant system. That could be difficult when software was involved. I think I only caused a failure once. We had developed a PC-based redundancy controller for klystron amplifiers. As usual we didn't have enough equipment in the lab to completely test the system. We were in Morrison, CO and I explained to the chief engineer that we had tested with simulators and I was confident it would work. He said OK let's try it. I think he simulated a waveguide arc and our controller did its thing. I watched the logs and everything seemed perfect. But the TV monitors went black. And then it seemed in a few seconds every telephone in the facility started ringing. He quickly switched everything back online. It turned out the amplifiers were numbered non-consecutively and the video signals were wired to the video switch that way. After correcting that we were good. Very stressful.

Paul Sandoval

Vice-President of Sales, Americas at Integrasys-SA/Technology Sales/Sales Management/Forging Alliances/US & Foreign Government-Relations/Solutionist

1 year ago

You do not need to convince me Tony, hope you are doing great
