Never waste a good crisis - learnings from the Callide event of 25 May 2021
On 25 May 2021, a failure of a generator at the Callide power station triggered a series of events that led to a significant volume of load shedding in QLD and NSW. Following restoration of this load, Queensland then experienced a period of high prices, as well as activation of the Reliability and Emergency Reserve Trader.
These kinds of severe events happen infrequently on the power system. They can have material impacts on customers and bring with them significant safety risks for power system workers; these impacts should always be front of mind. However, once the dust has settled, as an industry it's critical we take stock and understand what can be learned from the event.
A few things I think can be learned from the event include the following.
Firstly, changes made to the system to reinstate primary frequency response (PFR) across the bulk of the generation fleet appear to have helped the power system recover from the initial event. PFR refers to the rapid and automatic response of generators to the changes in system frequency that can occur after a major event. In this event, the presence of PFR appears to have helped stabilise the response of emergency control mechanisms and quickly return the frequency to normal levels, allowing QLD to be rapidly reconnected to the rest of the power system.
Secondly, the event highlighted the importance of protection equipment to prevent and manage major events, and to protect individual generators from further damage. Protection equipment isolates failures in the power system, to stop them spreading further and causing bigger disturbances. Protection equipment issues led to the triggering event, as well as the loss of multiple other units in what is known as a ‘cascading failure’. Some of these protections actually behaved in ways that were not expected.
Finally, large scale inverter based generation was curtailed following the cascading failure. This was done to ensure that remaining levels of ‘system strength’ were sufficient to keep the system stable. This kind of removal of generation capacity from the market can contribute to ongoing supply scarcity following a major disturbance, increasing prices for consumers. It highlights the importance of separating the provision of system strength from provision of energy from synchronous units.
This note picks the eyes out of the AEMO report, which itself runs to 84 pages and contains a lot more information. Serious nerds should look to that paper, which is available here.
A series of unfortunate events
The event on 25 May 2021 really consists of two major components:
1 – the Callide C4 unit ceasing to generate but remaining connected to the system, with the power flow 'reversing' so that the unit ran as a motor (known as operating as an 'asynchronous motor') for a period of approximately 32 minutes
2 – a catastrophic failure at the C4 unit which led to a voltage collapse, which in turn led to multiple other synchronous units tripping off the system.
Let's look at each of these in turn.
Callide C4 unit failure – a bad thing happened at 13:33
Callide Power Station (Callide) is a thermal power plant in central Queensland consisting of two 350 megawatt (MW) generating units at Callide B (B1 and B2) and 466 MW and 420 MW generating units at Callide C (C3 and C4 respectively). Callide B is owned by CS Energy and Callide C by a joint venture of CS Energy and Intergen.[1]
At 1333 hrs on 25 May 2021, a series of technical failures at the Callide C4 unit meant it stopped generating. However, instead of separating from the power system at that point, C4 remained connected. In fact, it began to operate as an 'asynchronous motor'. This is a serious condition for a synchronous generator – in effect, the unit reverses function and stops producing power, instead consuming power from the system, which can damage the machine and destabilise the power system.
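For the protection nerds, the sketch below gives a rough sense of how a reverse-power element might pick up this kind of motoring condition – in essence, watch for sustained power flow into the unit and trip after a deliberate delay. This is a toy illustration only: the thresholds, delay and names are my own assumptions, not Callide C4's actual protection settings (which, as discussed below, had lost their DC supply and couldn't operate anyway).

```python
# Illustrative sketch of a simplified reverse-power check of the kind a generator
# protection relay might apply to detect motoring. All thresholds and delays are
# assumptions for illustration, not the actual Callide C4 settings.
from dataclasses import dataclass


@dataclass
class ReversePowerElement:
    pickup_mw: float = -2.0   # flag when active power falls below this (unit importing power)
    delay_s: float = 5.0      # intentional delay to ride through brief transients
    _timer_s: float = 0.0

    def step(self, active_power_mw: float, dt_s: float) -> bool:
        """Return True (trip) once power has flowed *into* the unit for longer than delay_s."""
        if active_power_mw < self.pickup_mw:
            self._timer_s += dt_s
        else:
            self._timer_s = 0.0
        return self._timer_s >= self.delay_s


# Toy usage: a unit that stops generating and starts drawing ~30 MW from the grid.
element = ReversePowerElement()
for second in range(10):
    power_mw = 400.0 if second < 3 else -30.0   # generating, then motoring
    if element.step(power_mw, dt_s=1.0):
        print(f"Reverse-power element would trip at t={second}s")
        break
```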
What followed in the time period between 1333 and 1406 is complex. Generally speaking, while AEMO, Powerlink and Callide operators were aware that there had been a failure in C4, it was not known that the unit was operating as a motor.
There appear to be several reasons why it was not known that C4 was operating in this way.
Firstly, there had been a loss of DC power within the C4 unit itself, which meant that certain sensors and protection equipment could not function properly. During this period Callide operators, the Powerlink control room and AEMO control room were unaware that the primary protection systems were not operating.[2]
AEMO’s analysis also identifies the following factors that impacted the ability to accurately assess what happened at C4:[3]
In the period between 1333 and 1406, Callide C3 tripped at 1344, pulling 373 MW from the system. Unit B1 was out of service, while unit B2 tripped on undervoltage protection at 1406, pulling 324 MW from the system.
The final event that led to the cascading failure at 1406 appears to be a mechanical failure at Callide C4, which led to a fault most likely in the C4 generator itself. This led to a rapid increase in absorption of reactive power by C4. As I understand it, this final fault is what ultimately led to the loss of multiple transmission lines and other units, as discussed below.
Cascading failure – a whole bunch of other bad things then happened at 1406
The C4 unit operated as a motor for 32 minutes and 59 seconds. At the end of this time, at 1406, a fault at C4 led to a rapid series of other events – known as a cascading failure.
This cascading failure is set out in the figure below, which demonstrates the rapid fall in system frequency as the cascade progressed.
Source: AEMO, Trip of multiple generators and lines in Central Queensland and associated under frequency load shedding on 25 May 2021, October 2021, p.17
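To give a rough feel for why frequency falls so quickly when this much generation disappears at once, here's a back-of-envelope rate-of-change-of-frequency calculation from the swing equation. The numbers are purely illustrative assumptions, not figures from the AEMO report.

```python
# Back-of-envelope rate of change of frequency (RoCoF) immediately after a loss of
# generation, using the swing equation. All numbers below are illustrative assumptions.
f0 = 50.0            # Hz, nominal frequency
system_mva = 10_000  # MVA, assumed online synchronous capacity in the region
inertia_h = 3.5      # s, assumed average inertia constant on that MVA base
lost_mw = 2_000      # MW, assumed generation deficit after the cascade

rocof = -f0 * lost_mw / (2 * inertia_h * system_mva)
print(f"Initial RoCoF ≈ {rocof:.2f} Hz/s")   # ≈ -1.43 Hz/s before PFR and UFLS respond
```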
In more detail, this series of events was as follows:
There’s a lot to get through here. So, let's focus on a few key elements of this cascade.
A remarkable outcome for QNI: As identified in point 9 above, high flows across the QNI interconnector - the large transmission line that connects QLD to NSW - caused the interconnector to trip, separating QLD from the rest of the power system.
This kind of separation can be catastrophic. As regions separate, frequency can fall precipitously - in the SA black system event of 2016 this was what caused the full system collapse. Further, the reconnection (resynchronising) of regions can be very difficult, as their frequencies need to be aligned before ‘the switch is flicked’ and the regions are reconnected.
In this case however, QNI was reconnected some 16 seconds later, re-joining QLD to the rest of the power system. This is a pretty remarkable outcome – I've heard people describe it as like successfully jumping from the door of one speeding train into another, travelling in opposite directions. [Edit - I've since been advised this originally came from Dave Smith of Creative Energy Consulting, and his original analogy referred to speeding cars, not trains!]. As discussed below, it appears that the presence of primary frequency response (PFR) assisted in this reconnection.
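To give a sense of what 'aligning the frequencies before the switch is flicked' involves, here's a toy sketch of the kind of synchronism check applied before reclosing a tie like QNI: both sides need to be close in frequency, voltage angle and voltage magnitude. The thresholds are placeholder assumptions, not the actual QNI reclose settings.

```python
# Illustrative synchronism check before reclosing an interconnector.
# Threshold values are placeholders, not real network settings.
def ok_to_reclose(delta_f_hz: float, delta_angle_deg: float, delta_v_pct: float,
                  max_f: float = 0.1, max_angle: float = 20.0, max_v: float = 5.0) -> bool:
    """Both sides must be close in frequency, voltage angle and voltage magnitude."""
    return (abs(delta_f_hz) <= max_f
            and abs(delta_angle_deg) <= max_angle
            and abs(delta_v_pct) <= max_v)


# Toy usage: the faster the two islands' frequencies are pulled back together
# (which is where PFR and UFLS helped), the sooner these conditions are satisfied.
print(ok_to_reclose(delta_f_hz=0.4, delta_angle_deg=35.0, delta_v_pct=3.0))   # False - still drifting apart
print(ok_to_reclose(delta_f_hz=0.05, delta_angle_deg=12.0, delta_v_pct=2.0))  # True - safe to close
```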
Primary frequency response saves the day: One of the key reasons that QNI could be reconnected so quickly was down to the presence of primary frequency response in both the QLD region and the rest of the NEM power system.
PFR is a rapid and automatic response from generators to a change in system frequency. It serves to help rapidly stabilise the frequency following a major disturbance, such as the loss of a major generator or load. It was made mandatory for the bulk of the NEM generation fleet in early 2020 and has been rapidly rolled out since then.
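To make that a bit more concrete, here's a minimal sketch of a droop-style PFR calculation – how much extra output a generator on proportional droop control would target for a given frequency dip. The 4% droop, the deadband and the plant size are assumptions for illustration, not the actual mandatory PFR settings.

```python
# Minimal droop-style primary frequency response calculation.
# Droop, deadband and plant size are illustrative assumptions only.
def pfr_response_mw(freq_hz: float, rated_mw: float, droop: float = 0.04,
                    deadband_hz: float = 0.015, f_nom: float = 50.0) -> float:
    """Change in output (MW) a generator on proportional droop control would target."""
    error_hz = f_nom - freq_hz
    if abs(error_hz) <= deadband_hz:
        return 0.0
    # Per-unit frequency deviation divided by droop gives per-unit power change.
    return (error_hz / f_nom) / droop * rated_mw


# Toy usage: a 400 MW unit responding to frequency at 49.5 Hz.
print(f"{pfr_response_mw(49.5, 400.0):.0f} MW of extra output requested")   # ≈ 100 MW
```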
The presence of PFR supported the effective restoration of frequency, which in turn assisted in the rapid reconnection of QLD to the rest of the NEM. This was achieved by complementing the function of under frequency load shedding (UFLS).
UFLS is an emergency backstop measure that rapidly sheds blocks of customer load to balance an unexpected loss of generation – this also helps to stop frequency decay. However, as UFLS acts to drop large blocks of load suddenly, it can result in ‘overshoot’, where the system frequency over-corrects and goes from being too low to too high (or vice versa). The presence of PFR helps to smooth and stabilise this over/undershoot, making UFLS function more effectively.
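If you want to see that interaction play out, below is a toy single-bus frequency model: a big loss of generation, UFLS blocks tripping at fixed thresholds, with and without a droop-style PFR response layered on top. Every number is a made-up illustration – this is emphatically not a model of the QLD system or of what happened on the day – but it shows the mechanism: PFR limits how far the frequency over-corrects and reduces how much load ends up being shed.

```python
# Toy single-bus model of UFLS and PFR interacting after a loss of generation.
# Every parameter below is a made-up illustration, not real system data.
F0, H, S_MVA, D = 50.0, 4.0, 8000.0, 1.5       # nominal Hz, inertia (s), online MVA, load damping
LOAD0, DEFICIT = 6000.0, 1000.0                # MW of load, MW of generation lost at t=0
UFLS_STAGES = [(49.0, 800.0), (48.8, 800.0)]   # (threshold Hz, block size MW), latching
PFR_GAIN, PFR_LIMIT, PFR_LAG = 2500.0, 600.0, 1.5   # MW/Hz, MW of headroom, response lag (s)


def simulate(pfr_enabled: bool, t_end: float = 30.0, dt: float = 0.02):
    f, pfr_mw = F0, 0.0
    gen_mw = LOAD0 - DEFICIT                   # generation remaining after the trip
    shed = [False] * len(UFLS_STAGES)
    nadir, peak = F0, F0
    for _ in range(int(t_end / dt)):
        # Latching under-frequency load shedding.
        for i, (threshold, _block) in enumerate(UFLS_STAGES):
            if f <= threshold:
                shed[i] = True
        load_connected = LOAD0 - sum(block for (_, block), tripped in zip(UFLS_STAGES, shed) if tripped)
        # Frequency-dependent load relief (damping).
        load_mw = load_connected * (1 + D * (f - F0) / F0)
        # Droop-style PFR with a headroom limit and a simple first-order lag.
        if pfr_enabled:
            target = max(-PFR_LIMIT, min(PFR_LIMIT, PFR_GAIN * (F0 - f)))
            pfr_mw += (target - pfr_mw) / PFR_LAG * dt
        # Swing equation for a single equivalent machine.
        imbalance_mw = gen_mw + pfr_mw - load_mw
        f += F0 * imbalance_mw / (2 * H * S_MVA) * dt
        nadir, peak = min(nadir, f), max(peak, f)
    return nadir, peak, sum(shed)


for label, enabled in [("without PFR", False), ("with PFR", True)]:
    nadir, peak, stages = simulate(enabled)
    print(f"{label}: nadir {nadir:.2f} Hz, peak {peak:.2f} Hz, UFLS stages tripped: {stages}")
```

With these made-up numbers, the run without PFR ends up tripping both UFLS stages and over-correcting well above 50 Hz, while the run with PFR trips one stage and settles back close to nominal.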
The presence of PFR also appears to have helped reconnect QNI, by helping to minimise the degree of voltage angle separation between QLD and the rest of the NEM (at least this is my understanding of what happened – need some actual engineers to confirm).
Unexpected operation of plant: All network and generating equipment in the NEM includes protection equipment, which operates to disconnect the affected plant from the rest of the power system when something goes wrong – typically a fault, such as a major short circuit. The effective operation of this equipment is critical – as its name suggests, it protects the power system generally, and the affected units specifically, from catastrophic damage.
Throughout this event, there were multiple examples of plant not operating as expected, including failure of protection equipment. For example, protection and monitoring equipment did not pick up the asynchronous motoring behaviour in the C4 unit, which was the starting point of the event.
During the cascade described above, several units also did not behave as expected, including the Townsville and Yarwun generators, both of which reduced output.
There was also a series of very complex protection equipment operations which led to the trip of all the 275kV lines out of the Calvale substation. I got a bit lost reviewing this, but it looks like a failure of a protection system on the feeder to Callide led to much bigger protection equipment triggering and separating these lines. I’m not clear on the extent to which the disconnection of these lines led to some of the subsequent generation trips…but surely it can’t have helped.
Finally, the three Stanwell units, totalling 1,089 MW of generation capacity, were also removed from service during the event due to the unexpected operation of a ‘trip to house load’ (TTHL) scheme. TTHL schemes are one form of system restart ancillary service, used to restart the system following a major blackout, or black system event, such as occurred in South Australia in September 2016. They operate by disconnecting the generator from the power system during the initial disturbance while maintaining supply to its own support equipment. This allows the generator to operate as an island and remain available to reconnect to the system at a later point in time, to support the restart.
During the cascading event, the Stanwell TTHL operated and safely removed the generator from service. The problem is, it appears that AEMO had not been informed that the TTHL would trigger in the conditions that occurred on the day – which meant that over 1000MW of capacity was removed from the system. These settings have since been removed, pending review with Stanwell, who operates the plant.
UFLS – the last line held, but with some challenges: As explained above, UFLS sheds blocks of customer load to help arrest a fall in frequency. It is effectively the last line of defence before a full system collapse occurs – i.e., a black system. It's pretty damn important that it works well.
While the UFLS functioned effectively in QLD, AEMO has identified that the volumes of load lost in each block were not necessarily as expected – in some cases more, and in others less load was shed than expected.
AEMO has advised it will undertake further work to understand the composition of the load blocks and how the UFLS can be expected to function. In addition, AEMO will assess whether the UFLS scheme is likely to remain effective as inertia falls and distributed generation grows in the Queensland region, and whether it would have remained effective had a similar event occurred under different operating conditions.
Won’t somebody think of the solar farms? During the incident, due to the loss of several large generating units in Central Queensland, system strength levels were reduced and solar farms and wind farms in Central and Northern Queensland were automatically constrained to zero by system strength constraints.
This occurred because inverter connected generation like wind and solar requires a certain amount of ‘fault current’ to remain stable. Thermal, synchronous generators like Callide, Gladstone and Stanwell are key providers of fault current, so their tripping meant that multiple solar farms also had to be turned off in order to keep the system stable. This in turn removed further capacity from the market…which can't have helped the supply shortage (and high prices) that followed the event (see below).
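Very roughly, the screening behind these constraints looks like a short circuit ratio check: compare the fault level available at a connection point with the inverter capacity leaning on it. The numbers and the 3.0 threshold below are illustrative assumptions, not AEMO's actual system strength constraint equations.

```python
# Rough short-circuit-ratio style screening for inverter-based generation (IBG).
# The minimum SCR of 3.0 and the fault levels are illustrative assumptions only.
def allowed_inverter_mw(fault_level_mva: float, min_scr: float = 3.0) -> float:
    """Maximum inverter-based capacity that keeps the short circuit ratio above min_scr."""
    return fault_level_mva / min_scr


# Toy usage: losing synchronous units slashes the local fault level, so the headroom
# for nearby solar and wind output shrinks with it.
print(allowed_inverter_mw(fault_level_mva=4500))   # 1500.0 MW of IBG supportable
print(allowed_inverter_mw(fault_level_mva=900))    # 300.0 MW - the rest gets constrained off
```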
While the tripping of these solar farms was necessary to keep the system secure, it highlights the current interdependency between synchronous coal and gas generators and non-synchronous wind and solar. This is a problematic link as the old synchronous units retire or age, and become less reliable. It highlights the urgency of trying to break the nexus between the provision of system strength and stability, and the provision of energy. This can be achieved by installing synchronous condensers, which provide system strength but do not operate as generators, or batteries with grid forming inverters.
More on this in a later paper!
Après moi, le déluge
There’s a bunch of stuff in the AEMO report about load restoration and generator availability following the event, which I find a bit boring so won’t go into it here – you powernerds can read about it on pages 39-44. The net outcome was that following the event, there wasn’t enough generation available to keep the reserve headroom that AEMO needs to maintain reliability.
Following the event, due to this lack of reserve, AEMO activated the Reliability and Emergency Reserve Trader (RERT). RERT is an intervention mechanism under the NER that allows AEMO to contract for emergency reserves, such as generation or demand response, that are not otherwise available in the market. On 25 May 2021, AEMO activated 15 MW of RERT for the period 1700 to 1945 hrs in response to the forecast lack of reserve. It's not clear yet what this will cost customers.
Between 1425 hrs and 1840 hrs, the Queensland energy price was volatile. For 30 dispatch intervals (DIs) over this period the Queensland energy price exceeded $14,700/megawatt hour (MWh). The flow-on effects of the incident also impacted New South Wales, with high prices recorded in the late afternoon.
These high prices reflected the decrease in available generation following the cascading failure. Other factors exacerbated this supply shortfall, including multiple synchronous units being physically unavailable as well as the multiple solar farms mentioned above being curtailed due to system strength limitations.
These high prices are described in the figure below.
Source: AEMO, Trip of multiple generators and lines in Central Queensland and associated under frequency load shedding on 25 May 2021, October 2021, p.60
As our friends at the AEC have previously noted, a rule change was submitted some time ago to place a lower cap on possible market prices following these kinds of events. Although this was rejected by the AEMC, it will be interesting to see if the idea resurfaces.
Never waste a good crisis…
This is actually something that was said by a senior executive at a public meeting following the 25 August 2018 QLD/SA separation event. Although it garnered a few shocked (and probably disingenuous) gasps at the time, the speaker was dead right. These events teach us a lot about the dynamics of the system, what works well, and what needs to be improved.
The clever engineers at AEMO have set out a number of very sensible and very detailed recommendations in their report – I won’t summarise them here because I would rather push my own barrow, however powernerds can read them from page 9 of the report.
For me the key learnings of the incident are as follows:
I’m sure I have gotten a bunch of things wrong here – I bashed this out after a long day wrangling connection reform initiatives, AEMC rule changes and ESB fun. Hit me up if there’s anything grossly wrong, or that you violently disagree with. Otherwise, my fellow powernerds, stay safe and keep motoring. Just not asynchronously.
[1] AEMO, Trip of multiple generators and lines in Central Queensland and associated under frequency load shedding on 25 May 2021, October 2021, p.8.
[2] Ibid., p.19.
[3] Ibid.