The Millennium Bug: A colossal waste of money, or the biggest engineering feat in history?


Tick tock, Tick tock....

The year is 1999


As most people were counting down to the biggest party on earth, many people working in tech were trying to avert one of the biggest crises in history: 'The Millennium Bug'.

I was one of them.

I was 23 years old

This is my story.


To tell the story properly, I have to go back to one year earlier

I joined Thames Water, the largest water utility in the UK, in 1997.

To give some perspective, Thames Water has 16 million customers and delivers 2.6 billion litres of drinking water every day.

I worked in the customer contact centre, and my job was to respond to written customer complaints - a job I did with ease (and boredom), so I wanted to seek other opportunities.

So, I started to volunteer for things, not really sure what I was volunteering for, but hoping it might lead somewhere better.

It might have been the best decision I ever made - I just didn't know it yet!


Tick tock, Tick tock....

Spring 1998.

The governance and compliance team needed help compiling the 'July Return'.

What's that, I hear you ask?

Each water utility is given a licence to operate by Ofwat, which regulates all water utilities in England and Wales. As a result, each regulated entity is expected to meet certain performance measures covering water quality, pressure, and planned and unplanned interruptions. These are known as Director General (DG) levels, 1 to 9.

The July Return is therefore the detailed annual submission to Ofwat on each of those measures. Failure to meet expected performance standards might be met with restrictions, fines, and even a loss of licence.

My role was to help collate and consolidate multiple data sources for the compilation of the report, so that it was appropriately evidenced.

Not only did this give me experience in crunching through large data sets, but it also gave me a deep-seated knowledge of each of the expected performance standards, and the repercussions for not meeting them.

Once the July Return had been compiled, I decided that governance and compliance wasn't for me, but I really liked dealing with data, so I joined the programme management office.


Tick tock, Tick tock...

Autumn 1998

My role in the programme management office was to collate and consolidate updates on all in-flight technology projects and programmes, to understand their current status, whether they were on track, why key milestones and performance measures might be slipping, and how to get them back on track.

I already knew how to capture and interrogate the required data. I got used to asking the right questions to elicit the right information, and reporting in a way that provided accurate and meaningful updates to management.

There was one mega-programme to trump them all - 'The Millennium Bug' - otherwise known as 'Year 2000', shortened to 'Y2K'.

Unlike many other projects and programmes, where deadlines and milestones often slipped, this was one of those rare moments where the date was set in stone and could not move.

If we missed it, we risked major technical failures and a huge operational crisis.

That date was 1st January 2000 - the first day of the new millennium.


Tick tock, Tick tock...

January 1999

The programme updates on 'Y2K' were getting bleaker.

Major pieces of work and milestones were in the red.

Other non-essential work and projects were stopped, and resources were diverted.

But it wasn't enough; the programme needed more people - whether from inside the company, outsourced service providers, or expensive contractors (and boy, were they expensive!).

And so, I volunteered to join the Y2K programme full-time.

My career was about to kick off in a big way...


Before we go any further, let's take a minute to talk about what the millennium bug is, and why exactly it had multiple governments and organisations around the world in utter turmoil.

What is the millennium bug?

The problem at face value wasn't that difficult.

All computers (mainframes, servers, desktops, etc.) have inbuilt system clocks and dates, used to keep them running and to time-stamp logs accurately. Dates were typically stored in a 6-digit format - DD/MM/YY (or MM/DD/YY in America).

So, 1st November 1999 would be 01/11/99.

Now, what hadn't been fully considered was that the years 1900 and 2000 would both be represented as 00 in system clocks, so the first day of the new millennium would be represented as 01/01/00. Each subsequent year would also be duplicated.

So you see the problem?

The theory was that many computers would revert to default settings, fail, or crash completely. Depending on what each system was supporting, this could cause multiple catastrophic errors, creating a cascading failure and ripple effect throughout the world if air traffic control, critical infrastructure, emergency services, and hospitals were simultaneously impacted.

Calling it a bug was perhaps not accurate, as that implies it was unique to a single operating system, when in fact it affected many.

Perhaps it's better to call it a system-design flaw.
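
To make the ambiguity concrete, here is a minimal sketch in Python (purely illustrative - not any system we actually ran) of how simple arithmetic and sorting go wrong once a two-digit year wraps around to 00:

```python
# A minimal, hypothetical sketch of how two-digit years break - not real legacy code.
# Assume a system that stores only the last two digits of the year.

def years_between(start_yy: int, end_yy: int) -> int:
    """Naive elapsed-years calculation using two-digit years."""
    return end_yy - start_yy

# A meter installed in 1985, read in 1999: fine.
print(years_between(85, 99))   # 14

# The same meter read on 01/01/2000, where the year is now stored as 00:
print(years_between(85, 0))    # -85 (a negative age - nonsense for any downstream logic)

# Sorting records by a two-digit year puts the year 2000 *before* 1985:
records = [("installed", 85), ("inspected", 99), ("read", 0)]   # 0 means 2000
print(sorted(records, key=lambda r: r[1]))
# [('read', 0), ('installed', 85), ('inspected', 99)] - 2000 sorts first, not last
```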


What was the fix?

The main change was to move from a 6-digit date format to an 8-digit format.

So, 1st November 1999 became 01/11/1999.

And thus, 1st January 2000 became 01/01/2000.

Thereby removing the duplication of the year.
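
As a minimal sketch of the remediated format (again purely illustrative, using the same Python notation as before), once the full four-digit year is stored and parsed, ordering and date arithmetic behave correctly across the boundary:

```python
from datetime import datetime

def parse_date(text: str) -> datetime:
    """Parse an 8-digit DD/MM/YYYY date - the expanded format described above."""
    return datetime.strptime(text, "%d/%m/%Y")

d1 = parse_date("01/11/1999")
d2 = parse_date("01/01/2000")   # unambiguous, unlike "01/01/00"

print(d2 > d1)                  # True - the new millennium correctly sorts after 1999
print((d2 - d1).days)           # 61   - date arithmetic works across the boundary
```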

Sounds simple right?

Wrong!


Though the fix sounds like a simple enough change, imagine having to rearchitect and recode multiple computers and systems across multiple companies. Then extrapolate that around the globe.

Some of these were old legacy systems, some had dates hard-coded, and to make these major changes effectively you had to take the systems offline and do a huge amount of testing and verification before you could even think about doing it in a live environment.

And what about all the system interfaces? How did you update and test those together? What if they were outsourced, or delivered by third parties outside your control?

What if you, or they, were running behind?

So, now you see the problem!


The media hype

This wasn't just the mild hysteria of a few organisations; it took hold globally.

Multiple government agencies warned of a 'ticking time bomb' - media outlets delighted in fanning the flames, convincing the general public that the apocalypse was nigh and planes might start dropping out of the sky as computers failed one by one, causing a technology blackout.

And so regulators demanded to know what we were doing to manage and control the risk.

After all, you're one of the biggest water utilities in the UK.

You serve millions of people, including London.

A failure to respond could lead not only to loss of water, but to loss of life.

Melodramatic?

Absolutely not.

It was an agonisingly real risk, and we had to step up!



So, now that we're all caught up on what the millennium bug was all about, and why it was such a big deal, let's get back to the story...


Tick tock, Tick tock...

March 1999

The role I was initially assigned on the Y2K programme was not the one I would end up doing.

I was assigned to document all of the technical system checks that had to be done on the stroke of midnight, to verify that everything was up and running and working as expected.

I had this constant nagging feeling:

What if the systems don't work?
What if we can't supply water to people?
What if any of those people have special needs?
How would they tell us?


These nagging questions came as a result of working with the governance and compliance team the year before, helping to compile the data on expected performance standards. I knew these weren't just a range of KPIs; some of them were absolutely critical.

Without going into detail, let me just list the key measures:

  • DG1 - Water supply interruptions
  • DG2 - Water pressure
  • DG3 - Supply restrictions
  • DG4 - Unplanned interruptions
  • DG5 - Leakage
  • DG6 - Customer complaints
  • DG7 - Billing and metering
  • DG8 - Sewer Flooding
  • DG9 - Environmental Flooding


The reason DG6 and DG7 stand out is that these fell directly under the responsibility of the customer contact centre, where I worked.

This mainly consisted of three core areas:

  • Customer call centre, which was split in two - one an emergency line for people to tell us they had water supply issues, the other for routine enquiries about bills or payment plans.
  • Account management team, that dealt with residential and commercial customer complaints
  • Debt management team, that dealt with all customers in arrears with their bills. I'm not sure about other countries, but in the UK you're not legally allowed to turn the water off to residential properties, only commercial properties, so the debt issue for residential customers was a big problem!


Of all of those, the emergency call centre became Priority No.1


A new mission

And so my mission became 'protect the call centre'

I got to work immediately to understand what systems and tools they utilised

What were the gaps that needed to be filled? What were the key risks?

Can we go completely manual, and for how long?

Where do we even get the information from, and how do we get accurate information to the engineers?


So many questions, so many considerations, so many plans....

I had no idea that what I was actually doing was business continuity - I didn't even know it was a career choice, until someone pointed it out to me.

I was smitten, and I knew immediately this was the role for me.

I had found my calling.

I relished it wholeheartedly.


Tick tock, Tick tock...

October 1999

Over the course of six months, I worked directly with the contact centre to put multiple contingency plans in place:

  • Split incoming telephony lines between two service providers - so if one provider was down, we could divert calls through the other, ensuring that the lines ran through different exchanges.
  • Changed the generator coverage and got priority diesel allocation - in the event of power shortages, the generator was to cover the server rooms, network, telephony and call centre on full load, and should there be fuel shortages, we had critical infrastructure prioritisation to refill the diesel tanks
  • Installed soft turrets across other areas of the building - those familiar with call centres of old will probably know that each workstation had to be networked into the call centre to receive and log calls to agents. We increased agent capacity, but we also had multiple volunteers on standby in case we had to go to pen and paper, and the soft turrets meant that we could use other desks and telephones throughout the building.
  • Special needs flags on all accounts - in the event of multiple people queueing, anyone with a special needs flag (e.g. kidney dialysis, elderly or infirm) was prioritised and pushed to the top (see the sketch after this list). In addition, manual lists of special needs customers were printed and held at each depot, in case of water supply issues in the area, so they again could be prioritised.
  • Training of multiple customer service volunteers - in the event that front line call centre agents were overloaded, we had multiple people trained to take calls, to prioritise and triage each one, and to co-ordinate with our 'Major Event Centre' that allocated engineers.
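
As a purely illustrative sketch of that special-needs prioritisation rule (the field names here are hypothetical - this is not the real call-centre system), flagged customers are moved to the front of the queue while everyone else keeps their arrival order:

```python
from dataclasses import dataclass

@dataclass
class Caller:
    account_id: str
    special_needs: bool   # hypothetical flag, e.g. kidney dialysis, elderly or infirm

def triage(queue: list[Caller]) -> list[Caller]:
    """Return the call queue with special-needs customers moved to the front.
    sorted() is stable, so everyone else keeps their original arrival order."""
    return sorted(queue, key=lambda c: not c.special_needs)   # False sorts before True

queue = [Caller("A100", False), Caller("A200", True), Caller("A300", False)]
print([c.account_id for c in triage(queue)])   # ['A200', 'A100', 'A300']
```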


We tested, and tested again, until we could do no more.


Tick tock, Tick tock...

31st December 1999

The world was on standby, now all we could do was watch and wait.

For it wasn't just one big bang - the 'millennium' happened multiple times over, as it rolled through all 24 time zones.

The UK was slap bang in the middle of it all, thanks to a little thing called 'Greenwich Mean Time' (GMT) - the rest of the world sits within roughly +/- 12 hours of GMT, so half the world crossed over before us and half after.

Each countdown brought us one step closer, as the millennium moved across every continent in turn.

Waiting with bated breath for stories of the lights going out, of system crashes, of failures.

But the media was only reporting and showing the joyous revellers, and fireworks, as each country celebrated the dawn of a new era.

Then it was the UK's turn, counting down the thumping bongs of Big Ben.

It was as if time had slowed right down, our hearts thumping loudly in exhilaration and trepidation.

For Thames Water, and many other organisations, everything went without a hitch.

No unexpected system failures, all power, networks, and telephony were up, and operational.

All system verification checks passed.

Volunteers were stood down.

Celebrations began.


January 2000

As the dust settled, and the world evaluated the outcomes of Y2K, it was in some respects an anticlimax compared to what had been anticipated and reported.

It is not accurate to say that 'nothing happened', as there were indeed stories of small system failures, mainly due to a lack of preventative action, but most were quickly remediated. Isolated reports highlighted cash machines that weren't working, and some hospitals and medical facilities had patient dates of birth wrong, miscalculating pregnancy dates or diagnoses as a result.

It’s estimated that the cost of the global effort to prevent Y2K exceeded £300bn (£633bn today, accounting for inflation). For Thames Water, the investment ran to tens of millions of pounds.

The world started to split into two camps, with differing viewpoints:

1. Was Y2K overhyped and a colossal waste of money?
2. Was Y2K one of the biggest engineering feats in history?


Like many other companies around the world, we fell into Camp 2.

When faced with the risk of doing nothing, ask yourself: what are you prepared to do?


Media Backlash

The media, who had been predicting Armageddon, now changed tactics, even going so far as to insinuate it was a big scam, cooked up by a few tech and professional services companies as a way to make mega money from all the hapless companies who didn't know any better.

This was quite frankly ridiculous, and dare I say a massive punch in the face to all those companies that were not prepared to take the risk, and who did what was needed to protect people and services.

The reality is that we might never know.

Though I was not in charge of the decision-making, having seen the work first-hand and felt the deep-seated dread that was looming, I believe wholeheartedly that it was the right decision.

Those sitting on the sidelines, and those with the benefit of hindsight might well have felt differently.


Tick tock, Tick tock...

March 2000

Sure, the millennium was over, but we weren't finished yet!

We had built great momentum, and there was still much work to be done beyond the preparations for Y2K.

And so I was seconded to the Head Office, where I would repeat the process for the engineering and laboratories divisions.

We built and tested plans.

I was promoted.


Tick tock, Tick tock....

April 2001

A new career in business continuity was flourishing.

But I was continuously drawn to the world of tech, and so in April 2001, I joined AXA.

Following a series of mergers and acquisitions during the late 1990s, AXA was undertaking a huge array of server consolidation and IT transformation programmes to bring each of the operating companies together.

I pivoted from business continuity into disaster recovery.

Little did I know, that another crisis was looming.

This time there was no pre-planning, and no notification.

The world was about to change in a way that we could never have comprehended.


That date was 11th September 2001, otherwise known as 9/11

The deadliest terrorist attack in history...

I was 25 years old.


Tick tock, Tick tock......



Over the course of the next 20+ years, my career continued to grow, as I transitioned from business continuity to disaster recovery, to crisis management, to cybersecurity, under the overarching banner of resilience.

I have continued to be on the front line of many major incidents, and global events.

I started to become disillusioned.

Why are we not learning lessons?

How bad does it have to get before we make a change?


Then I started to change my perspective.

Perhaps I'm exactly where I need to be.

Maybe I'm the one that needs to instigate change.

So I put pen to paper.

And I published my first book in 2022...






-



Guy Giraudeau

Project Manager (retired) - now doing other projects!


By 1999, I'd moved on from my previous employer, SCS, but heard that one of my former colleagues was working overtime to amend our whole suite of applications to replace two-digit years with four-digit years. I think he did very well out of that millennium scare, Sarah!

Olivier Subramanian

Experienced Cloud Strategist and Business Advisor with a focus on Public Cloud Investments


Funnily enough this was a topic of conversation on Friday after realising that we had people in the team who had not been born then! My memory of that time working in Financial Services was all the contractors raking in the cash in the run up to the New Year and everything still working afterwards.

Shaun Van Niekerk, CISSP

Director and CISO Steeple Cyber | Healthcare Technology & Cyber Specialist | Mentor | UK Cyber Security Council Chartered Status Member and Assessor


Love this Sarah Armstrong-Smith! My first promotion in the NHS was as a Y2K co-ordinator, and my colleagues and I worked our a$$es off to ensure healthcare systems were compliant. It was a tedious project and I wanted to quit a few times, but I'm glad I stuck it out, because the skills it taught me remain priceless: architecture, co-ordination, planning, quality control, design, structure, systematic thinking, risk management, process modeling, effective communication, human resilience, etc. It shaped me on so many levels, so I for one am grateful for the privilege of working on that project.

Jamie Elliss

Top Cyber Security Voice | Helping senior leaders build Microsoft + wider vendor capability for Security - Cyber | Cloud | Data | AI | GRC | Apps | Technical | Sellers | Executive | Hidden2Hired Job Seeker Mentor


I love this Sarah Armstrong-Smith - my first job out of Uni (insert cough) in 1995 was placing IBM MVS Cobol CICS DB2 capability for what became Y2K into Boots, Experian, RS Components and a list of panicking enterprise organisations. The spend was insane; I always refer to it as my first recession, when nobody had any budget left by 2001-2002 for anything. It was definitely a serious feat of engineering though. Contractors were sooooo expensive - retired Cobol programmers could come back and top up their pensions for a year or two, having not even worked in the last 3-5 years. I can't think of a single engineering feat in tech that matches it, but if you listen to Gartner it cost $300-600 billion back then! (Gartner have got better at data accuracy now though.) Selfishly, it also allowed me to set up on my own in 2003, as the market was so poor the risks reduced. That bit I'm pretty grateful for, so thanks for posting this - it brought back a load of funny memories I'd forgotten.

Y2K, yes I remember it very well (as if it was yesterday). I was a Director at a large USA software company then, and the IT teams spent months preparing for that day - thankfully all went well and no problem occurred. Some say all the preparations were a total waste of time and money, but I say "better to be prepared for any problem than let a problem occur". It's never a waste of time or money to be prepared!!
