Cautionary Reminder on Major Updates to Prod !!!

As professionals working with complex platforms, it is essential to keep these three Key Principles in mind:

* Avoid Overconfidence: No matter your years of experience, never underestimate the intricacies of interconnected systems

* Prioritize Rested Decision-Making: Don’t tackle critical design decisions or complex problem-solving when you're exhausted after a long day

* Upgrade in Stages: Major changes or deployments should always be approached incrementally to identify and resolve issues step by step

You might wonder why I am sharing this cautionary note.

Let me share the backstory of my recent weekend adventure.

My main desktop - a custom-built Linux workhorse - has served me reliably for the past two years. It is equipped with an MSI B550 motherboard and an older NVIDIA GPU with 8 GB of VRAM, and it runs Linux Mint 21 (with a 5.x kernel).

Lately, I have been wanting to explore and test image diffusion models, which require at least 16 GB of VRAM. This meant upgrading my GPU, so I decided to tackle the upgrade this past weekend. The plan included:

* Updating the motherboard BIOS to support the newer GPU

* Swapping the older GPU for a newer 16 GB NVIDIA model

* Upgrading the OS to Linux Mint 22 (with a 6.x kernel)

I was so confident that I could complete everything in under an hour. Easy peasy, right?

With backups completed by Saturday evening, I kicked off the upgrades.

I downloaded the latest BIOS firmware for my MSI B550 motherboard, saved it to a USB drive, and flashed the BIOS. Confident it would work, I skipped the test boot that would have confirmed the system still started correctly.

Next, I swapped out the old GPU for the new one, again without verifying that the system still booted. Overconfidence struck again.

Finally, I booted from a USB drive with Linux Mint 22 (with a 6.x kernel) for a fresh install. That’s when Murphy's Law kicked in and the unexpected began to unfold!

On boot, my display wouldn’t work. After some trial and error, which included disconnecting one of my two monitors, I managed to see the install menu. Excited, I completed the OS installation, but upon reboot, the system failed to start.

Five hours later, despite trying BIOS tweaks, OS re-installs, and even reverting to the old GPU, nothing worked. Exhaustion and frustration set in.

In the early hours of Sunday, I decided to take a break, sleep, and approach the problem fresh in the morning.

A few hours later, as I sipped my morning coffee, a thought struck me: Secure Boot. A BIOS flash typically resets firmware settings to their defaults, and modern motherboards often ship with Secure Boot enabled by default. With Secure Boot on, the firmware and kernel refuse to load boot components and third-party drivers unless they are signed with a trusted key.

I rushed to my basement, booted the system once again, entered the BIOS, and disabled the Secure Boot option. And voilà, the system roared back to life!

I went on to complete the installation, installed the recommended NVIDIA drivers, and my desktop was finally up and running!
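For anyone who wants to confirm the Secure Boot state from a running Linux system instead of from the BIOS screen, here is a minimal sketch. It assumes the machine booted in UEFI mode and that efivarfs is mounted at /sys/firmware/efi/efivars, which is the usual setup on current distributions but not something I verified on every system. The function names are my own and purely illustrative.

```python
from pathlib import Path
from typing import Optional

# The SecureBoot variable lives under the standard EFI global-variable GUID.
# Assumption: efivarfs is mounted at its usual location.
SECURE_BOOT_VAR = Path(
    "/sys/firmware/efi/efivars/"
    "SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c"
)


def parse_secure_boot(raw: bytes) -> Optional[bool]:
    """Interpret the raw efivarfs content of the SecureBoot variable.

    efivarfs prefixes each value with a 4-byte attribute field, so the
    one-byte state (1 = enabled, 0 = disabled) is the fifth byte.
    """
    if len(raw) < 5:
        return None
    return raw[4] == 1


def secure_boot_enabled(path: Path = SECURE_BOOT_VAR) -> Optional[bool]:
    """Return True/False for the Secure Boot state, or None if unknown."""
    try:
        return parse_secure_boot(path.read_bytes())
    except OSError:
        # Legacy (non-UEFI) boot, or the variable is missing or unreadable.
        return None


if __name__ == "__main__":
    labels = {True: "enabled", False: "disabled", None: "unknown"}
    print(f"Secure Boot: {labels[secure_boot_enabled()]}")
```

If the mokutil package is installed, running `mokutil --sb-state` reports the same information. Checking this right after the BIOS flash would likely have saved me most of those five hours.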

Reflecting on this experience, I realized the importance of the three key principles mentioned earlier:

* Overconfidence: Skipping validation steps led to hours of unnecessary troubleshooting

* Exhaustion: Tired decision-making compounded the problem

* Sequencing Upgrades: Tackling one change at a time would have pinpointed issues earlier, saving time and effort
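The "one change at a time" idea can be sketched as a small runner that applies a single stage, verifies the system, and refuses to continue if verification fails. This is only an illustration of the principle, not a tool I actually used: the `Stage` class, the `run_staged` function, and the stage names in the demo are all hypothetical, and the verification steps are stubs standing in for a real check such as "the machine boots and shows a display".

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    apply: Callable[[], None]   # makes exactly one change
    verify: Callable[[], bool]  # confirms the system is still healthy


def run_staged(stages: List[Stage]) -> List[str]:
    """Apply stages in order and stop at the first failed verification.

    Because every stage is verified before the next one starts, a failure
    points at the change that was just made, not at a pile of changes.
    """
    completed: List[str] = []
    for stage in stages:
        stage.apply()
        if not stage.verify():
            raise RuntimeError(
                f"verification failed after stage '{stage.name}'; "
                f"stages completed before it: {completed}"
            )
        completed.append(stage.name)
    return completed


if __name__ == "__main__":
    # Hypothetical stages mirroring the weekend upgrade; the lambdas are
    # stubs and would be replaced by real actions and real boot checks.
    plan = [
        Stage("flash BIOS", lambda: print("flashing BIOS"), lambda: True),
        Stage("swap GPU", lambda: print("swapping GPU"), lambda: True),
        Stage("install OS", lambda: print("installing OS"), lambda: True),
    ]
    print(run_staged(plan))
```

Had I verified a clean boot right after the BIOS flash, the problem would have surfaced while only one thing had changed.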

I hope sharing my weekend experience serves as a reminder that even seasoned developers and engineers need to approach major updates to prod with humility, caution, and a methodical mindset.

P.S. The article image was generated using the 'flux-1-schnell' diffusion model !
