
DevOps - P2 - Attention to detail case study

Greg Bala

Published: November 25, 2021

Ho ho ho! I have a fireside story for you today. Grab your eggnog if you must, Scotch if you can, and let me tell you the story of when Christmas almost failed.

In Part 2 of our DevOps adventure, I stressed that non-production environments must be as similar to production as possible, in every way. Well, let me tell you a story where a seemingly innocuous difference had the power to blow up production!

Our story is almost a holiday tale, as it took place when happy carols fill the airwaves, malls are full of pissed-off people soaking up the cheer, and coffee-driven developers are eager to squeeze in changes before the end of the year.

Some technical mumbo-jumbo to set the stage

The story has to do with SQL Server. SQL Server, for the unacquainted, has both Data and Log files. When files grow beyond the capacity of the drives they are on, all hell breaks loose. Log files are particularly sinister, as they may temporarily swell to many times their regular size, only to shrink again. When the drives are unable to accommodate the swelling, the system simply grinds to a halt. So when you run heavy operations that can cause the Log files to grow, you typically keep a close eye on the size of those Log files.
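As a rough illustration of "keeping a close eye", here is a minimal Python sketch that checks free space on every location holding a database file, not just the one you happen to remember. The function name `volumes_low_on_space`, the 20% threshold, and the example paths are all hypothetical; the list of file paths is assumed to come from wherever you track your Data and Log file locations.

```python
import os
import shutil
from typing import Callable, Iterable, List


def volumes_low_on_space(
    file_paths: Iterable[str],
    min_free_fraction: float = 0.2,  # hypothetical threshold, tune to taste
    usage_fn: Callable = shutil.disk_usage,
) -> List[str]:
    """Return each distinct directory whose volume is below the free-space threshold.

    Every file path is checked, so a Log file living on a different
    drive than the Data file is not silently skipped.
    """
    low = []
    seen = set()
    for path in file_paths:
        directory = os.path.dirname(path) or "."
        if directory in seen:
            continue  # already checked this location
        seen.add(directory)
        usage = usage_fn(directory)  # has .total, .used and .free, in bytes
        if usage.free / usage.total < min_free_fraction:
            low.append(directory)
    return low
```

Calling it with the paths of both the Data and the Log file reports whichever location is running out of room. The `usage_fn` parameter exists only so the check can be exercised without real drives.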

Now to Santa's workshop.

It so happened that one of Santa's most eager helpers was doing exactly that - a critical upgrade on Santa's sleigh - Santa is exclusively on Microsoft tech. Anyway, all of the North Pole's procedures were followed diligently; all the proper testing and monitoring and planning and drinking and risking and caroling was done.

There was just one little problem... one little difference between the practice and production sleighs.

You see, Santa's testing sleighs had the Data and Log files on the same drive. Production (unbeknownst to the poor overzealous elf, who was never allowed to touch, let alone sit in, the production sleigh) had them separated on two drives - one drive for Data files, another for Logs.

Oh, but our little elf was so careful in upgrading the production sleigh. Monitored that one drive so well! All was going so well... and if it were not for that one pesky little difference between the production and non-production sleighs, all would have finished well, as well. Too many wells... bad omen.

Alas! The Data files on their drive were doing just fine, but the Log, well, it... it swelled! And burst open, as one would expect. That one little difference caused a whole lot of crap! (1)

(Serious) moral of the story

When I stress that non-prod environments, especially Staging, have to be as close to production as possible, I am not just trying to be difficult or overprotective. Such stories are plentiful - everyone has a few of their own. Even a little, seemingly benign difference can cause problems.

If you want reliable DevOps, do not let unnecessary differences creep into your environment setup. Think twice before you introduce a difference to save cost. If you are forced to live with a difference now, try to fix it later on. Be aware of all the differences you have, and make sure they are well socialized.
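One way to stay aware of the differences is to write each environment's setup down as data and compare the two. The sketch below is only an illustration under that assumption: the function `layout_differences`, the keys `data_drive` and `log_drive`, and the drive letters are all made up for the example.

```python
from typing import Dict, List


def layout_differences(staging: Dict[str, str], production: Dict[str, str]) -> List[str]:
    """List every setting that differs between two environment descriptions."""
    differences = []
    # Walk the union of keys so a setting missing from one side is also reported.
    for key in sorted(set(staging) | set(production)):
        s = staging.get(key, "<missing>")
        p = production.get(key, "<missing>")
        if s != p:
            differences.append(f"{key}: staging={s} production={p}")
    return differences


# Hypothetical descriptions of the two sleighs from the story.
staging_sleigh = {"data_drive": "D:", "log_drive": "D:"}
production_sleigh = {"data_drive": "D:", "log_drive": "L:"}

for line in layout_differences(staging_sleigh, production_sleigh):
    print(line)  # prints "log_drive: staging=D: production=L:"
```

A list like this is easy to share with the whole team, which is most of what "well socialized" needs.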

If most of one's work is done in non-production environments, and that should be the case, the setup and structure of that environment burn into your memory. You move around almost instinctively. Forcing engineers to do a context switch when traveling between prod and pre-prod environments is a recipe for disaster.

Worse, there may be some actions or scripts that will have to be written differently for the different environments. This means that a test performed on staging is not sufficient for production. Technically and practically, you will be running and testing your script for the first time on live production.

Take the time to have all your environments configured as similarly as possible.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(1) OK, OK, for dramatic effect, I embellished the story a bit. Nothing really bad happened in the end. The elves fixed the problem before it was too late, but it was so close!
