How Long Is Your Tool …… Chain?
Chris Petersen
Do-er of the Difficult, Wizard of Why Not, and Certified IT Curmudgeon
Wishing everyone a happy Valentine’s Day a little early! I hope Cupid’s arrows are the only ones sticking out of your hearts in the coming days!
One of my many … issues? Mantras? Catchphrases? Design constraints? … is that I try to aim for relatively short tool chains, especially when those tools are either mission-critical or key to incident response for mission-critical workloads.
Let’s imagine a very simple scenario. You’ve got a bunch of mission-critical workloads running on-premises in (virtualized) AIX in IPv4 (Internet Protocol version 4) sub-net A, VLAN (Virtual Local-Area Network) A’, and behind LAN switches A’’. You’ve got a trivial log catcher running on-premises in (virtualized) Linux in IPv4 sub-net B, VLAN B’, and behind LAN switches B’’. One can assume (uh-oh!) that there may be core switches between A’’ and B’’ and that they also perform the IP routing function. There might even be no firewalls between A and B (oh, the horrors!).
So, how long is the chain between your log producers and your log consumer? Three physical LAN switches, two virtual LAN switches, one router, two hypervisors (plus a bit), and one small piece of custom code. The message producer and network sender are built into the operating system. The total physical distance between producer and consumer, as the cable runs, is probably under 100 feet (30ish meters), with network latency of around 1 millisecond or less.
You still have to code your application or software component to produce the messages with the appropriate facility and severity codes to get them out of the local box. The syslog (or rsyslog or syslog-ng) service has to be configured to pass them on, and someone has to write the custom code to receive and store those messages. But, that’s about it.
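Just to make that concrete, here’s a minimal sketch of the producer side in Python. The collector hostname, facility choice, and application name are all made up for illustration; a real AIX application might call syslog() directly or lean on the subsystem’s own logging, but the facility/severity idea is the same.

```python
# Minimal sketch: hand a message to syslog with an explicit facility and severity.
# Hostname, facility, and logger name are assumptions, not anyone's real setup.
import logging
import logging.handlers

handler = logging.handlers.SysLogHandler(
    address=("logcatcher.example.com", 514),              # hypothetical log catcher, standard syslog UDP port
    facility=logging.handlers.SysLogHandler.LOG_LOCAL0,   # pick a facility your syslog config actually forwards
)

logger = logging.getLogger("payroll-batch")               # hypothetical application name
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.warning("nightly batch started late")              # logging level maps to syslog severity "warning"
```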
If we’re thinking about old (old old old) school AIX syslog, then it’s using UDP (User Datagram Protocol) as its transport layer, so SSL (Secure Sockets Layer) inspection and all kinds of other man-in-the-middle tools don’t really play a part. There are also no acknowledgements at any protocol layer, so we’ll never know if our messages made it or not. Unfortunately!
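And here’s roughly what that trivial log catcher on the Linux box might look like, again as a hedged sketch with an assumed port and output file. Notice that nothing in it ever acknowledges anything: if this process is down or a datagram gets dropped along the way, the sender never finds out.

```python
# Minimal sketch of a fire-and-forget UDP syslog catcher. Port and output
# file are assumptions; there is no acknowledgement at any layer.
import socket

SYSLOG_PORT = 514  # privileged port, so run with appropriate rights or redirect it

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", SYSLOG_PORT))

with open("/var/log/catcher.log", "a") as out:    # hypothetical destination file
    while True:
        data, peer = sock.recvfrom(8192)          # one syslog datagram, take it or lose it
        out.write(f"{peer[0]} {data.decode(errors='replace')}\n")
        out.flush()
```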
If both ends are using enterprise-grade storage across a SAN (Storage-Area Network), that injects a few more possible fault domains. Even in this very simple scenario, we’re looking at a ton of Murphy’s Law potential.
Let’s add one more feature to the mix. The trivial message catcher looks for a few kinds of messages and sends e-mail if it sees them. (Groan!) Well, that’s back out across the LAN, into the on-premises e-mail relay box(es), out through the firewall, across a section of the global Internet, through a service provider (or two or three or four – darn spam filters!), and then into the e-mail provider itself. Depending on who’s doing what to whom that day, there may be DNS (Domain Name System), SMTP (Simple Mail Transfer Protocol), BGP (Border Gateway Protocol – wide-area routing), WAN (Wide-Area Network), or any number of other issues that could crop up. Our friends at Google are getting quite persnickety about whose e-mail they’ll accept lately!
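For the curious, the “send e-mail if it sees certain messages” feature might look something like the sketch below. The relay host, addresses, and trigger patterns are all hypothetical; the point is that one small SMTP call quietly drags the whole e-mail chain described above into your incident-response path.

```python
# Hedged sketch of the "groan" feature: watch for a few message patterns and e-mail someone.
# Relay host, addresses, and patterns are made up for illustration.
import smtplib
from email.message import EmailMessage

ALERT_PATTERNS = ("DISK FULL", "AUTH FAILURE")      # hypothetical triggers

def maybe_alert(syslog_line: str) -> None:
    if not any(pattern in syslog_line for pattern in ALERT_PATTERNS):
        return
    msg = EmailMessage()
    msg["From"] = "logcatcher@example.com"
    msg["To"] = "oncall@example.com"
    msg["Subject"] = "Log alert"
    msg.set_content(syslog_line)
    # Every hop after this call (relay, firewall, Internet, spam filters, the
    # provider itself) is another link in the chain that can fail.
    with smtplib.SMTP("mailrelay.example.com", 25) as relay:
        relay.send_message(msg)
```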
Holy smokes! That got ugly fast! And, that’s a truly trivial example.
What happens if you’re dealing with a cloud-based logging or observability provider? That may go across another section of the global Internet, in through who-knows-how-many routers, switches, software layers, etc. just to get into their message catcher. There’s no telling what their database and storage might be, how many servers they pool together, and all the rest. Trust them! They’re professionals! They do this for money!
Things change when the provider wants to use their own protocol layer(s), encrypt the traffic, write their own sending or routing code, and so on. Of course, not every ISV (Independent Software Vendor) fully supports IBM’s AIX. They never did, but IBM won’t necessarily tell you that. Maybe you’re syslogging out to that same kind of (virtualized) Linux box and then going who-knows-where via who-knows-what, maybe getting SSL (Secure Sockets Layer - largely replaced by TLS = Transport Layer Security) inspected, and on and on and on.
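To give a feel for what “encrypt the traffic” means at the simplest possible level, here’s a sketch of pushing a syslog-style line over TCP wrapped in TLS instead of bare UDP. The collector name is made up, 6514 is just the conventional syslog-over-TLS port, and real agents add framing, buffering, retries, and certificate management on top of this.

```python
# Sketch only: one syslog-style message over TCP, wrapped in TLS.
# Collector hostname is an assumption; 6514 is the conventional syslog-over-TLS port.
import socket
import ssl

context = ssl.create_default_context()                 # verifies the collector's certificate
with socket.create_connection(("collector.example.com", 6514)) as raw:
    with context.wrap_socket(raw, server_hostname="collector.example.com") as tls:
        # <134> = facility local0 (16) * 8 + severity info (6)
        tls.sendall(b"<134>payroll-batch: nightly batch started late\n")
```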
Does that make the cloud-based observability provider (whether they’re really providing next-generation observability or not) a bad idea? Notwithstanding the foregoing (Ha! Even some legalese for you!), I’d say no. Getting that log (and trace and event and …) data the heck out of your on-premises or cloud environment and into someone else’s is a good thing when it comes to audit time. Auditors love to hear “we can’t change or delete it, and it’s not (just) stored in our servers.” If it’s strongly encrypted at rest (on disk), in transit, etc., so much the better.
No, my point is that a really long tool chain with a ton of links that could bend or break in all kinds of interesting ways may not be enough on its own. Remember that BGP mention above? Once your traffic leaves your local network, you may or may not have any idea whose networks it flows through to get to its final destination. Missouri to Virginia via China? It can happen.
You may lose a ton of graphing, reporting, and friendly querying functionality. You may keep your own people busy with what feels like make-work at times: pruning logs, managing servers, and all that. But, you may have priceless data a whole lot closer and easier to get at when things go wrong. The ultimate in “close” is on the originating server’s disk, but that can be tough to fully secure.
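That make-work is real, though it doesn’t have to be fancy. Here’s a sketch of the pruning piece, with a made-up directory and retention window:

```python
# Sketch of local log pruning: delete anything older than a retention window.
# Directory and retention period are assumptions, not a recommendation.
import os
import time

LOG_DIR = "/var/log/catcher"        # hypothetical local log directory
RETENTION_DAYS = 90

cutoff = time.time() - RETENTION_DAYS * 86400
for name in os.listdir(LOG_DIR):
    path = os.path.join(LOG_DIR, name)
    if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
        os.remove(path)             # gone for good, so make sure your retention policy agrees
```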
What if you’re a victim of the infamous fiber-seeking backhoe? Your log data somewhere else may be inaccessible. The log servers may not get new log data, or your logging data could be competing head-to-head with your business transactions for bandwidth over congested backup links that have less capacity and longer network latency. None of those are really good for using that log data to figure out the problem.
While logging is a decent example, it’s far from the only one. If you’ve got distributed components, you may or may not need industrial-grade message brokers running on a bunch of parallel servers with massive complexity behind the scenes (for the consumers, anyway – your infrastructure folks may want a word at raise and bonus time). Such products, open-source or otherwise, can bring a ton of functionality where they’re needed, but they’re not always needed.
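When the producer and consumer really do live close together, the whole “broker” can be something as boring as an in-process queue. A sketch, purely for illustration:

```python
# Minimal producer/consumer hand-off with no broker at all: one in-process,
# thread-safe queue. Plenty of workloads never need more links than this.
import queue
import threading

work = queue.Queue(maxsize=1000)   # bounded, so a slow consumer pushes back on the producer

def consumer() -> None:
    while True:
        item = work.get()
        if item is None:           # sentinel value: time to shut down
            break
        print(f"processed {item}")
        work.task_done()

t = threading.Thread(target=consumer, daemon=True)
t.start()

for i in range(5):
    work.put(f"message {i}")       # the entire "tool chain" is this one call
work.put(None)
t.join()
```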
Simplify. Shorten tool chains where you can. In some cases, look for parallel paths and parallel services so Murphy’s Law doesn’t bite as hard.
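A tiny sketch of that last idea: the same logger feeding two independent collectors, so one bent link doesn’t leave you blind. Both hostnames are, of course, made up.

```python
# Sketch of a parallel-path sender: every message goes to two independent
# collectors over plain syslog/UDP. Hostnames are assumptions.
import logging
import logging.handlers

logger = logging.getLogger("payroll-batch")
logger.setLevel(logging.INFO)

for host in ("logcatcher-a.example.com", "logcatcher-b.example.com"):
    logger.addHandler(
        logging.handlers.SysLogHandler(
            address=(host, 514),
            facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
        )
    )

logger.error("primary link down, failing over")   # lands on both catchers (if UDP cooperates)
```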
One more war story, I suppose. Many a winter moon ago, I was under contract to a large, worldwide organization that was re-working its software and networking layers. A few years earlier, the simpler, easier, shorter tool (and software) chains on ancient hardware had totally gummed up at around 7-10 customer transactions per second. Their new machines were somewhere in the neighborhood of 25-40 times faster, some had multiple processors, and they were using all new programming techniques. However, there were so many hand-offs and message-passes in their new software stack that those shiny new boxes were rumored to top out at about … wait for it … wait for it … 7-10 customer transactions per second.
It doesn’t matter how macro or micro the scale. Shorter tool chains and fewer hand-offs and message-passes can be a very good thing for your system designs…