Water everywhere, don't forget to drink
Imagine a person in a desert, tired, hungry and very thirsty. Now imagine that person is so focused on walking towards their destination that they ignore water that is readily at hand. It would be hard to justify this as reasonable behavior under the circumstances, and yet we frequently observe similar behavior in the IT arena, where teams are so busy doing “things” that they don’t take the time to leverage the IT equivalent of water: “data”. This is the data-based intelligence that can keep their IT environments in optimal condition and ready to respond to problems that may arise with the Confidentiality, Integrity or Availability (CIA) of their systems. There are a variety of ways to look at this situation, and numerous specific examples could be put forth. I will focus here on one relatively straightforward example: event logging.
Event logging is a foundational component of virtually all IT systems. It is commonplace for IT components to log system events by default, or for logging to be activated with a simple configuration change. In many cases it is even trivial to configure event messages to be sent to a centralized collector. Numerous commercial and open-source solutions are available to serve as such centralized event collectors and intelligence platforms (e.g., IBM QRadar, the Elastic.co stack, LogDNA). Setting up the collection of logs is only the first link in the value chain. The availability of historical log data satisfies a compliance requirement that many organizations have, and it also plays a crucial role in investigation and troubleshooting.
The next step in the value chain for system event logging is post-event log processing. This step applies analysis, alerting and reporting to the wealth of log data being collected. It can be done with a range of approaches, from simple scripting to Machine Learning (ML). Event collector and intelligence platforms, such as those mentioned above, already have built-in capabilities that generally satisfy the needs of most organizations. Of course, additional correlation and analysis can be brought to bear on the available log event data. The alerting and reporting features range from immediate alerts in email, Slack, or your own favorite communication platform, to reports and dashboards available within the User Interface (UI) of the various solutions. You will likely find, as we have, that different support individuals have their own preferences for how they wish to receive such intelligence on their respective systems and services. For example, Person “A” might like PDF reports embedded in email, while Person “B” simply goes to their customized dashboard one or more times per day.
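To make the "simple scripting" end of that range concrete, here is a minimal sketch in Python. It assumes a made-up log line format (timestamp, host, severity, message) and made-up sample data; the function names and the alert threshold are illustrative only and are not taken from any of the platforms mentioned above.

```python
from collections import Counter

# Hypothetical log format: "<ISO timestamp> <host> <SEVERITY> <message>"
# The sample lines below are invented for illustration.
SAMPLE_LOGS = [
    "2020-12-01T10:00:01 web01 INFO request served",
    "2020-12-01T10:00:05 db01 ERROR memory parity error on DIMM 3",
    "2020-12-01T10:01:12 db01 ERROR memory parity error on DIMM 3",
    "2020-12-01T10:02:40 web01 WARNING slow response",
    "2020-12-01T10:03:02 db01 ERROR memory parity error on DIMM 3",
]


def parse_line(line):
    """Split a line into (timestamp, host, severity, message).

    Returns None when the line does not have all four fields.
    """
    parts = line.split(" ", 3)
    if len(parts) != 4:
        return None
    return tuple(parts)


def summarize(lines, severity="ERROR"):
    """Count events of the given severity per host, skipping malformed lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        _, host, sev, _ = parsed
        if sev == severity:
            counts[host] += 1
    return counts


def hosts_to_alert(counts, threshold=3):
    """Return the hosts whose event count reaches the (assumed) threshold."""
    return sorted(host for host, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    error_counts = summarize(SAMPLE_LOGS)
    print("Errors per host:", dict(error_counts))
    print("Hosts needing attention:", hosts_to_alert(error_counts))
```

A real collector would do far more (time windows, correlation, deduplication), but even a summary this small turns raw lines into something a person can act on.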
Just as a lack of water can cause health problems, too much water can also be a problem. Nothing to excess, except moderation. Likewise, in the IT arena, a deluge of logging data can be a significant problem. IT teams are often inundated with so much data and so many communications that it can be difficult to focus on what is really relevant at any point in time. This is why it is so important to have logging, analysis and reporting platforms and services that can consume the data, filter it properly and route relevant information to the appropriate recipients in a useful timeframe.
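The filter-and-route idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the severity ranking, the service names, and the destination strings (an email address and Slack channel names) are hypothetical, and the function only decides where an event should go; it does not actually send anything.

```python
# Assumed severity ordering, lowest to highest.
SEVERITY_RANK = {"DEBUG": 0, "INFO": 1, "WARNING": 2, "ERROR": 3, "CRITICAL": 4}

# Hypothetical routing table: (service, minimum severity, destination).
# A service of "*" matches events from any service.
ROUTING_RULES = [
    ("database", "ERROR", "email:dba-team@example.com"),
    ("firewall", "WARNING", "slack:#security-alerts"),
    ("*", "CRITICAL", "slack:#on-call"),
]


def route_event(event, rules=ROUTING_RULES):
    """Return the destinations for an event.

    An empty list means the event was filtered out as not relevant
    to any recipient. Unknown severities are treated as the lowest rank.
    """
    rank = SEVERITY_RANK.get(event["severity"], 0)
    destinations = []
    for service, min_severity, destination in rules:
        if service not in ("*", event["service"]):
            continue
        if rank >= SEVERITY_RANK[min_severity]:
            destinations.append(destination)
    return destinations


if __name__ == "__main__":
    events = [
        {"service": "database", "severity": "INFO", "message": "checkpoint complete"},
        {"service": "database", "severity": "CRITICAL", "message": "disk full"},
        {"service": "firewall", "severity": "WARNING", "message": "port scan detected"},
    ]
    for event in events:
        print(event["message"], "->", route_event(event))
```

Keeping the rules in a table rather than in code is a deliberate choice in this sketch: it lets each team adjust what reaches them without touching the routing logic.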
So let us now say that your organization has reached the point where key systems and services are logging their event data, and the requisite analysis, alerting and reporting is in place. The “water” (data) is available, and it is even “filtered”, “tasty” information that is ready to satisfy the IT “thirst”. However, if the people responsible for the organization's systems and services are in essence ignoring this intelligence, this required “sustenance” for IT, they are risking the overall health of the IT environment. This is true not just from the standpoint of security and compliance, but also from a feature/function standpoint, including the availability and performance of IT systems and services. System events can often provide early warning of degradation, so that proactive efforts in response to alerts and reports can address problems before they become more serious issues for stakeholders. For example, a recent summarized report made it easy to see that a device was experiencing hardware memory issues, and the responsible team was able to address this in a planned manner before a serious unplanned outage occurred. An example from a security perspective could be the awareness of targeted attacks from a particular Internet system or group of systems. Based on this awareness, the support teams can implement tactical changes to restrict such traffic and further mitigate the potential for a compromise.
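As a rough illustration of the security example, the following Python sketch groups denied connection attempts both by individual source address and by network, to surface "a particular Internet system or group of systems". The addresses are invented (taken from ranges reserved for documentation), and the /24 grouping and the threshold of three are assumptions chosen only to keep the example small.

```python
import ipaddress
from collections import Counter

# Made-up source addresses of denied connections, e.g. as extracted from
# firewall logs. These come from address ranges reserved for documentation.
DENIED_SOURCES = [
    "203.0.113.7", "203.0.113.7", "203.0.113.7",
    "203.0.113.42", "203.0.113.99", "198.51.100.5",
]


def top_offenders(addresses, prefix_len=24, threshold=3):
    """Return (hosts, networks) whose denied-connection count meets the threshold.

    Hosts are counted per exact address; networks are counted per enclosing
    network of the given prefix length (an assumed grouping, not a standard).
    """
    by_host = Counter(addresses)
    by_network = Counter(
        str(ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False))
        for addr in addresses
    )
    hosts = sorted(a for a, n in by_host.items() if n >= threshold)
    networks = sorted(net for net, n in by_network.items() if n >= threshold)
    return hosts, networks


if __name__ == "__main__":
    hosts, networks = top_offenders(DENIED_SOURCES)
    print("Individual sources to review:", hosts)
    print("Networks to review:", networks)
```

The output is a short review list rather than an automatic block; deciding whether to restrict the traffic remains a judgment call for the support team.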
The moral of the story, if you will, is not to be so focused on, or busy with, one aspect of IT that you ignore the oasis of information that will keep your environment in good shape, better enabling you to meet all of your IT and business objectives.