What to log

Marcel Koert

Innovative Platform Engineer | DevOps Engineer | Site Reliability Engineer | IT Educator | Founder of Melomar-IT

Published: October 23, 2020

Quick overview

Every application you write should have good logging. But what is good logging? Let’s start with a few no-brainers. Your application's uptime is always more important than your logging. If the data you want to write is too important to lose, it should not be in a log but in a database of some sort. Logs hold only transient data.

We can define three types of logging:

Application logs: All application failures, starts, and stops, with their reasons. (See log levels.)
Audit logs: All functional actions done on behalf of consumers, recording who, when, and what.
Access logs: HTTP-based logs recording which IP address connected to which endpoint, at what time, and which status code was returned.

Log types

Audit

Why audit logging? Inevitably, someone asks why event data should be logged on a given system. Essentially, there are four categories of reasons:

Accountability – Log data can identify what accounts are associated with certain events. This information then can be used to highlight where training and/or disciplinary actions are needed.
Reconstruction – Log data can be reviewed chronologically to determine what was happening both before and during an event. For this to work, the accuracy and coordination of system clocks are critical. To trace activity accurately, clocks need to be regularly synchronized to a central source so that the date/time stamps are in sync.
Intrusion Detection – Unusual or unauthorized events can be detected through the review of log data, assuming that the correct data is being logged and reviewed. The definition of what constitutes unusual activity varies, but can include failed login attempts, login attempts outside of designated schedules, locked accounts, port sweeps, network activity levels, memory utilization, key file/data access, etc.
Problem Detection – In the same way that log data can be used to identify security events, it can be used to identify problems that need to be addressed. For example, investigating the causes of failed jobs, resource utilization, trends and so on.

What to Log? Essentially, for each system monitored and likely event condition there must be enough data logged for determinations to be made. At a minimum, you need to be able to answer the standard who, what and when questions.

Was an authentication request a success or a failure?
Was an authorization request a success or a failure?

Both questions can be asked about an individual or, more generally, about how many requests failed.
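As a rough illustration of the who, what and when questions, here is a minimal Python sketch of an audit record. The logger name `audit`, the field names and the helper functions are assumptions made for this example, not a prescribed format.

```python
import json
import logging
from datetime import datetime, timezone
from typing import List

audit_logger = logging.getLogger("audit")  # hypothetical logger name


def audit_event(who: str, what: str, success: bool) -> str:
    """Build one audit record answering who, what and when, then log it."""
    record = {
        "who": who,
        "what": what,
        # UTC timestamps keep records from different systems comparable.
        "when": datetime.now(timezone.utc).isoformat(),
        "outcome": "success" if success else "failure",
    }
    line = json.dumps(record, sort_keys=True)
    audit_logger.info(line)
    return line


def count_failures(lines: List[str], what: str) -> int:
    """Answer the general question: how many requests of this kind failed?"""
    events = (json.loads(line) for line in lines)
    return sum(1 for e in events if e["what"] == what and e["outcome"] == "failure")
```

Calling `audit_event("alice", "authentication", False)` answers the question for one individual, while `count_failures` answers it in general terms over a batch of records.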

Retention period of audit logging. What is a normal time to keep audit logs? I think 90 days should be enough for problems to be noticed and researched, but every industry has its own needs, so it could be a lot longer.

Application

Which events to log

The level and content of security monitoring, alerting and reporting needs to be set during the requirements and design stage of projects, and should be proportionate to the information security risks. This can then be used to define what should be logged. There is no one size fits all solution, and a blind checklist approach can lead to unnecessary "alarm fog" that means real problems go undetected. Where possible, always log:

Input validation failures e.g. protocol violations, unacceptable encodings, invalid parameter names and values
Output validation failures e.g. database record set mismatch, invalid data encoding
Session management failures e.g. cookie session identification value modification
Application errors and system events e.g. syntax and runtime errors, connectivity problems, performance issues, third party service error messages, file system errors, file upload virus detection, configuration changes
Application and related systems start-ups and shut-downs, and logging initialization (starting, stopping or pausing)
Use of higher-risk functionality e.g. network connections, addition or deletion of users, changes to privileges, assigning users to tokens, adding or deleting tokens, use of systems administrative privileges, access by application administrators, all actions by users with administrative privileges, access to payment cardholder data, use of data encrypting keys, key changes, creation and deletion of system-level objects, data import and export including screen-based reports, submission of user-generated content - especially file uploads
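To make the first item in the list above concrete, here is a small Python sketch that logs input validation failures. The logger name, the allowed parameter set and the choice of the WARNING level are assumptions for illustration; your own risk assessment should decide what is logged and at which level.

```python
import logging
from typing import Dict

logger = logging.getLogger("app.security")  # hypothetical logger name

ALLOWED_PARAMS = {"page", "size"}  # hypothetical set of accepted parameter names


def validate_params(params: Dict[str, str], client_ip: str) -> bool:
    """Return True if every parameter is acceptable; log each failure otherwise."""
    ok = True
    for name, value in params.items():
        if name not in ALLOWED_PARAMS:
            # %r escapes control characters, which limits log injection.
            logger.warning("input validation failure: unknown parameter %r from %s",
                           name, client_ip)
            ok = False
        elif not value.isdigit():
            # The raw value is deliberately not logged; it may be sensitive.
            logger.warning("input validation failure: non-numeric value for %r from %s",
                           name, client_ip)
            ok = False
    return ok
```

The sketch records which parameter failed and where the request came from, but not the rejected value itself.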

Optionally consider if the following events can be logged and whether it is desirable information:

Sequencing failure
Excessive use
Data changes
Fraud and other criminal activities
Suspicious, unacceptable or unexpected behaviour
Modifications to configuration
Application code file and/or memory changes

Log levels

INFO

All important information that we need for normal operations.
All interesting global information on performance and trends. This logging should be minimal. Typical information that is interesting:
Application start-up and shutdown information
User logging on, user logging off
Pages (request URIs) being accessed
Performance of service calls, indicating whether the result was retrieved from a cache.

WARN

All information about things that are going wrong but do not need human intervention. Something is not as it should be, but everything still works. For example, fallback content is sent back.

ERROR

All information about things that are going wrong that do need human intervention. One or several sessions (users) are impacted. For example, a service call timed out or a page cannot be found.

DEBUG

All actions done by humans when using the application. Logging that should help an administrator determine the cause of an error. All logging should be understandable and relevant for administrators.

TRACE

All actions done by the application. All other logging, which should be understandable and relevant for developers. Examples are method entries and exits, results returned from services and databases. This is the only level at which stack traces are allowed. A stack trace at any other level is a program error.

FATAL

Everyone is affected; the entire application is not working. For example, the application properties are missing.
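The levels above can be mapped onto Python's standard `logging` module roughly as follows. This is a sketch under assumptions: the standard library has no TRACE level, so one is registered here with the arbitrary value 5, and FATAL is mapped to `logging.CRITICAL`, for which `logging.FATAL` is an alias. The logger name `app` is hypothetical.

```python
import logging

TRACE = 5  # assumed value, chosen to sit below logging.DEBUG (10)
logging.addLevelName(TRACE, "TRACE")

# Level names used in this article mapped to Python's numeric levels.
LEVELS = {
    "TRACE": TRACE,
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "ERROR": logging.ERROR,
    "FATAL": logging.CRITICAL,
}

logger = logging.getLogger("app")  # hypothetical logger name
logger.setLevel(logging.INFO)  # threshold: INFO and everything more severe

logger.log(LEVELS["INFO"], "application started")
logger.log(LEVELS["WARN"], "service unavailable, fallback content sent")
logger.log(LEVELS["ERROR"], "service call timed out")
logger.log(LEVELS["FATAL"], "application properties are not present")
logger.log(LEVELS["TRACE"], "entering request handler")  # below threshold, dropped
```

With the threshold at INFO, the DEBUG and TRACE messages are discarded, so detailed logging can be switched on only when someone is investigating a problem.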

Retention: What is a normal time to keep application logs? 90 days online and 1 year offline for all technical logs, but every industry has its own needs, so it could be a lot longer.
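One possible way to enforce the 90-day online period is daily rotation with a capped number of files, sketched here with Python's `TimedRotatingFileHandler`. The temporary directory, file name and logger name are placeholders for this example, and moving older files to offline storage for the one-year period is not covered; that would need a separate archiving job.

```python
import logging
import os
import tempfile
from logging.handlers import TimedRotatingFileHandler

ONLINE_RETENTION_DAYS = 90  # taken from the text above; adjust to your needs

log_dir = tempfile.mkdtemp()  # placeholder; use your real log directory
log_path = os.path.join(log_dir, "application.log")  # hypothetical file name

# Rotate once a day at midnight (UTC) and keep at most 90 rotated files;
# older files are deleted by the handler when it rotates.
handler = TimedRotatingFileHandler(
    log_path,
    when="midnight",
    backupCount=ONLINE_RETENTION_DAYS,
    utc=True,
)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
)

logger = logging.getLogger("app.retention_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("application started")
```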

Access log

An access log is a list of all the requests for individual files or endpoints made to an API. Access logs can offer a great deal of information about the incoming requests to your API. If you need to analyse large volumes of these logs, it may be beneficial to use a log analysis tool that can “crunch the numbers” for you much faster. Example:

127.0.0.1 - peter [9/Feb/2017:10:34:12 -0700] "GET /sample-image.png HTTP/2" 200 1479
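For small amounts of data, an entry like the example above can be parsed with a few lines of Python. This is a sketch that assumes the Common Log Format layout of that example; the field names in the pattern are my own labels, and the month-name parsing relies on an English (C) locale.

```python
import re
from datetime import datetime
from typing import Optional

# host ident user [timestamp] "method path protocol" status size
LINE_RE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_access_line(line: str) -> Optional[dict]:
    """Parse one access log line; return None if it does not match the layout."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A size of "-" means no response body was sent.
    entry["size"] = None if entry["size"] == "-" else int(entry["size"])
    return entry


example = ('127.0.0.1 - peter [9/Feb/2017:10:34:12 -0700] '
           '"GET /sample-image.png HTTP/2" 200 1479')
entry = parse_access_line(example)
```

Here `entry["user"]` is `"peter"`, `entry["path"]` is `"/sample-image.png"` and `entry["status"]` is the integer `200`, which covers the who, what and when of the request.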

Retention

What is a normal time to keep access logs? As with all technical logs, 90 days online and 1 year offline, but every industry has its own needs, so it could be a lot longer.

Conclusion

Whatever you do, think about logging in the design phase; do not make it an after-the-fact exercise. Logging is too important not to think about, and not to push for. As a Dev or Ops engineer, you need to know what your application is doing, so that when the time comes and it does something you did not expect, you can look in the logs and say "that is why it went wrong", not "hmm, I do not see anything".

Anas Anjaria

Backend Engineering | Performance Optimization | Scalable Systems

4 years ago

Thanks for sharing. Regarding log levels, I find this article (https://tuhrig.de/my-logging-best-practices/) very interesting.
