Logging & Its Best Practices for DevOps
Sandip Das
AWS Container Hero | Founder @ Good Cloud Development | Cloud & DevOps Architect for Startups | Kubernetes Specialist | SRE, Platform Engineering & MLOps Enthusiast | Educator | Mentor
Logging is a critical aspect of DevOps for monitoring, troubleshooting, and maintaining system health.
But first, let's understand what exactly logging is.
Logging:
Logging is the process of recording events, messages, or other information about the operation of a system or application.
This information, often called logs, helps developers, system administrators, and other stakeholders understand what the system is doing, diagnose problems, and monitor the system's performance and behavior.
Now, let's learn about the types of logs, log levels, log formats, and popular logging libraries.
Types of Logs
Event Logs
Record significant occurrences in the system.
[2024-06-03 08:00:00] INFO: System boot initiated.
[2024-06-03 08:00:10] INFO: System boot completed successfully.
[2024-06-03 20:00:00] INFO: System shutdown initiated by user admin.
[2024-06-03 20:00:05] INFO: All services stopped successfully.
[2024-06-03 20:00:10] INFO: System shutdown completed.
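As a minimal sketch of how lines in this shape could be produced, here is Python's standard logging module configured with a matching format string. The logger name event_demo and the helper build_event_logger are hypothetical names chosen for illustration, and the in-memory buffer stands in for a real log file.

```python
import io
import logging


def build_event_logger(stream):
    """Return a logger that writes lines like '[2024-06-03 08:00:00] INFO: message'."""
    logger = logging.getLogger("event_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers if called twice
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        fmt="[%(asctime)s] %(levelname)s: %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    ))
    logger.addHandler(handler)
    return logger


event_buffer = io.StringIO()  # stands in for a log file
event_log = build_event_logger(event_buffer)
event_log.info("System boot initiated.")
event_log.info("System boot completed successfully.")
print(event_buffer.getvalue(), end="")
```

The timestamps in the output will reflect the time the script runs, not the dates shown in the samples above.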
Error Logs
Capture errors and exceptions that occur during runtime.
[2024-06-03 12:00:00] ERROR: Application crash detected: NullPointerException at com.example.MyApp.main(MyApp.java:42).
[2024-06-03 12:05:00] ERROR: Unable to connect to the database. SQLException: Connection refused.
[2024-06-03 12:10:00] ERROR: FileNotFoundException: Config file '/etc/myapp/config.json' not found.
[2024-06-03 13:00:00] ERROR: Disk space critically low on /dev/sda1. Only 5MB available.
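One way to capture an error together with its stack trace, sketched in Python: logger.exception() logs at ERROR level and appends the current traceback. The function load_config, the logger name error_demo, and the path used below are assumptions made for this example only.

```python
import io
import json
import logging

error_buffer = io.StringIO()  # stands in for an error log file
error_log = logging.getLogger("error_demo")  # hypothetical logger name
error_log.setLevel(logging.ERROR)
error_log.handlers.clear()
error_log.propagate = False
_handler = logging.StreamHandler(error_buffer)
_handler.setFormatter(logging.Formatter("[%(asctime)s] %(levelname)s: %(message)s",
                                        "%Y-%m-%d %H:%M:%S"))
error_log.addHandler(_handler)


def load_config(path):
    """Load a JSON config file; log the error and return None if it is missing."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        # exception() records the message at ERROR level plus the traceback
        error_log.exception("Config file %r not found.", path)
        return None


config = load_config("/nonexistent/myapp/config.json")  # illustrative path
print(error_buffer.getvalue(), end="")
```

Logging the traceback alongside the message makes it easier to find where the failure happened, at the cost of longer entries.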
Transaction Logs
Track transactions or business operations.
[2024-06-03 09:00:00] INFO: Transaction ID: 1234567890, User: john.doe, Type: Purchase, Amount: $100.00, Status: Success, Payment Method: Credit Card, Timestamp: 2024-06-03 09:00:00
[2024-06-03 09:05:00] ERROR: Transaction ID: 1234567891, User: jane.doe, Type: Withdrawal, Amount: $200.00, Status: Failed, Reason: Insufficient Funds, Timestamp: 2024-06-03 09:05:00
[2024-06-03 09:10:00] INFO: Transaction ID: 1234567892, User: john.doe, Type: Refund, Amount: $50.00, Status: Success, Original Transaction ID: 1234567888, Timestamp: 2024-06-03 09:10:00
[2024-06-03 10:00:00] INFO: Order ID: 9876543210, User: alice.smith, Items: [{"item_id": "1234", "quantity": 2}, {"item_id": "5678", "quantity": 1}], Total Amount: $150.00, Status: Placed, Timestamp: 2024-06-03 10:00:00
[2024-06-03 12:00:00] INFO: Order ID: 9876543210, User: alice.smith, Status: Delivered, Delivery Date: 2024-06-03, Timestamp: 2024-06-03 12:00:00
Audit Logs
Keep track of access and changes to data for security and compliance.
[2024-06-03 09:00:00] INFO: User john.doe logged in. IP: 192.168.1.100, Session ID: abc123def456
[2024-06-03 10:00:00] INFO: User jane.doe changed password. IP: 192.168.1.101
[2024-06-03 11:00:00] INFO: Admin admin1 created user account for alice.smith. Role: User, IP: 192.168.1.102
[2024-06-03 12:00:00] INFO: Admin admin2 changed role for user bob.jones to Admin. IP: 192.168.1.103
[2024-06-03 13:00:00] INFO: Admin admin1 disabled user account for carol.white. IP: 192.168.1.104
[2024-06-03 14:00:00] INFO: User john.doe accessed file /documents/report.pdf. IP: 192.168.1.100
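Audit entries are often kept apart from general application logs so they can be retained and reviewed separately. The sketch below assumes a dedicated, non-propagating Python logger and a small helper that refuses incomplete entries. The names build_audit_logger, audit, and audit_demo are hypothetical.

```python
import io
import logging


def build_audit_logger(stream):
    """Return a logger reserved for audit entries."""
    logger = logging.getLogger("audit_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False  # keep audit entries out of the general application log
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(asctime)s] %(levelname)s: %(message)s",
                                           "%Y-%m-%d %H:%M:%S"))
    logger.addHandler(handler)
    return logger


def audit(logger, actor, action, ip):
    """Write one audit entry; every entry must say who did what, and from where."""
    if not actor or not action or not ip:
        raise ValueError("audit entries need an actor, an action, and an IP address")
    logger.info("%s %s. IP: %s", actor, action, ip)


audit_buffer = io.StringIO()  # stands in for a dedicated audit log file
audit_log = build_audit_logger(audit_buffer)
audit(audit_log, "User john.doe", "logged in", "192.168.1.100")
audit(audit_log, "User jane.doe", "changed password", "192.168.1.101")
print(audit_buffer.getvalue(), end="")
```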
Log Levels
Trace
Fine-grained informational events, typically only valuable during development.
{"err":{"type":"Error","message":"no space available for write operations","stack":"Error: no space available for write operations\n at Object.<anonymous> (/home/ayo/dev/betterstack/demo/nodejs-logging/index.js:21:3)\n at Module._compile (node:internal/modules/cjs/loader:1254:14)\n at Module._extensions..js (node:internal/modules/cjs/loader:1308:10)\n at Module.load (node:internal/modules/cjs/loader:1117:32)\n at Module._load (node:internal/modules/cjs/loader:958:12)\n at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)\n at node:internal/main/run_main_module:23:47"},"msg":"Disk space critically low"}
Debug
Detailed information used to diagnose issues.
[2024-06-03 10:00:00] DEBUG: Starting process to handle user profile request. UserId: 12345
[2024-06-03 10:00:08] DEBUG: Response payload assembled. UserId: 12345, Payload: { "userName": "John Doe", "email": "[email protected]", "orders": [...] }
Info
Informational messages that highlight the progress of the application.
[2024-06-03 10:00:00] INFO: Received request for user profile. UserId: 12345
Warn
Indicate potential problems or non-critical issues.
[2024-06-03 10:00:01] WARN: Deprecated API endpoint accessed. Endpoint: /v1/user/profile, UserId: 12345
[2024-06-03 10:00:03] WARN: High memory usage detected. CurrentUsage: 85%, Threshold: 80%
[2024-06-03 10:00:04] WARN: Fallback to secondary data source due to primary source failure. DataSource: UserCache, UserId: 12345
Error
Capture error events that might still allow the application to continue running.
[2024-06-03 10:00:00] ERROR: Database connection failure. Database: UserDB, Reason: Connection timed out, UserId: 12345
[2024-06-03 10:00:01] ERROR: Failed to authenticate user. UserId: 12345, Reason: Invalid credentials, IP: 192.168.1.100
[2024-06-03 10:00:02] ERROR: Unable to fetch user details. UserId: 12345, Service: UserService, Reason: Service unavailable
[2024-06-03 10:00:03] ERROR: Exception thrown during order retrieval. UserId: 12345, Exception: NullPointerException at OrderService.getOrder(OrderService.java:45)
[2024-06-03 10:00:04] ERROR: Payment processing failed. TransactionId: 987654321, UserId: 12345, Reason: Insufficient funds
[2024-06-03 10:00:05] ERROR: Data integrity violation. Entity: UserProfile, UserId: 12345, Reason: Duplicate entry for key 'email'
[2024-06-03 10:00:06] ERROR: Unauthorized access attempt. UserId: 12345, Endpoint: /admin/settings, IP: 192.168.1.101
[2024-06-03 10:00:07] ERROR: File upload failed. UserId: 12345, Filename: profile.jpg, Reason: File size exceeds limit
[2024-06-03 10:00:08] ERROR: System out of memory. Action: Saving user session, UserId: 12345, AvailableMemory: 50MB
[2024-06-03 10:00:09] ERROR: Critical configuration missing. ConfigKey: smtp_server, UserId: 12345, Service: EmailService
Fatal
Severe error events that lead to the termination of the application.
[2024-06-03 10:00:00] FATAL: Critical system failure. Reason: Out of memory. Application terminating.
[2024-06-03 10:00:01] FATAL: Unhandled exception in main application thread. Exception: java.lang.OutOfMemoryError: Java heap space
[2024-06-03 10:00:02] FATAL: Database corruption detected. Database: UserDB, Reason: Inconsistent state in critical tables.
[2024-06-03 10:00:03] FATAL: Failed to initialize core services. Service: AuthService, Reason: Configuration file missing.
[2024-06-03 10:00:04] FATAL: Security breach detected. Immediate shutdown initiated to protect data integrity.
[2024-06-03 10:00:05] FATAL: Hardware failure detected. Component: Disk 1, Server: ProdServer01, Action: System halt.
[2024-06-03 10:00:06] FATAL: Irrecoverable error in transaction processing. Transaction ID: 987654321, Reason: Null pointer dereference.
[2024-06-03 10:00:07] FATAL: System integrity compromised. Reason: Unauthorized modification of core files detected.
[2024-06-03 10:00:08] FATAL: Critical configuration missing. ConfigKey: database_url, Service: MainApp, Action: Shutting down application.
[2024-06-03 10:00:09] FATAL: Catastrophic failure. Reason: Kernel panic, Server: ProdServer01, Action: Immediate reboot.
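To see how a level threshold filters messages, here is a small Python sketch. Note two assumptions: Python's standard library has no built-in TRACE level, so one is registered here at the arbitrary value 5, and Python names the top level CRITICAL (logging.FATAL is an alias for it). The logger name level_demo is hypothetical.

```python
import io
import logging

TRACE = 5  # assumption: any value below logging.DEBUG (10) would work
logging.addLevelName(TRACE, "TRACE")

level_buffer = io.StringIO()
level_log = logging.getLogger("level_demo")  # hypothetical logger name
level_log.handlers.clear()
level_log.propagate = False
_level_handler = logging.StreamHandler(level_buffer)
_level_handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
level_log.addHandler(_level_handler)

# With the threshold at WARNING, anything less severe is dropped.
level_log.setLevel(logging.WARNING)
level_log.log(TRACE, "Entering request handler.")
level_log.debug("Starting process to handle user profile request.")
level_log.info("Received request for user profile.")
level_log.warning("High memory usage detected.")
level_log.error("Database connection failure.")
level_log.critical("Out of memory. Application terminating.")

# Lowering the threshold (for example, during development) lets TRACE through.
level_log.setLevel(TRACE)
level_log.log(TRACE, "Entering request handler.")

print(level_buffer.getvalue(), end="")
```

A common arrangement is a low threshold in development and a higher one (INFO or WARN) in production, so verbose messages do not inflate storage.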
Log Formats
Plain Text
Simple and human-readable, but lacks structure.
[2024-06-03 10:00:00] INFO: User john.doe logged in. IP: 192.168.1.100
[2024-06-03 10:00:01] WARN: Slow response from Auth Service. UserId: 12345, ResponseTime: 1200ms
[2024-06-03 10:00:02] ERROR: Failed to authenticate user. UserId: 12345, Reason: Invalid credentials, IP: 192.168.1.100
[2024-06-03 10:00:03] FATAL: Critical system failure. Reason: Out of memory. Application terminating.
Structured Logs
Use JSON, XML, or similar formats to ensure logs are machine-readable and easily searchable.
JSON Example:
[
{
"timestamp": "2024-06-03T10:00:02Z",
"level": "ERROR",
"message": "Failed to authenticate user",
"userId": 12345,
"reason": "Invalid credentials",
"ip": "192.168.1.100"
},
{
"timestamp": "2024-06-03T10:00:03Z",
"level": "FATAL",
"message": "Critical system failure",
"reason": "Out of memory",
"action": "Application terminating"
}
]
XML Example:
<logs>
<log>
<timestamp>2024-06-03T10:00:02Z</timestamp>
<level>ERROR</level>
<message>Failed to authenticate user</message>
<userId>12345</userId>
<reason>Invalid credentials</reason>
<ip>192.168.1.100</ip>
</log>
<log>
<timestamp>2024-06-03T10:00:03Z</timestamp>
<level>FATAL</level>
<message>Critical system failure</message>
<reason>Out of memory</reason>
<action>Application terminating</action>
</log>
</logs>
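A JSON log entry like the ones in the JSON example can be emitted with a custom formatter. This is a minimal sketch using only Python's standard library. The class JsonFormatter, the logger name json_demo, and the convention of passing extra fields under a "fields" key are assumptions for illustration, not part of any particular library.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc)
                                 .strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Merge structured fields passed via extra={"fields": {...}} (our own convention)
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


json_buffer = io.StringIO()
json_log = logging.getLogger("json_demo")  # hypothetical logger name
json_log.setLevel(logging.INFO)
json_log.handlers.clear()
json_log.propagate = False
_json_handler = logging.StreamHandler(json_buffer)
_json_handler.setFormatter(JsonFormatter())
json_log.addHandler(_json_handler)

json_log.error(
    "Failed to authenticate user",
    extra={"fields": {"userId": 12345, "reason": "Invalid credentials", "ip": "192.168.1.100"}},
)
print(json_buffer.getvalue(), end="")
```

Because every entry is a single JSON object, a log platform can index fields such as userId or ip directly instead of parsing free text.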
Binary Logs
These are more efficient but harder to interpret without specific tools.
00000000 5b 32 30 32 34 2d 30 36 2d 30 33 54 31 30 3a 30 |[2024-06-03T10:0|
00000010 30 3a 30 30 5a 5d 20 49 4e 46 4f 3a 20 55 73 65 |0:00Z] INFO: Use|
00000020 72 20 6a 6f 68 6e 2e 64 6f 65 20 6c 6f 67 67 65 |r john.doe logge|
00000030 64 20 69 6e 2e 20 49 50 3a 20 31 39 32 2e 31 36 |d in. IP: 192.16|
000000c0 20 45 52 52 4f 52 3a 20 46 61 69 6c 65 64 20 74 | ERROR: Failed t|
00000120 30 30 0a 5b 32 30 32 34 2d 30 36 2d 30 33 54 31 |00.[2024-06-03T1|
00000130 30 3a 30 30 3a 30 33 5a 5d 20 46 41 54 41 4c 3a |0:00:03Z] FATAL:|
00000140 20 43 72 69 74 69 63 61 6c 20 73 79 73 74 65 6d | Critical system|
00000150 20 66 61 69 6c 75 72 65 2e 20 52 65 61 73 6f 6e | failure. Reason|
00000160 3a 20 4f 75 74 20 6f 66 20 6d 65 6d 6f 72 79 2e |: Out of memory.|
00000170 20 41 70 70 6c 69 63 61 74 69 6f 6e 20 74 65 72 | Application ter|
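To illustrate why binary logs are compact but need a dedicated reader, here is a sketch of a made-up record layout using Python's struct module. The layout (8-byte timestamp, 1-byte level, 2-byte message length, then the UTF-8 message) is an assumption invented for this example and does not correspond to any real binary log format.

```python
import struct

# Hypothetical header: timestamp (float64), level (uint8), message length (uint16)
HEADER = struct.Struct("<dBH")


def encode_record(timestamp, level, message):
    """Pack one log record into bytes."""
    payload = message.encode("utf-8")
    return HEADER.pack(timestamp, level, len(payload)) + payload


def decode_records(blob):
    """Unpack every record from a bytes object produced by encode_record."""
    records = []
    offset = 0
    while offset < len(blob):
        timestamp, level, length = HEADER.unpack_from(blob, offset)
        offset += HEADER.size
        message = blob[offset:offset + length].decode("utf-8")
        offset += length
        records.append((timestamp, level, message))
    return records


blob = encode_record(1717408800.0, 20, "User john.doe logged in.")
blob += encode_record(1717408803.0, 50, "Critical system failure.")
decoded = decode_records(blob)
for record in decoded:
    print(record)
```

Without knowing the header layout, the bytes are opaque, which is the trade-off the section describes.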
Logging Frameworks and Libraries
Here are some logging best practices with DevOps in mind:
WARNING: Excessive logging can fill up disk space, databases, or whatever storage holds your logs, and can significantly increase costs. Store only what is required, and automatically clean up logs that are no longer needed after a certain time.
Pro tip: Use Time to Live (TTL) if your log storage supports it.
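Where the storage has no built-in TTL, a scheduled cleanup can play the same role. The sketch below is one possible approach for log files on disk: it deletes *.log files whose last modification is older than a chosen age. The function name purge_expired_logs, the seven-day TTL, and the file names are assumptions for the example.

```python
import os
import tempfile
import time
from pathlib import Path


def purge_expired_logs(directory, ttl_seconds, now=None):
    """Delete *.log files last modified more than ttl_seconds ago; return their names."""
    now = time.time() if now is None else now
    removed = []
    for path in Path(directory).glob("*.log"):
        if now - path.stat().st_mtime > ttl_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)


DAY = 86400
with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp, "app-old.log")
    new_file = Path(tmp, "app-new.log")
    old_file.write_text("old entry\n")
    new_file.write_text("new entry\n")
    # Backdate one file by eight days to simulate an expired log
    eight_days_ago = time.time() - 8 * DAY
    os.utime(old_file, (eight_days_ago, eight_days_ago))

    removed = purge_expired_logs(tmp, ttl_seconds=7 * DAY)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print("removed:", removed)
print("remaining:", remaining)
```

In practice this would run on a schedule (a cron job, for instance), and the retention period would follow your compliance requirements.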
Bonus section (since you have read this far)
Container Logging: In Kubernetes environments, container logging is crucial. Use logging agents like Fluentd, Fluent Bit, or Filebeat deployed as DaemonSets to collect logs from each node. Ensure logs from all containers are aggregated, tagged with metadata (e.g., pod name, namespace), and forwarded to a centralized logging system for analysis. Implement log rotation and retention policies at the container level to prevent log files from consuming excessive disk space.
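On the application side, a container usually just writes to standard output and lets the node-level agent collect it. The sketch below shows one way an app could also stamp its own entries with pod metadata. It assumes the pod name and namespace are exposed as environment variables called POD_NAME and POD_NAMESPACE, which are hypothetical names that would have to be set in the pod spec. The class names are also illustrative, and an in-memory buffer replaces sys.stdout so the output can be inspected.

```python
import io
import json
import logging
import os


class PodMetadataFilter(logging.Filter):
    """Attach pod name and namespace to every record."""

    def __init__(self, environ=None):
        super().__init__()
        env = os.environ if environ is None else environ
        # POD_NAME / POD_NAMESPACE are assumed variable names, not set by default
        self.pod = env.get("POD_NAME", "unknown")
        self.namespace = env.get("POD_NAMESPACE", "unknown")

    def filter(self, record):
        record.pod = self.pod
        record.namespace = self.namespace
        return True  # never drop the record, only enrich it


class ContainerJsonFormatter(logging.Formatter):
    """One JSON object per line, ready for a log agent to parse."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "pod": record.pod,
            "namespace": record.namespace,
        })


container_buffer = io.StringIO()  # in a real container this would be sys.stdout
container_handler = logging.StreamHandler(container_buffer)
container_handler.addFilter(PodMetadataFilter({"POD_NAME": "web-7d9f", "POD_NAMESPACE": "prod"}))
container_handler.setFormatter(ContainerJsonFormatter())

container_log = logging.getLogger("container_demo")  # hypothetical logger name
container_log.setLevel(logging.INFO)
container_log.handlers.clear()
container_log.propagate = False
container_log.addHandler(container_handler)

container_log.info("Received request for user profile.")
print(container_buffer.getvalue(), end="")
```

Agents such as Fluent Bit can add Kubernetes metadata themselves, so this kind of in-app tagging is optional and mainly useful when logs bypass the agent.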
Ready to take your logging to the next level? Start implementing these best practices today to enhance your monitoring and troubleshooting capabilities.
Share your experiences and insights in the comments below!
If you found this article helpful, don't forget to like and share it with your network.
Let's drive better DevOps practices together!
Cheers,