Chaos Engineering – Storage Saturation

Production environments often run out of disk space because application files such as logs, databases, and output files keep growing. If these files are not monitored and managed, they can fill the disk and cause applications to crash.

In this series of chaos engineering articles, we have been learning to simulate various performance problems. In this article, let’s discuss how to simulate disk storage issues.

Sample Program

Here is a sample program from the open-source BuggyApp application that fills up your disk storage:

public static void fillDiskSpace(String drive, Integer percentageFill) {
    String filePath = drive + "/tier1testfile";
    // Clean up before writing (deleteFile is defined elsewhere in BuggyApp)
    deleteFile(drive);
    // Check available space on the drive
    long availableSpaceInBytes = getAvailableSpace(drive);
    long fileSizeInBytes = (availableSpaceInBytes * percentageFill) / 100;
    byte[] data = generateRandomData(ONE_MB);
    // Divide as long: casting fileSizeInBytes to int first overflows above 2 GB
    long noOfIterations = fileSizeInBytes / ONE_MB;
    // Open the file once and append 1 MB chunks until the target size is reached
    try (FileOutputStream fos = new FileOutputStream(filePath, true)) {
        for (long iteration = 0; iteration < noOfIterations + 1; iteration++) {
            fos.write(data);
        }
        System.out.println("File created successfully occupying " + percentageFill + "%");
    } catch (IOException e) {
        // Stop at the first write failure instead of retrying on a full disk
        System.out.println("Error " + e.getMessage());
        e.printStackTrace();
    }
}
        

The above code shows the ‘fillDiskSpace’ function, which writes a single test file named ‘tier1testfile’ to the drive passed as the first argument. The function first checks the available space using ‘getAvailableSpace(drive)’, computes the target file size as the requested percentage (the second argument) of that available space, and then appends random data to the file in 1 MB chunks until the target size is reached.
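The helper methods used by ‘fillDiskSpace’ are not shown in the excerpt. The following is a minimal sketch of what they might look like, not BuggyApp's actual code: the class name ‘DiskFillHelpers’, the method ‘targetFileSize’, and the bodies of ‘getAvailableSpace’ and ‘generateRandomData’ are all assumptions built on standard JDK calls.

```java
import java.io.File;
import java.util.Random;

// Hypothetical helpers; the real BuggyApp implementations may differ.
class DiskFillHelpers {
    static final int ONE_MB = 1024 * 1024;

    // Bytes currently usable on the partition that contains the given path.
    // File.getUsableSpace() returns 0 if the path does not name a partition.
    static long getAvailableSpace(String drive) {
        return new File(drive).getUsableSpace();
    }

    // A buffer of the requested size filled with pseudo-random bytes.
    static byte[] generateRandomData(int size) {
        byte[] data = new byte[size];
        new Random().nextBytes(data);
        return data;
    }

    // Target file size: the requested percentage of the available bytes.
    static long targetFileSize(long availableBytes, int percentageFill) {
        return availableBytes * percentageFill / 100;
    }
}
```

For example, ‘targetFileSize(1000L, 90)’ evaluates to 900, the same integer arithmetic that ‘fillDiskSpace’ applies to the real available byte count.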

Here is the command that runs BuggyApp to simulate this problem:

java -jar buggyApp.jar FILL_DISK_SPACE <drive path> <percentage fill>

When you launch this BuggyApp simulation on a Windows machine, you will notice that a test file has been created on the specified drive, occupying 90% of the disk space that was available.


Let’s assume the drive has 1.94 GB of available storage space before launching this BuggyApp simulation. After launching the simulation, a 1.74 GB test file is written to the disk, so about 90% of that space is now filled.

How To Diagnose Storage Saturation?

Monitoring disk space is critical for proper service management, because it helps you avoid application failures caused by insufficient disk space. Let’s discuss two approaches to diagnosing the storage saturation problem:

1. Manual approach:

The manual approach is to use system commands to check the disk space at regular intervals. The Get-Volume cmdlet in PowerShell on Windows, or the df command on Unix/Linux, reports each drive's total size along with its remaining free space.
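If you would rather run the same kind of check from Java, the sketch below prints total, free, and used space for every file store the JVM can see, roughly the information df or Get-Volume reports. The class name ‘DiskUsageReport’ and its methods are illustrative and not part of BuggyApp.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.FileSystems;
import java.util.Locale;

// Illustrative sketch: a one-shot disk usage report.
class DiskUsageReport {
    // Percentage of the store that is in use; 0 for empty or pseudo file systems.
    static double usedPercent(long totalBytes, long usableBytes) {
        if (totalBytes <= 0) {
            return 0.0;
        }
        return 100.0 * (totalBytes - usableBytes) / totalBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        for (FileStore store : FileSystems.getDefault().getFileStores()) {
            try {
                long total = store.getTotalSpace();
                long usable = store.getUsableSpace();
                System.out.println(String.format(Locale.ROOT,
                        "%s total=%d MB free=%d MB used=%.1f%%",
                        store, total / mb, usable / mb, usedPercent(total, usable)));
            } catch (IOException e) {
                // Some stores (for example, disconnected network mounts) cannot be queried.
                System.out.println(store + ": unavailable (" + e.getMessage() + ")");
            }
        }
    }
}
```

Running it before and after the BuggyApp simulation should show the free space on the target drive dropping by roughly the size of the test file.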


2. Automated approach:

Manually checking the space continuously does not scale. An automated approach, in which a tool monitors the disk space for you, is advisable. You can use monitoring tools such as yCrash, which is capable of predicting outages before they surface in the production environment.

Once it predicts an outage in the environment, it captures 360° troubleshooting artifacts from your environment, analyzes them, and instantly generates a root cause analysis report.
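To give a rough idea of what automated threshold monitoring involves, here is a minimal sketch that checks a path once a minute and prints a warning when usage crosses a limit. It is not how yCrash works internally; the class name ‘DiskSpaceWatcher’, the 90% threshold, and the 60-second interval are assumptions chosen for illustration.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: a periodic disk usage check with a fixed threshold.
class DiskSpaceWatcher {
    // True when the used share of the disk is at or above the threshold.
    static boolean isAboveThreshold(long totalBytes, long usableBytes, double thresholdPercent) {
        if (totalBytes <= 0) {
            return false;
        }
        double usedPercent = 100.0 * (totalBytes - usableBytes) / totalBytes;
        return usedPercent >= thresholdPercent;
    }

    // Reads the current figures for the partition holding 'path' and warns if needed.
    static void checkOnce(File path, double thresholdPercent) {
        long total = path.getTotalSpace();
        long usable = path.getUsableSpace();
        if (isAboveThreshold(total, usable, thresholdPercent)) {
            System.err.println("WARNING: disk usage for " + path.getAbsolutePath()
                    + " is at or above " + thresholdPercent + "%");
        }
    }

    public static void main(String[] args) {
        File path = new File(args.length > 0 ? args[0] : ".");
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Check immediately, then every 60 seconds until the process is stopped.
        scheduler.scheduleAtFixedRate(() -> checkOnce(path, 90.0), 0, 60, TimeUnit.SECONDS);
    }
}
```

A production-grade monitor would also need alert routing, history, and trend analysis, which is the gap dedicated tools are meant to fill.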

Below, yCrash is shown raising notifications for disk usage issues, with a detailed view of the drive causing space problems.


The image above shows how yCrash highlights the disk storage issue captured in on-demand mode, with no application issues present.


The image above shows the disk storage issue captured by yCrash in m3 mode, with no application issues present.

Conclusion

In this article, we discussed how to fill up disk space using the BuggyApp simulation. We looked at a sample program that creates a test file occupying 90% of your available disk storage, which can lead to a storage issue. We also discussed two approaches to detecting the storage issue: manual and automated.

