Too much data killed the server
Half the Maximo system is dead … "Why is Maximo so fragile?"
As this call shows, Maximo wasn't fragile – it was just doing what it was told…
The call started with a super user reporting that their session was slow and asking whether we knew why.
It was followed by a WebSphere administrator reporting that a server was running with high CPU.
The monitoring system reported that the server had almost exhausted its disk space.
Then users on a different JVM on the same server reported that their sessions were just displaying the long op (spinning blue wheel).
Then the server failed to respond to SSH connection requests.
So what had caused a perfectly healthy and stable server to get into this state?
The usual checks showed:
The first clue was the alert that the server was running out of disk space. Working with a Linux administrator revealed that the disk space was being used up by a ".goal" file which was over 1GB in size.
The goal file was in the tmp directory under a folder starting with the name DataEngine.
The tail command showed that the file contained XML tags and what appeared to be details of workorders.
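The kind of check that located the file can be sketched in a few lines of Python. This is only an illustration, not the procedure used on the call: the function name, the size threshold and the root path are assumptions, and only the ".goal" suffix and the DataEngine folder prefix come from the incident itself.

```python
import os


def find_large_files(root, min_bytes, suffix=".goal"):
    """Return (path, size) pairs for files under root that end with
    suffix and are at least min_bytes in size, largest first.

    The name, parameters and defaults are hypothetical; only the
    ".goal" suffix is taken from the incident described above.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file may be removed while the scan is running.
                continue
            if size >= min_bytes:
                hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Pointing it at the temporary directory with a threshold of a few hundred megabytes, for example `find_large_files("/tmp", 500 * 1024 * 1024)`, would list any oversized goal files. The path is an assumption and would need to match the JVM's actual temporary directory.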
Studying the logs revealed that a BIRT report had been started. The report logger was set to record details of the request. The details showed that the where clause would have returned several years of workorders.
The user said that they had built their own mini-database and that they were importing the data into it so they could use it for their work.
The size of the export had caused the JVMs to fail to write to their temporary files and log files. The JVMs had been generating high CPU as they tried to write to the file system and then handle the errors that were generated.
What was the goal file?
The goal file was a temporary file that the BIRT report engine created to store the data to be exported. The file was so large because the export format was writing tags for the different cells.
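To get a feel for why per-cell tags inflate a file, the following Python sketch compares the size of the same rows written as plain CSV and written with a tag around every cell. The `<row>` and `<cell>` tag names and the sample values are hypothetical; the real layout of a goal file is not described here, so treat this as an illustration of the overhead rather than a model of BIRT's format.

```python
import csv
import io
from xml.sax.saxutils import escape


def csv_size(rows):
    """Size in bytes of the rows written as plain CSV."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return len(buf.getvalue().encode("utf-8"))


def tagged_size(rows):
    """Size in bytes of the rows with every cell wrapped in tags.

    The <row>/<cell> names are made up for this sketch.
    """
    parts = []
    for row in rows:
        cells = "".join(f"<cell>{escape(str(v))}</cell>" for v in row)
        parts.append(f"<row>{cells}</row>\n")
    return len("".join(parts).encode("utf-8"))


def overhead_ratio(rows):
    """How many times larger the tagged form is than the CSV form."""
    return tagged_size(rows) / csv_size(rows)
```

With short cell values the tags can easily outweigh the data they wrap, so the ratio grows as the cells get smaller. Across several years of work orders that overhead is multiplied by every cell in every row.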
Vetasi advice
There were several lessons to be learnt here:
Users should be encouraged to discuss their requirements – Once the requirements were understood, the relevant calculations were built into the reporting database and a new report was built to show the data that the user had been assembling in the mini-database. The report was extended to include extra features that the user hadn't thought of, and it was scheduled so that it delivered the results before they arrived at work.
Additional monitoring was added to check specific folders where large files were generated. The new monitoring warned when a large goal file was being generated after a different user made a mistake when they started the report.
Additional functionality was built into key reports to prevent the report running when the where clause was likely to return large numbers of rows.
Support staff were given additional training about where large files could be generated.
The system was reconfigured so that large requests could not exhaust the disk space available for the JVMs.
An RFE was raised to encourage IBM to investigate providing other exporters – Additional exporters were provided in a later version.
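The idea of refusing to run a report when its where clause would return too many rows can be sketched as a count check before the main query. This is a minimal Python illustration, not the logic that was built into the actual reports: it uses an in-memory SQLite database as a stand-in, and the function names, the table name and the row limit are all assumptions.

```python
import sqlite3

MAX_ROWS = 10_000  # hypothetical limit; tune to what the server can handle


def count_matching_rows(conn, table, where_clause, params=()):
    """Count the rows the where clause would return.

    table and where_clause are assumed to come from a trusted report
    definition, not from free-form user input.
    """
    sql = f"SELECT COUNT(*) FROM {table} WHERE {where_clause}"
    return conn.execute(sql, params).fetchone()[0]


def check_report_allowed(conn, table, where_clause, params=(),
                         max_rows=MAX_ROWS):
    """Return the row count, or raise ValueError if it exceeds max_rows."""
    rows = count_matching_rows(conn, table, where_clause, params)
    if rows > max_rows:
        raise ValueError(
            f"Report refused: where clause matches {rows} rows "
            f"(limit {max_rows})"
        )
    return rows
```

A report would call `check_report_allowed` first and only run the full query if no error is raised. The count query has its own cost, so on very large tables an estimate or an indexed date range might be a better fit.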
This blog series
This article is one of a series of articles to help system administrators understand the Maximo logs and the underlying architecture.
If you like this article then please share or like it.
Whilst I support the wider Maximo community and encourage the spread of knowledge, when republishing content from this blog please credit the original author alongside the article or any parts of it.
If people do find parts of this blog coming up in blogs/newsletters/communications then please contact me directly. I’m happy to connect on LinkedIn to discuss.
Disclaimer
The postings on this blog are my own and don't necessarily represent Vetasi's positions, strategies or opinions.
The materials on this site are provided "AS IS" and the author will not be liable for any direct, indirect or incidental damages arising out of or relating to any use or distribution of them. Readers are advised to test any changes/recommendations thoroughly before use.
Technical Design Authority / IBM Champion for Cloud
3 years ago: In my next article I explain how a user's mistake when pasting in formatted text caused serious problems - https://www.dhirubhai.net/pulse/report-showing-garbage-impact-rich-text-formatted-mark-robbins/
Retired Business Consultant
3 years ago: Another great article on possible issues with Maximo changes. You always need to understand the system you are making changes to and have the documentation on what has been configured. Too many companies rely on a single person or a small group to “know” how it all works and be able to give this analysis quickly. As many of us will attest, this is rarely the best way to manage things.