Cisco UCS - How to reset B200 Memory Errors
Here are some tedious but necessary steps to take when you encounter a memory DIMM with multiple ECC errors. If you skip them, Cisco will ask you to do them anyway, which wastes time when you want the replacement part delivered immediately.
- Log in to the UCS command line interface (PuTTY preferred)
- Reset memory errors using the commands below:
- CLI# scope server x/y (x = chassis number, y = slot number)
- CLI# reset-all-memory-errors
- CLI# commit-buffer
- CLI# clear sel
- CLI# commit-buffer
- CLI# scope cimc
- CLI# reset
- CLI# commit-buffer
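If you do this often, a small helper that prints the command sequence for a given blade can save some typing. The following is a minimal sketch in Python, not a Cisco tool: the function name `memory_reset_commands` is hypothetical, it only builds the text of the commands listed above so you can paste them into your SSH session, and it does not connect to UCS or run anything itself.

```python
from typing import List


def memory_reset_commands(chassis: int, slot: int) -> List[str]:
    """Return the CLI commands from the steps above, in order.

    chassis and slot fill in the x/y of "scope server x/y".
    This only builds strings; it does not talk to UCS.
    """
    if chassis < 1 or slot < 1:
        raise ValueError("chassis and slot numbers must be positive")
    return [
        f"scope server {chassis}/{slot}",
        "reset-all-memory-errors",
        "commit-buffer",
        "clear sel",
        "commit-buffer",
        "scope cimc",
        "reset",
        "commit-buffer",
    ]


if __name__ == "__main__":
    # Placeholder values: blade in chassis 1, slot 3.
    print("\n".join(memory_reset_commands(1, 3)))
```

Swap in your own chassis and slot numbers, then paste the printed lines into the CLI one at a time so you can watch for errors after each commit.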
Once all these commands are executed, keep an eye out for the errors to come back.
If they do not reappear within 24-48 hours (I have never experienced this), you are in the clear and the DIMM does not need to be replaced. Otherwise, collect the logs again and provide them to Cisco support to expedite the process.
Have you ever had a DIMM with ECC errors fix itself after clearing them? Comment below!
Advanced Systems Engineer at Lee Health
3 years ago: Yah! It's like a flat spontaneously re-inflating.
Senior SE @ VAST
4 years ago: Are you leveraging Cisco Intersight as well? It covers UCS Memory Failures (DIMM Inoperable Fault F0185) as part of its initial scope. https://www.cisco.com/c/en/us/support/docs/servers-unified-computing/intersight/215172-proactive-rma-for-intersight-connected-d.html
VMware Engineering Management | Infrastructure Management | Systems Engineering
4 years ago: Great write-up, thanks.