4th Gen Intel® Xeon® Scalable processor is here. Let’s talk about reliability at scale.
The 4th Gen Intel Xeon Scalable processor (formerly codenamed Sapphire Rapids) is a big generational leap in technology, packed with an array of powerful new CPU cores and built-in acceleration for critical workloads. While these leaps in performance are always going to be the headline, I want to dive a little deeper into what many datacenter operators care about even more than performance: reliability.
Our newest processor is designed and manufactured to be a rock-solid foundation for even the largest scale computing applications with the toughest reliability requirements. In developing this processor, we focused on three critical improvement areas: memory subsystem reliability, system stability, and in-fleet management capabilities.
With the introduction of DDR5 technology, we fully updated our memory Reliability/Availability/Serviceability (RAS) architecture. With a new, higher-performance Error Correction Code (ECC) capability and new RAS features such as Permanent Fault Detection (PFD), we can tackle memory errors significantly better than on past platforms. Just as important, we engineered the memory interface to support 75 fully validated memory configurations, including two-DIMM-per-channel operation, which enables more cost-effective, larger-capacity configurations. Intel’s industry leadership in extensive margin testing, customer enablement tools, and partnerships with leading DRAM vendors ensures an exacting standard for memory reliability.
We raised the system stability goals for this design dramatically compared with prior generations. Prior to production release, we demonstrated a whopping 200,000 resets without a single failure at our internal scale testing facility. The new 4th Gen Intel Xeon Scalable processors also mark the debut of a next-generation manufacturing test platform that increases initial silicon quality, which can be especially important in large-scale installations.
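To put a zero-failure result like this in perspective, here is a minimal sketch of how one might turn it into a statistical bound. It is not a description of Intel's validation methodology. It assumes every reset is an independent trial with the same failure probability, and the function names and the fleet size are hypothetical, chosen only for illustration.

```python
import math


def zero_failure_upper_bound(trials: int, confidence: float = 0.95) -> float:
    """Upper bound on the per-trial failure probability after `trials`
    independent trials with zero observed failures.

    Solves (1 - p) ** trials = 1 - confidence for p.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1.0 - confidence
    # expm1 keeps precision when the result is very close to zero.
    return -math.expm1(math.log(alpha) / trials)


def rule_of_three(trials: int) -> float:
    """Well-known approximation of the 95% zero-failure bound: 3 / n."""
    return 3.0 / trials


if __name__ == "__main__":
    resets = 200_000  # figure quoted in the post
    bound = zero_failure_upper_bound(resets)
    print(f"95% upper bound per reset: {bound:.3e}")
    print(f"rule-of-three estimate:    {rule_of_three(resets):.3e}")

    # Hypothetical fleet size, for illustration only.
    hypothetical_fleet = 100_000
    print(f"expected failures per fleet-wide reset: "
          f"at most {bound * hypothetical_fleet:.2f}")
```

Under these assumptions, 200,000 clean resets bound the per-reset failure probability at roughly 1.5 in 100,000 with 95% confidence. The bound tightens only in proportion to the number of trials, which is why zero-failure testing has to run at this kind of scale to say anything useful about a large fleet.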
Finally, we provided new ways to manage the reliability of the fleet for months and years after installation. These include advanced in-field testing capabilities that maintain reliability over time with minimal service interruptions. With this processor, we piloted Intel Platform Monitoring Technology (Intel PMT), a newly developed telemetry framework for both in-band and out-of-band manageability, in our internal scale testing facility, where it collected gigabytes of telemetry data throughout validation. The platform also allows most typical firmware updates to be delivered without a system reset. We also updated our debug tools to diagnose and resolve issues remotely, including rare or sporadic problems that exist only at the statistical margins of a large fleet.
I’m thrilled with the results of the 4th Gen Intel Xeon Scalable processor design. I believe we have built the highest quality, most reliable datacenter platform in our history.