The Case Against Memory Upgradeability
Crucial 2 x 16GB DDR4

The picture above is a pair of 16GB DIMMs; each DIMM has four packages of two 16Gbit die. Consider that most client systems can run fine at 16GB total memory, and almost all requirements are met at 32GB. The question: why is memory not in the processor package? Doing so would give up the memory upgrade path, as the processor and memory combination must be set at the time of initial purchase. Why give up a feature that we already have? Of course, brain-dead pundits incapable of whole-picture, real-world thinking would complain loudly, as they always have. The fact is: an intelligent and practical decision on the processor and memory combination can be made at purchase that will meet requirements through the system's productive lifetime in all but a very few cases. The imposition of memory upgradeability has a cost that is not commonly understood. The driving reason to bring memory into the processor package is to remove the obstructions to critical performance gains posed by externally connected memory.

For more than a decade, we have not had anything close to 40% year-on-year performance improvement at either the core or the processor level, except for the very few applications that leverage the new SSE/AVX SIMD instructions in each generation. One major reason is the huge disparity between the processor core clock cycle and the round-trip memory access time. The clock cycle time of a core at 4.0GHz is 0.25ns versus a round-trip memory access of 70ns, a factor of 280X. Any software that does not run inside the on-die cache or rely on streaming memory access will hit a hard performance wall on memory latency.
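A minimal pointer-chasing sketch (illustrative, not from the article) shows what that wall looks like in practice: each load depends on the result of the previous one, so with a working set far larger than the on-die caches, the loop pays close to the full memory round-trip latency per access, hundreds of core cycles. The array size, the xorshift generator, and the Sattolo shuffle are arbitrary choices for demonstration.

```c
/* Pointer-chasing latency sketch (illustrative only).
 * Each load depends on the previous one, so the loop pays close to the
 * full memory round-trip latency per access once the working set is far
 * larger than the on-die caches. Build: gcc -O2 chase.c (Linux/POSIX). */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static size_t rnd_below(size_t bound)          /* crude xorshift64, illustrative */
{
    static unsigned long long x = 88172645463325252ULL;
    x ^= x << 13; x ^= x >> 7; x ^= x << 17;
    return (size_t)(x % bound);
}

int main(void)
{
    size_t n = 32u * 1024 * 1024;              /* 32M entries x 8 bytes = 256MB */
    size_t *next = malloc(n * sizeof *next);
    if (!next) return 1;

    /* Sattolo's algorithm: a single random cycle, so the hardware
     * prefetcher cannot predict the access pattern. */
    for (size_t i = 0; i < n; i++) next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = rnd_below(i);
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (size_t i = 0; i < n; i++) p = next[p]; /* serially dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    printf("average latency per access: %.1f ns (p=%zu)\n", ns / n, p);
    free(next);
    return 0;
}
```

On a typical desktop this reports something in the neighborhood of the 70ns figure above, versus a few cycles when the array is shrunk to fit in cache.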

Both the processor and DIMM signal pads seem small at less than 1 millimeter, but this is around 10,000 times greater than transistor linear dimensions in the low tens of nanometers. Note that a semiconductor manufacturing process name such as 7nm really just means that transistor density is nominally double (about 1.6X after accounting for non-scaling elements) that of the previous 10nm process; the correlation to transistor gate length was given up long ago.

The image below shows the metal levels above the silicon die for one of Intel’s 14nm processes. The metal traces at the lowest level, connecting transistors, are around 50nm wide. The upper-level signal traces are around 10,000nm or 10μm (micrometer or micron) wide, much larger than the first level, though still smaller than the signals that leave the chip.

The diagram below shows the silicon die in a package. The die sits on a substrate, and signals travel from the die to the substrate before leaving the package to the external world.

The image below shows additional detail on the difference between signal connections from the silicon to the substrate and the outside world (not shown), versus connections from one silicon die to a second silicon die via a silicon bridge.

To send a signal off the silicon die, then off the substrate, and out of the package to the outside world, it must pass through a series of buffering circuits (on the silicon die) that greatly amplify the current from the source transistors (at the 10 nanometer scale) up to what is needed to drive millimeter-scale circuit board wires and connectors. This has a cost in die area, and is one of the non-scaling elements. The higher current of the external signals impacts power consumption, and buffering adds signal propagation delay when latency is critical.

In moving memory from outside the processor package to inside, there is some reduction in the length and size of the wires between the processor and DRAM. If we use existing components designed for off-package signals, the bumps at the silicon die are currently around 100μm, and the silicon is designed to drive current at a level sufficient for external signaling. If components were designed for in-package connections, the bumps could be reduced to 50μm (perhaps further in future designs) and would operate at lower current. Intel and others have already done this with HBM and FPGAs in certain products, and Apple puts the DRAM packages on the M1 processor package.

The current Intel desktop processor LGA1200 package dimensions are 37.5mm x 37.5mm. The next generation LGA1700 package will be 37.5 x 45mm. DRAM vendors do not like to say what their die size is, but the DRAM package is 9 or 10mm x 11mm; the 16Gbit DRAM die should be smaller, and it changes with each incremental process density gain. A consolidated processor + DRAM package could fit in comparable dimensions, though an actual layout would want to optimize the memory controller to DRAM path.
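A rough back-of-envelope check supports this (a sketch only: the package and DRAM dimensions come from the paragraph above, while the CPU die area of roughly 200mm² is my own assumption for illustration).

```c
/* Area-budget sketch: does a CPU die plus enough DRAM packages for 16GB fit
 * on an LGA1700-sized substrate? Package and DRAM dimensions are from the
 * text; the CPU die area (~200 mm^2) is an assumed figure for illustration. */
#include <stdio.h>

int main(void)
{
    double substrate_mm2 = 37.5 * 45.0;  /* LGA1700 package, 37.5 x 45 mm      */
    double cpu_die_mm2   = 200.0;        /* assumed client CPU die area        */
    double dram_pkg_mm2  = 10.0 * 11.0;  /* one DRAM package, ~10 x 11 mm      */
    int    dram_packages = 4;            /* 4 x (2 x 16Gbit) = 128Gbit = 16GB  */

    double used = cpu_die_mm2 + dram_packages * dram_pkg_mm2;
    printf("substrate area:          %6.0f mm^2\n", substrate_mm2);
    printf("CPU die + %d DRAM pkgs:  %6.0f mm^2 (%.0f%% of substrate)\n",
           dram_packages, used, 100.0 * used / substrate_mm2);
    return 0;
}
```

By this crude estimate, the CPU die plus 16GB of DRAM packages occupy well under half of the LGA1700 substrate area, leaving room for power delivery and routing keep-out zones.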

Intel client processors largely fall into four major groups: the Celeron and Pentium lines have prices below $100, the Core i3 in the low $100s, the Core i5 in the $150-250 range, and the i7 and i9 above $250. The current retail price of 32GB of memory is $160 (somewhat elevated due to the supply-demand situation; it was about $110 last year). It is not difficult to argue that the low-end processors should be configured with 4 and 8GB, the midrange with 8 and 16GB, and the high-end with 16 and 32GB. The brand and sub-brands could be restructured into divisions of 4, 8, 16 and 32GB DRAM, plus a third memory channel.
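A simple cost sketch using the figures above ($160 retail for 32GB, about $5/GB; using retail price as a stand-in for added BOM cost is my assumption):

```c
/* Memory cost sketch: $160 retail for 32GB implies ~$5/GB, applied to the
 * proposed in-package capacity tiers. Retail price is used as a rough
 * stand-in for the added BOM cost (an assumption). */
#include <stdio.h>

int main(void)
{
    double dollars_per_gb = 160.0 / 32.0;   /* ~$5/GB at current retail */
    int capacities[] = { 4, 8, 16, 32 };    /* proposed in-package tiers */
    for (int i = 0; i < 4; i++)
        printf("%2dGB in-package DRAM adds roughly $%.0f\n",
               capacities[i], capacities[i] * dollars_per_gb);
    return 0;
}
```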

The Core i9 should have specially binned DRAM parts for lower latency. This set would cover the very large majority of use cases without affecting the number of SKUs necessary to cover the market spectrum. Note: a heat sink on top of both CPU and memory could narrow the memory operating temperature range, possibly allowing a lower latency setting, depending on how much heat from the processor spills over.

That said, there is still legitimate demand for flexibility, specifically very large memory configurations. This could be handled by a processor with a third memory channel that goes off package to DIMM slots on the motherboard. Perhaps a more important future direction is a processor with in-package eDRAM (DRAM manufactured on a logic process, which has lower density but also lower latency) or even SRAM. Memory upgradeability was once an important system feature. Going forward, it has become a ball and chain, blocking continued progress in performance.

Appendix

Below is a 4MB memory board for the VAX 8600, made with 152(?) x 256Kbit DRAM chips. (4MB at 256Kbit per chip works out to 128 chips for data; the remainder would presumably hold ECC check bits.)

Below is the board cage for the VAX 11/785. In that era, 1MB of memory cost several thousand dollars, and memory upgradeability was absolutely essential. Today, memory upgradeability is a legacy artifact holding us back from making progress in computing performance.

The Apple M1 with 8GB, presumably 2 packages of 2 x 16Gbit DRAM die (2 x 2 x 16Gbit = 64Gbit = 8GB).
