An idea whose time has come: Onur Mutlu on processing in memory, holistic architecture design and fundamentally better computing systems
Check out videos from Onur's 2024 ACACES course on #HiPEACTV: https://bit.ly/ACACES24_playlist
In 2018, HiPEAC spoke to Onur Mutlu (ETH Zurich) about the role of memory in computing systems, just after Onur’s course at the HiPEAC summer school, ACACES (see '"It's the memory, stupid": A conversation with Onur Mutlu'). Six years later, fresh from a new edition of ACACES, we caught up with Onur to find out what’s changed in the field of processing in memory, how disruptive hardware concepts get taken up, promising directions in computer architecture, and more.
How much has changed in this field over the last few years?
A lot, and in a very positive direction. When I first taught at ACACES in 2013, I described our initial ideas on processing in memory (PIM), including RowClone [1], a simple and effective mechanism for in-memory bulk data copy and initialization. However, at that time, we did not even have our first works published – some had been submitted but were rejected as being unrealistic or niche.
By the time I returned to teach at ACACES in 2018, we had a relatively large body of published work on the topic, so my course included much more material on processing in memory. Our works generated a lot of excitement, but there were still no industrial products or prototypes and there was still considerable scepticism, even among academics.
For example, our Tesseract paper [2], published at ISCA 2015, was rejected twice before being accepted, while our Ambit work [3], published at MICRO 2017, was rejected four times before finally being accepted. Both works led to significant follow-on research after publication and have had a lasting influence; in fact, the Tesseract paper was selected for inclusion in the ISCA@50 25-Year Retrospective: 1996-2020, a select set of important papers from the past 25 years of the ISCA conference.
Six years later, in 2024, we have a completely different landscape. Processing-in-memory was an even bigger part of my ACACES 2024 course. My lectures covered various aspects of the field, from the near-term ‘processing-near-memory’ approaches, which place conventional logic elements near memory structures, to the longer-term ‘processing-using-memory’ approaches, which fundamentally exploit the analogue operational principles of memory structures for logic computation.
As for industry, with Google we have demonstrated the benefits of processing-near-memory for mobile workloads [13] and machine-learning accelerators [14], while with NVIDIA we showed the benefits of PIM on graphics processing unit (GPU) workloads. There is now at least one commercial product (from a European company, UPMEM) that places a general-purpose programmable multithreaded processor next to each bank in a DRAM chip. We experimentally evaluated the benefits and trade-offs of such a PIM product in both my ACACES course and our research papers (e.g. [6,7,8,9,10]). Meanwhile, the best paper at HPCA 2024 was based on the UPMEM chip [62]. One thing is clear: having such hardware has enabled significant progress in the research community by enabling software development, benchmarking, and much better understanding.
There are prototype DRAM chips from multiple large companies with similar near-bank processing capabilities specialized for accelerating machine learning workloads, e.g., the Samsung FIMDRAM, SK Hynix AiM, and Alibaba’s prototype. There are also other prototypes that perform processing near DRAM chips, such as the AxDIMM module from Samsung and Meta. Many startups are also trying to create systems that perform some sort of PIM.
Furthermore, in work published in 2024 [11,12], we were able to show that commodity DRAM chips, without any modification to the chip itself, just by modifying the memory controller, are able to execute bulk-bitwise operations (including NAND, NOR, NOT, AND, OR, MAJority, Row Copy and Initialization functions) in a reasonably robust manner. These fascinating findings demonstrate the fundamental computation capability of DRAM, even when DRAM chips are not designed for this purpose, and provide a solid foundation for building processing mechanisms into future DRAM chips and standards.
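To see why a majority (MAJ) primitive matters for bulk-bitwise computation, the sketch below models DRAM rows as bit vectors and derives AND and OR from a three-input majority with a constant control row, a construction used in the processing-using-DRAM literature (for example Ambit [3]). It is a functional toy under stated assumptions, not a model of the circuits or of the experimental methodology in [11,12]: the 16-bit row width and all function names are hypothetical, and real rows are thousands of bits wide and compute in the analogue domain.

```python
# Toy functional model of bulk-bitwise operations on DRAM rows.
# Each row is a Python int used as a bit vector. ROW_BITS is a hypothetical
# width chosen for readability; it does not describe any real chip.

ROW_BITS = 16
MASK = (1 << ROW_BITS) - 1  # all-ones row


def maj(a: int, b: int, c: int) -> int:
    """Bitwise majority: an output bit is 1 when at least two input bits are 1."""
    return (a & b) | (b & c) | (a & c)


def bulk_and(a: int, b: int) -> int:
    """AND obtained as MAJ(a, b, all-zeros control row)."""
    return maj(a, b, 0)


def bulk_or(a: int, b: int) -> int:
    """OR obtained as MAJ(a, b, all-ones control row)."""
    return maj(a, b, MASK)


def bulk_not(a: int) -> int:
    """NOT, truncated to the row width."""
    return ~a & MASK


def bulk_nand(a: int, b: int) -> int:
    return bulk_not(bulk_and(a, b))


def bulk_nor(a: int, b: int) -> int:
    return bulk_not(bulk_or(a, b))


if __name__ == "__main__":
    row_a = 0b1100110011110000
    row_b = 0b1010101000001111
    print(format(bulk_and(row_a, row_b), "016b"))
    print(format(bulk_or(row_a, row_b), "016b"))
```

Because NAND (or NOR) alone is functionally complete, a substrate that can perform majority and NOT across whole rows can in principle evaluate any Boolean function one row at a time.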
We encourage everyone to build on our results, by open-sourcing all our infrastructures and code (see, for example, our PiDRAM infrastructure [15] and source code for functionally complete DRAM [11] and SiMRA-DRAM [12]). You can also explore this topic further in my lecture videos, available online.
What else needs to be done to promote uptake of PIM?
We are not in it for the short term when we do PIM research; we have to be patient. Demonstrating results showing that PIM hardware provides large improvements in both performance and energy efficiency (see our PIM overview papers for examples [5, 23]) helps industry to pay attention. If workloads where the benefits are high are critical workloads to users, the attention level increases.
Disruptive hardware should address a serious need in industry, and the risk involved in changing paradigms should be compensated by significant benefits. For PIM, this is increasingly the case: data access and movement are an increasingly severe bottleneck for essentially all metrics we care about (performance, energy, scalability, sustainability, robustness), applications are increasingly bottlenecked by data movement and access, and we are extremely power and energy constrained in our systems (see the state of cutting-edge machine-learning / artificial-intelligence workloads and systems, for example). At the same time, memory technology is not getting better. So our designs are squeezed in the middle, with nowhere to escape, in a processor-centric paradigm.
Yet it is not enough to introduce disruptive hardware, even if it is outstanding. There is a large software stack that needs to be built to make sure the hardware is usable, efficient, and effective – and that is where most of the work needs to be done to make a disruptive hardware concept successful. We therefore focus a lot on software programming, tools, methodologies and compilers in our research. We have multiple works in this area that make life easier for the programmer, by enabling them to decide what to execute where (e.g. PIM-Enabled Instructions [16] and DAMOV [17]), compile easily into PIM hardware (e.g., SIMDRAM [18] and MIMDRAM [19]), and program using high-level frameworks that enable better productivity and better automated optimizations (e.g., SimplePIM [21] and DaPPA [22]).
Changing a paradigm to be fundamentally better will always incur cost, but this cost may be amortized over time and the system may also become easier to program over time as software designers understand the new hardware. A new paradigm can enable tradeoff points that are much better than ‘business as usual’. Expecting it to immediately outperform ‘business as usual’ in all aspects is reactionary and should be avoided.
What are the main memory technologies to keep an eye on?
It is unclear when we will find a new technology that could replace DRAM for main memory and NAND flash for storage – bear in mind that it took at least three decades for NAND flash memory to become widespread.
In my view, DRAM and flash memory will continue to be very strong in the general-purpose domain, despite all the scaling challenges that are plaguing them today (e.g. RowHammer [25, 26, 27] and RowPress [28]) – challenges that would be better handled with more system-memory co-design. For example, industry is finally moving to having intelligence (i.e. logic) in DRAM chips and memory controllers to avoid RowHammer and RowPress bitflips [30]. This is a step in the right direction, but we should also be rethinking the rigid interfaces we have to memory chips today, which give them little breathing room to perform management or computation functions (see, for example, our upcoming work on self-managing DRAM [31]).
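As a rough illustration of what "intelligence" for read-disturbance avoidance can mean, the sketch below counts activations per row and preventively refreshes the physically adjacent rows once a count reaches a threshold. This is a generic counter-based scheme written for illustration only; it is not the industry mechanism analysed in [30] nor the self-managing DRAM design in [31], and the class name, threshold value and row count are hypothetical.

```python
from collections import defaultdict


class ActivationTracker:
    """Toy counter-based read-disturbance mitigation (hypothetical parameters)."""

    def __init__(self, threshold: int, num_rows: int):
        self.threshold = threshold      # activations tolerated before acting
        self.num_rows = num_rows        # rows in the modelled bank
        self.counts = defaultdict(int)  # per-row activation counters
        self.refreshed = []             # log of preventively refreshed rows

    def activate(self, row: int) -> list:
        """Record one activation; return the neighbour rows refreshed, if any."""
        self.counts[row] += 1
        if self.counts[row] < self.threshold:
            return []
        # Potential victims are the physically adjacent rows inside the bank.
        victims = [r for r in (row - 1, row + 1) if 0 <= r < self.num_rows]
        self.refreshed.extend(victims)
        self.counts[row] = 0            # the aggressor's counter starts over
        return victims

    def end_refresh_window(self) -> None:
        """Periodic refresh restores every row, so all counters can be cleared."""
        self.counts.clear()


if __name__ == "__main__":
    tracker = ActivationTracker(threshold=4, num_rows=8)
    for _ in range(4):
        refreshed = tracker.activate(3)
    print(refreshed)
```

Even this toy shows the interface tension mentioned above: the preventive refreshes take time that the memory chip or controller must be able to claim, which rigid interfaces make difficult.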
For future DRAM, emerging 3D and ferroelectric technologies are promising. While true 3D DRAM may take some time, 3D stacking of DRAM and logic, with a high-quality (i.e. logic process) logic layer, will happen in the shorter term. This will enable much better processing-near-memory.
What do you think are promising directions in computer architecture more generally?
I think we should freely explore creative ideas that have high potential to enable fundamentally better – i.e. more efficient, higher performance, more robust and sustainable – computing systems. Interdisciplinary research to enable systems for bioinformatics, algorithm-hardware co-design, and research that optimizes workloads (e.g. graph analytics, AI, genomics [2, 61]) across the stack, all the way from algorithms to devices, is very valuable. These directions require a more holistic way of thinking about the computing stack and co-designing across the transformation hierarchy, which I call the ‘broader view of computing architecture’. The HiPEAC Vision touches on this nicely by calling for ‘global co-design’.
One major direction is designing algorithms and systems for biological sequence analysis and information processing (e.g. for genomics). This is going to be even more important in the future as fast, efficient analysis of data is critical for many medical, public health, and personalized-medicine use cases.
Examples from my group include in-storage computing accelerators for genome filtering [35] and in-storage computing for metagenomics [36]. In-storage computing and specialized accelerators placed near flash chips greatly improve the performance and efficiency of many genome analytics workloads by reducing the data movement bottleneck from the storage system and specializing the computation to the primitives needed by genome-analysis workloads.
Processing-in-memory is another promising direction to accelerate such data-intensive workloads with algorithm-architecture co-design and co-optimization, as some of our other works show [37, 38, 39, 40]. We are also examining an exciting new paradigm for genome analysis, raw-signal analysis (e.g., RawHash [41], RawHash2 [42], RawAlign [43]), which operates directly on electrical signals generated by modern nanopore sequencing devices, without requiring the translation of such signals to the genomic alphabet – a very costly process in modern systems.
Another exciting direction is designing architectural controllers, such as memory controllers, prefetchers, memory and thread-management mechanisms, based on observed data in the field via machine learning techniques. A data-driven architecture [46] enables the machine itself to learn the (best) policies for managing itself and executing programs.
Prime examples of such a controller are reinforcement-learning-based, self-optimizing memory controllers and prefetchers [47]. Such controllers not only improve performance and efficiency under a wide variety of conditions and workloads but also reduce the burden on hardware and system designers. We believe an intelligent architecture will consist of a collection of such intelligent controllers that perform automatic data-driven online policy learning, including learning how to best coordinate with each other to make decisions that benefit the overall system. Such machines learn the best policies over time and thus become better as they learn, adapting, evolving, and executing far-sighted policies.
To enable such a machine, we need to revisit the design of all controllers (e.g. caching, prefetching, storage, memory, interconnect) and turn them into data-driven agents. Some example works in this area apply data-driven design principles to memory controllers, prefetchers [48], memory hierarchy management policies [49], and hybrid storage systems [50].
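The notion of a controller that learns its own policy from observed behaviour can be sketched with a deliberately small example: an agent that learns which address offset to prefetch by rewarding correct predictions and penalising wrong ones. The sketch uses an epsilon-greedy bandit, a much simpler technique than the reinforcement-learning designs in the cited works [47, 48]; every name, the reward values and the learning parameters are assumptions made for illustration.

```python
import random


class LearningPrefetcher:
    """Epsilon-greedy agent that learns which offset to prefetch (toy sketch)."""

    def __init__(self, offsets, epsilon=0.1, learning_rate=0.2, seed=0):
        self.offsets = list(offsets)
        self.value = {o: 0.0 for o in self.offsets}  # estimated usefulness
        self.epsilon = epsilon                       # exploration probability
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)               # seeded for repeatability

    def best(self):
        """Offset with the highest learned value so far."""
        return max(self.offsets, key=lambda o: self.value[o])

    def choose(self):
        """Mostly exploit the best-known offset, occasionally explore."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.offsets)
        return self.best()

    def update(self, offset, reward):
        """Move the estimate for this offset towards the observed reward."""
        self.value[offset] += self.learning_rate * (reward - self.value[offset])


def run(agent, addresses):
    """Feed an address stream to the agent; return the number of correct prefetches."""
    hits = 0
    for current, following in zip(addresses, addresses[1:]):
        offset = agent.choose()
        correct = (current + offset == following)
        agent.update(offset, 1.0 if correct else -1.0)
        if correct:
            hits += 1
    return hits


if __name__ == "__main__":
    agent = LearningPrefetcher(offsets=[1, 2, 4, 8])
    stream = list(range(0, 800, 4))  # hypothetical stride-4 access pattern
    print(run(agent, stream), agent.best())
```

The designer specifies only the candidate actions and the reward; the policy itself (here, preferring the offset that matches the stride) emerges from observed accesses, which is the sense in which such controllers reduce the design burden.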
A third direction that is critically important is building fundamentally robust (i.e. safe, reliable, and secure) systems. This is especially necessary since computing infrastructure is increasingly used in all scenarios that affect human life today. Robustness should become a goal from the beginning of the design to the end of the lifetime of a computing system. We have a long way to go to achieve this goal and technology scaling does not help us: denser chips have more robustness problems (e.g., the RowHammer [25, 26, 27] and RowPress [28] phenomena in modern DRAM chips used in essentially all computers), which can be exploited for security attacks or which can manifest themselves as safety problems (think bitflips in self-driving cars, planes, and spacecraft).
A final direction I would highlight is architecture-technology co-optimization. Rethinking how modern architectures should be designed for emerging technologies, like tightly integrated packaging and interconnection techniques (e.g. 3D stacking of logic and DRAM, monolithic 3D stacking, 3D DRAM / NVM, promising and unconventional emerging memory technologies) is an exciting and important direction. We have been doing research in emerging technologies for a while (such as our work on phase-change memory, STT-MRAM and similar emerging memory technologies [51, 52, 53, 54, 55, 56, 57, 58, 59], and monolithic 3D [60]) and we intend to continue.
What about other considerations, such as sustainability?
As our systems are not designed to be energy efficient and sustainable from the get-go, we are recklessly building huge data centres, mainly to try to satisfy a particular type of growing ML model, that waste the world’s most important energy and power resources. Fundamentally rethinking how we can make the hardware extremely efficient and sustainable is necessary.
With any fundamental rethinking, we should explore various options, including the extremes. A good extreme option for researchers to explore is to take a clean-slate approach and ask how one would design everything in a computing system to be energy efficient, sustainable and high performance. I believe processing everywhere, including in memory and storage, is the right choice if one thinks this way.
You can find videos of Onur Mutlu’s lectures on his YouTube channel.
[1] Vivek Seshadri, Yoongu Kim, Chris Fallin, Donghyuk Lee, Rachata Ausavarungnirun, Gennady Pekhimenko, Yixin Luo, Onur Mutlu, Michael A. Kozuch, Phillip B. Gibbons, and Todd C. Mowry, "RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization" Proceedings of the 46th International Symposium on Microarchitecture (MICRO), Davis, CA, December 2013. [Slides (pptx)(pdf)] [Lightning Session Slides (pptx)(pdf)] [Poster (pptx)(pdf)] [Lecture Slides (pptx)(pdf)] [Lecture Video (2 hrs 19 mins), 24 September 2020]
[2] Junwhan Ahn, Sungpack Hong, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi, "A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing" Proceedings of the 42nd International Symposium on Computer Architecture (ISCA), Portland, OR, June 2015. [Slides (pptx)(pdf)] [Lightning Session Slides (pptx)(pdf)] [Invited Retrospective at 50 Years of ISCA, 2023 (pdf)] Top Picks Honorable Mention by IEEE Micro. Selected for the ISCA-50 25-Year Retrospective Issue covering 1996-2020 in 2023 (Retrospective (pdf), Full Issue).
[3] Vivek Seshadri, Donghyuk Lee, Thomas Mullins, Hasan Hassan, Amirali Boroumand, Jeremie Kim, Michael A. Kozuch, Onur Mutlu, Phillip B. Gibbons, and Todd C. Mowry, "Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology" Proceedings of the 50th International Symposium on Microarchitecture (MICRO), Boston, MA, USA, October 2017. [Slides (pptx)(pdf)] [Lightning Session Slides (pptx)(pdf)] [Poster (pptx)(pdf)] [Arxiv version: "Buddy-RAM: Improving the Performance and Efficiency of Bulk Bitwise Operations Using DRAM", 2016]
[4] ISCA@50 25-Year Retrospective: 1996-2020. ACM SIGARCH and IEEE TCCA
[5] Onur Mutlu, Saugata Ghose, Juan Gomez-Luna, and Rachata Ausavarungnirun, "A Modern Primer on Processing in Memory" Invited Book Chapter in Emerging Computing: From Devices to Systems - Looking Beyond Moore and Von Neumann, Springer, 2021. [Tutorial Video on "Memory-Centric Computing Systems" (1 hour 51 minutes)] [Extended arxiv version]
[6] Juan Gomez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, and Onur Mutlu, "Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System" IEEE Access, 10 May 2022. [arXiv version] [PrIM Benchmarks Source Code] [Slides (pptx)(pdf)] [Long Talk Slides (pptx)(pdf)] [Short Talk Slides (pptx)(pdf)] [SAFARI Live Seminar Slides (pptx)(pdf)] [SAFARI Live Seminar Video (2 hrs 57 mins)] [Lightning Talk Video (3 minutes)] [Short Talk Video (21 minutes)] [1-hour Talk Video (58 minutes)]
[7] Juan Gomez-Luna, Izzat El Hajj, Ivan Fernandez, Christina Giannoula, Geraldo F. Oliveira, and Onur Mutlu, "Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-in-Memory Hardware" Invited Paper at Workshop on Computing with Unconventional Technologies (CUT), Virtual, October 2021. [arXiv version] [PrIM Benchmarks Source Code] [Slides (pptx)(pdf)] [Talk Video (37 minutes)] [Lightning Talk Video (3 minutes)]
[8] Juan Gómez Luna, Yuxin Guo, Sylvan Brocard, Julien Legriel, Remy Cimadomo, Geraldo F. Oliveira, Gagandeep Singh, and Onur Mutlu, "Evaluating Machine Learning Workloads on Memory-Centric Computing Systems" Proceedings of the 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Raleigh, North Carolina, USA, April 2023. [Extended arXiv version] [Slides (pptx)(pdf)] [PIM-ML Source Code] [Talk Video (15 minutes)] Best paper session.
[9] Christina Giannoula, Ivan Fernandez, Juan Gomez-Luna, Nectarios Koziris, Georgios Goumas, and Onur Mutlu, "SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures" Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), Mumbai, India, June 2022. [Extended arXiv Version] [Abstract] [Slides (pptx)(pdf)] [Long Talk Slides (pptx)(pdf)] [SparseP Source Code] [Talk Video (16 minutes)] [Long Talk Video (55 minutes)]
[10] Steve Rhyner, Haocong Luo, Juan Gómez-Luna, Mohammad Sadrosadati, Jiawei Jiang, Ataberk Olgun, Harshita Gupta, Ce Zhang, and Onur Mutlu, "Demystifying Distributed Optimization Algorithms on a Real-World Processing-In-Memory Architecture" To Appear in Proceedings of the 33rd International Conference on Parallel Architectures and Compilation Techniques (PACT), Long Beach, CA, USA, October 2024. [Preliminary arXiv version]
[11] Ismail Emir Yuksel, Yahya Can Tugrul, Ataberk Olgun, F. Nisa Bostanci, A. Giray Yaglikci, Geraldo F. Oliveira, Haocong Luo, Juan Gomez-Luna, Mohammad Sadrosadati, and Onur Mutlu, "Functionally-Complete Boolean Logic in Real DRAM Chips: Experimental Characterization and Analysis" Proceedings of the 30th International Symposium on High-Performance Computer Architecture (HPCA), April 2024. [Slides (pptx) (pdf)] [arXiv version] [FCDRAM Source Code]
[12] Ismail Emir Yuksel, Yahya Can Tugrul, F. Nisa Bostanci, Geraldo F. Oliveira, A. Giray Yaglikci, Ataberk Olgun, Melina Soysal, Haocong Luo, Juan Gomez-Luna, Mohammad Sadrosadati, and Onur Mutlu, "Simultaneous Many-Row Activation in Off-the-Shelf DRAM Chips: Experimental Characterization and Analysis" Proceedings of the 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Brisbane, Australia, June 2024. [Slides (pptx) (pdf)] [arXiv version] [SiMRA-DRAM Source Code (Officially Artifact Evaluated with All Badges)] Officially artifact evaluated as both code and dataset available, reviewed and reproducible.
[13] Amirali Boroumand, Saugata Ghose, Youngsok Kim, Rachata Ausavarungnirun, Eric Shiu, Rahul Thakur, Daehyun Kim, Aki Kuusela, Allan Knies, Parthasarathy Ranganathan, and Onur Mutlu, "Google Workloads for Consumer Devices: Mitigating Data Movement Bottlenecks" Proceedings of the 23rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Williamsburg, VA, USA, March 2018. [Slides (pptx)(pdf)] [Lightning Session Slides (pptx) (pdf)] [Poster (pptx)(pdf)] [Lightning Talk Video (2 minutes)] [Full Talk Video (21 minutes)]
[14] Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, and Onur Mutlu, "Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks" Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), Virtual, September 2021. [Slides (pptx) (pdf)] [Talk Video (14 minutes)] [Older arXiv version]
[15] Ataberk Olgun, Juan Gomez Luna, Konstantinos Kanellopoulos, Behzad Salami, Hasan Hassan, Oguz Ergin, and Onur Mutlu, "PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM" ACM Transactions on Architecture and Code Optimization (TACO), March 2023. [arXiv version] Presented at the 18th HiPEAC Conference, Toulouse, France, January 2023. [Slides (pptx)(pdf)] [Longer Lecture Slides (pptx)(pdf)] [Lecture Video (40 minutes)] [PiDRAM Source Code]
[16] Junwhan Ahn, Sungjoo Yoo, Onur Mutlu, and Kiyoung Choi, "PIM-Enabled Instructions: A Low-Overhead, Locality-Aware Processing-in-Memory Architecture" Proceedings of the 42nd International Symposium on Computer Architecture (ISCA), Portland, OR, June 2015. [Slides (pptx) (pdf)] [Lightning Session Slides (pptx) (pdf)]
[17] Geraldo F. Oliveira, Juan Gomez-Luna, Lois Orosa, Saugata Ghose, Nandita Vijaykumar, Ivan Fernandez, Mohammad Sadrosadati, and Onur Mutlu, "DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks" IEEE Access, 8 September 2021. Preprint in arXiv, 8 May 2021. [arXiv preprint] [IEEE Access version] [DAMOV Suite and Simulator Source Code] [SAFARI Live Seminar Video (2 hrs 40 mins)] [Short Talk Video (21 minutes)] [Short Talk Slides (pptx) (pdf)] [Long Talk Slides (pptx)(pdf)]
[18] Nastaran Hajinazar, Geraldo F. Oliveira, Sven Gregorio, Joao Dinis Ferreira, Nika Mansouri Ghiasi, Minesh Patel, Mohammed Alser, Saugata Ghose, Juan Gomez-Luna, and Onur Mutlu, "SIMDRAM: An End-to-End Framework for Bit-Serial SIMD Computing in DRAM" Proceedings of the 26th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Virtual, March-April 2021. [2-page Extended Abstract] [Short Talk Slides (pptx) (pdf)] [Talk Slides (pptx) (pdf)] [Short Talk Video (5 mins)] [Full Talk Video (27 mins)]
[19] Geraldo F. Oliveira, Ataberk Olgun, Abdullah Giray Yaglikci, F. Nisa Bostanci, Juan Gomez-Luna, Saugata Ghose, and Onur Mutlu, "MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Processing" Proceedings of the 30th International Symposium on High-Performance Computer Architecture (HPCA), April 2024. [Slides (pptx)(pdf)] [arXiv version] [MIMDRAM Source Code]
[20] Kevin Hsieh, Eiman Ebrahimi, Gwangsun Kim, Niladrish Chatterjee, Mike O'Connor, Nandita Vijaykumar, Onur Mutlu, and Stephen W. Keckler, "Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems" Proceedings of the 43rd International Symposium on Computer Architecture (ISCA), Seoul, South Korea, June 2016. [Slides (pptx)(pdf)] [Lightning Session Slides (pptx) (pdf)]
[21] Jinfan Chen, Juan Gómez-Luna, Izzat El Hajj, Yuxin Guo, and Onur Mutlu, "SimplePIM: A Software Framework for Productive and Efficient Processing in Memory" Proceedings of the 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT), Vienna, Austria, October 2023. [Slides (pptx) (pdf)] [SimplePIM Source Code]
[22] Geraldo F. Oliveira, Alain Kohli, David Novo, Juan Gómez-Luna, Onur Mutlu, "DaPPA: A Data-Parallel Framework for Processing-in-Memory Architectures", 2023. https://arxiv.org/abs/2310.10168
[23] Onur Mutlu, Saugata Ghose, Juan Gomez-Luna, and Rachata Ausavarungnirun, "Processing Data Where It Makes Sense: Enabling In-Memory Computation" Invited paper in Microprocessors and Microsystems (MICPRO), June 2019. [arXiv version] [Slides (pptx)] [Talk Video]
[24] Thomas Kuhn, The Structure of Scientific Revolutions, 1962.
[25] Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu, "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors" Proceedings of the 41st International Symposium on Computer Architecture (ISCA), Minneapolis, MN, June 2014. [Slides (pptx)(pdf)] [Lightning Session Slides (pptx) (pdf)] [Source Code and Data] [RowHammer Summary Slides (pptx)] [RowHammer Summary] [Coverage on ZDNet 1] [Coverage on ZDNet 2] [MemTest86 Hammer Test] [RowHammer Discussion Group] [Discussion on Twitter] [Lecture Video (1 hr 49 mins), 25 September 2020] [Invited Retrospective at IEEE TCAD Top Picks in Hardware and Embedded Security, 2019 (pdf)] [Invited Retrospective at 50 Years of ISCA, 2023 (pdf)] One of the 7 papers of 2012-2017 selected as Top Picks in Hardware and Embedded Security for IEEE TCAD (link). Selected for the ISCA-50 25-Year Retrospective Issue covering 1996-2020 in 2023 (Retrospective (pdf), Full Issue). Winner of the 2024 IFIP Jean-Claude Laprie Award in dependable computing (link).
[26] Onur Mutlu and Jeremie Kim, "RowHammer: A Retrospective" IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD) Special Issue on Top Picks in Hardware and Embedded Security, 2019. [Preliminary arXiv version] [Slides from COSADE 2019 (pptx)] [Slides from VLSI-SOC 2020 (pptx)(pdf)] [Talk Video (1 hr 15 minutes, with Q&A)]
[27] Onur Mutlu, Ataberk Olgun, and A. Giray Yaglikci, "Fundamentally Understanding and Solving RowHammer" Invited Special Session Paper at the 28th Asia and South Pacific Design Automation Conference (ASP-DAC), Tokyo, Japan, January 2023. [arXiv version] [Slides (pptx)(pdf)] [Talk Video (26 minutes)]
[28] Haocong Luo, Ataberk Olgun, Giray Yaglikci, Yahya Can Tugrul, Steve Rhyner, M. Banu Cavlak, Joel Lindegger, Mohammad Sadrosadati, and Onur Mutlu, "RowPress: Amplifying Read Disturbance in Modern DRAM Chips" Proceedings of the 50th International Symposium on Computer Architecture (ISCA), Orlando, FL, USA, June 2023. [Extended arxiv version] [Slides (pptx)(pdf)] [Lightning Talk Slides (pptx)(pdf)] [Lightning Talk Video (3 minutes)] [Talk Video (14 minutes, including Q&A)] [RowPress Source Code and Datasets (Officially Artifact Evaluated with All Badges)] Officially artifact evaluated as available, reusable and reproducible. Distinguished artifact award at ISCA 2023. One of the 12 computer architecture papers of 2023 selected as Top Picks by IEEE Micro.
[29] Onur Mutlu, "Memory Scaling: A Systems Architecture Perspective" Proceedings of the 5th International Memory Workshop (IMW), Monterey, CA, May 2013. Slides (pptx)(pdf) EETimes Reprint
[30] Oğuzhan Canpolat, A. Giray Yağlıkçı, Geraldo F. Oliveira, Ataberk Olgun, Oğuz Ergin, and Onur Mutlu, "Understanding the Security Benefits and Overheads of Emerging Industry Solutions to DRAM Read Disturbance" 4th Workshop on DRAM Security (DRAMsec), held with 51st Annual International Symposium on Computer Architecture (ISCA), Buenos Aires, Argentina, July 2024. [Slides (pptx)(pdf)] [arXiv version] [Source Code]
[31] Hasan Hassan, Ataberk Olgun, A. Giray Yaglikci, Haocong Luo, Onur Mutlu, "Self-Managing DRAM: A Low-Cost Framework for Enabling Autonomous and Efficient in-DRAM Operations", 2022. To Appear at MICRO 2024. arxiv.org/abs/2207.13358
[32] Mohammed Alser, Zulal Bingol, Damla Senol Cali, Jeremie Kim, Saugata Ghose, Can Alkan, and Onur Mutlu, "Accelerating Genome Analysis: A Primer on an Ongoing Journey" IEEE Micro (IEEE MICRO), Vol. 40, No. 5, pages 65-75, September/October 2020. [Slides (pptx) (pdf)] [RECOMB 2021 Highlights Talk Slides (pptx)(pdf)] [Talk Video (1 hour 2 minutes)]
[33] Mohammed Alser, Joel Lindegger, Can Firtina, Nour Almadhoun Alserr, Haiyu Mao, Gagandeep Singh, Juan Gomez-Luna, and Onur Mutlu, "From Molecules to Genomic Variations: Accelerating Genome Analysis via Intelligent Algorithms and Architectures" Invited Article in Computational and Structural Biotechnology Journal (CSBJ), August 2022. [arXiv version with all Supplementary Materials] [Online version at the Computational and Structural Biotechnology Journal] [Source Code] [DAC 2023 Talk Video (37 minutes)]
[34] Onur Mutlu and Can Firtina, "Accelerating Genome Analysis via Algorithm-Architecture Co-Design" Invited Special Session Paper in Proceedings of the 60th Design Automation Conference (DAC), San Francisco, CA, USA, July 2023. [arXiv version] [Slides (pptx)(pdf)] [DAC 2023 Talk Video (37 minutes)]
[35] Nika Mansouri Ghiasi, Jisung Park, Harun Mustafa, Jeremie Kim, Ataberk Olgun, Arvid Gollwitzer, Damla Senol Cali, Can Firtina, Haiyu Mao, Nour Almadhoun Alserr, Rachata Ausavarungnirun, Nandita Vijaykumar, Mohammed Alser, and Onur Mutlu, "GenStore: A High-Performance and Energy-Efficient In-Storage Computing System for Genome Sequence Analysis" Proceedings of the 27th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Virtual, February-March 2022. [Talk Slides (pptx) (pdf)] [Lightning Talk Slides (pptx)(pdf)] [Lightning Talk Video (90 seconds)] [Talk Video (17 minutes)] [GenStore Source Code]
[36] Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Mao, Joel Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, and Onur Mutlu, "MegIS: High-Performance and Low-Cost Metagenomic Analysis with In-Storage Processing" Proceedings of the 51st Annual International Symposium on Computer Architecture (ISCA), Buenos Aires, Argentina, July 2024. [Slides (pptx)(pdf)] [arXiv version]
[37] Jeremie S. Kim, Damla Senol Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, and Onur Mutlu, "GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies" BMC Genomics, 2018. Proceedings of the 16th Asia Pacific Bioinformatics Conference (APBC), Yokohama, Japan, January 2018. [Slides (pptx)(pdf)] [Source Code] [arxiv.org Version (pdf)] [Talk Video at AACBB 2019]
[38] Damla Senol Cali, Gurpreet S. Kalsi, Zulal Bingol, Can Firtina, Lavanya Subramanian, Jeremie S. Kim, Rachata Ausavarungnirun, Mohammed Alser, Juan Gomez-Luna, Amirali Boroumand, Anant Nori, Allison Scibisz, Sreenivas Subramoney, Can Alkan, Saugata Ghose, and Onur Mutlu, "GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis" Proceedings of the 53rd International Symposium on Microarchitecture(MICRO), Virtual, October 2020. [Slides (pptx)(pdf)] [Short Talk Slides (pptx)(pdf)] [Lightning Talk Slides (pptx)(pdf)] [ARM Research Summit Talk Slides (pptx)(pdf)] [ARM Research Summit Short Talk Slides (pptx)(pdf)] [Lecture Slides (pptx)(pdf)] [MICRO 2020 Talk Video (18 minutes)] [MICRO 2020 Short Talk Video (6 minutes)] [MICRO 2020 Lighting Talk Video (1.5 minutes)] [ARM Research Summit Talk Video (21 minutes)] [ARM Research Summit Short Talk Video (15 minutes)] [ARM Research Summit Short Talk Video and Q&A (31 minutes)] [Lecture Video (37 minutes)] [GenASM Source Code]
[39] Damla Senol Cali, Konstantinos Kanellopoulos, Joel Lindegger, Zulal Bingol, Gurpreet S. Kalsi, Ziyi Zuo, Can Firtina, Meryem Banu Cavlak, Jeremie Kim, Nika MansouriGhiasi, Gagandeep Singh, Juan Gomez-Luna, Nour Almadhoun Alserr, Mohammed Alser, Sreenivas Subramoney, Can Alkan, Saugata Ghose, and Onur Mutlu, "SeGraM: A Universal Hardware Accelerator for Genomic Sequence-to-Graph and Sequence-to-Sequence Mapping" Proceedings of the 49th International Symposium on Computer Architecture (ISCA), New York, June 2022. [Slides (pptx)(pdf)] [arXiv version] [SeGraM Source Code and Datasets] [Talk Video (22 minutes)]
[40] Gagandeep Singh, Mohammed Alser, Damla Senol Cali, Dionysios Diamantopoulos, Juan Gómez-Luna, Henk Corporaal, and Onur Mutlu, "FPGA-based Near-Memory Acceleration of Modern Data-Intensive Applications" IEEE Micro (IEEE MICRO), 2021.
[41] Can Firtina, Nika Mansouri Ghiasi, Joel Lindegger, Gagandeep Singh, Meryem Banu Cavlak, Haiyu Mao, and Onur Mutlu, "RawHash: Enabling Fast and Accurate Real-Time Analysis of Raw Nanopore Signals for Large Genomes" Proceedings of the 31st Annual Conference on Intelligent Systems for Molecular Biology and the 22nd European Conference on Computational Biology (ISMB/ECCB), Lyon, France, July 2023. [Bioinformatics Journal version] [bioRxiv version] [Slides (pptx)(pdf)] [RawHash Source Code]
[42] Can Firtina, Melina Soysal, Joel Lindegger, and Onur Mutlu, "RawHash2: Mapping Raw Nanopore Signals Using Hash-Based Seeding and Adaptive Quantization" Bioinformatics, [published online on] 30 July 2024. [Online link at Bioinformatics Journal] [arXiv version] [RawHash Talk Video (19 minutes) (26 minutes)] [RawHash2 Source Code]
[43] Joel Lindegger, Can Firtina, Nika Mansouri Ghiasi, Mohammad Sadrosadati, Mohammed Alser, and Onur Mutlu, "RawAlign: Accurate, Fast, and Scalable Raw Nanopore Signal Mapping via Combining Seeding and Alignment" Preprint on arxiv, October 2023. [arXiv version] [RawAlign Source Code]
[44] Gagandeep Singh, Mohammed Alser, Kristof Denolf, Can Firtina, Alireza Khodamoradi, Meryem Banu Cavlak, Henk Corporaal and Onur Mutlu, "RUBICON: A Framework for Designing Efficient Deep Learning-Based Genomic Basecallers" Genome Biology, February 2024. [arXiv version] [Journal Article] [RUBICON Source Code]
[45] Damla Senol, Jeremie Kim, Saugata Ghose, Can Alkan, and Onur Mutlu, "Nanopore Sequencing Technology and Tools for Genome Assembly: Computational Analysis of the Current State, Bottlenecks and Future Directions" Briefings in Bioinformatics (BIB), 2018. [Open arxiv.org version] [Slides (pptx)(pdf)] [Talk Video at AACBB 2019]
[46] Onur Mutlu, "Intelligent Architectures for Intelligent Computing Systems" Invited Paper in Proceedings of the Design, Automation, and Test in Europe Conference (DATE), Virtual, February 2021. [Slides (pptx)(pdf)] [IEDM Tutorial Slides (pptx) (pdf)] [Short DATE Talk Video (11 minutes)] [Longer IEDM Tutorial Video (1 hr 51 minutes)]
[47] Engin Ipek, Onur Mutlu, José F. Martínez, and Rich Caruana, "Self Optimizing Memory Controllers: A Reinforcement Learning Approach" Proceedings of the 35th International Symposium on Computer Architecture (ISCA), pages 39-50, Beijing, China, June 2008. Slides (pptx) [Invited Retrospective at 50 Years of ISCA, 2023 (pdf)] Selected for the ISCA-50 25-Year Retrospective Issue covering 1996-2020 in 2023 (Retrospective (pdf), Full Issue).
[48] Rahul Bera, Konstantinos Kanellopoulos, Anant Nori, Taha Shahroodi, Sreenivas Subramoney, and Onur Mutlu, "Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning" Proceedings of the 54th International Symposium on Microarchitecture (MICRO), Virtual, October 2021. [Slides (pptx)(pdf)] [Short Talk Slides (pptx)(pdf)] [Lightning Talk Slides (pptx)(pdf)] [Talk Video (20 minutes)] [Lightning Talk Video (1.5 minutes)] [Pythia Source Code (Officially Artifact Evaluated with All Badges)] [arXiv version] Officially artifact evaluated as available, reusable and reproducible.
[49] Rahul Bera, Konstantinos Kanellopoulos, Shankar Balachandran, David Novo, Ataberk Olgun, Mohammad Sadrosadati, and Onur Mutlu, "Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction" Proceedings of the 55th International Symposium on Microarchitecture (MICRO), Chicago, IL, USA, October 2022. [Slides (pptx)(pdf)] [Longer Lecture Slides (pptx)(pdf)] [Talk Video (12 minutes)] [Lecture Video (25 minutes)] [arXiv version] [Source Code (Officially Artifact Evaluated with All Badges)] Officially artifact evaluated as available, reusable and reproducible. Best paper award at MICRO 2022.
[50] Gagandeep Singh, Rakesh Nadig, Jisung Park, Rahul Bera, Nastaran Hajinazar, David Novo, Juan Gomez-Luna, Sander Stuijk, Henk Corporaal, and Onur Mutlu, "Sibyl: Adaptive and Extensible Data Placement in Hybrid Storage Systems Using Online Reinforcement Learning" Proceedings of the 49th International Symposium on Computer Architecture (ISCA), New York, June 2022. [Slides (pptx)(pdf)] [arXiv version] [Sibyl Source Code] [Talk Video (16 minutes)]
[51] Benjamin C. Lee, Engin Ipek, Onur Mutlu, and Doug Burger, "Architecting Phase Change Memory as a Scalable DRAM Alternative" Proceedings of the 36th International Symposium on Computer Architecture (ISCA), pages 2-13, Austin, TX, June 2009. Slides (pdf) One of the 13 computer architecture papers of 2009 selected as Top Picks by IEEE Micro. Selected as a CACM Research Highlight.
[52] Benjamin C. Lee, Ping Zhou, Jun Yang, Youtao Zhang, Bo Zhao, Engin Ipek, Onur Mutlu, and Doug Burger, "Phase Change Technology and the Future of Main Memory" IEEE Micro, Special Issue: Micro's Top Picks from 2009 Computer Architecture Conferences (MICRO TOP PICKS), Vol. 30, No. 1, pages 60-70, January/February 2010.
[53] HanBin Yoon, Justin Meza, Rachata Ausavarungnirun, Rachael Harding, and Onur Mutlu, "Row Buffer Locality Aware Caching Policies for Hybrid Memories" Proceedings of the 30th IEEE International Conference on Computer Design (ICCD), Montreal, Quebec, Canada, September 2012. Slides (pptx)(pdf) Best paper award (in Computer Systems and Applications track) at ICCD 2012.
[54] HanBin Yoon, Justin Meza, Naveen Muralimanohar, Norman P. Jouppi, and Onur Mutlu, "Efficient Data Mapping and Buffering Techniques for Multi-Level Cell Phase-Change Memories" ACM Transactions on Architecture and Code Optimization (TACO), Vol. 11, No. 4, December 2014. [Slides (ppt) (pdf)] Presented at the 10th HiPEAC Conference, Amsterdam, Netherlands, January 2015. [Slides (ppt)(pdf)] Best (student) presentation award.
[55] Shihao Song, Anup Das, Onur Mutlu, and Nagarajan Kandasamy, "Enabling and Exploiting Partition-Level Parallelism (PALP) in Phase Change Memories" Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), New York City, NY, USA, October 2019. [Preliminary arXiv version] [Slides (pptx)(pdf)] [Poster (pptx)(pdf)]
[56] Shihao Song, Anup Das, Onur Mutlu, and Nagarajan Kandasamy, "Improving Phase Change Memory Performance with Data Content Aware Access" Proceedings of the ACM SIGPLAN International Symposium on Memory Management (ISMM), London, UK, June 2020. [Slides (pptx)(pdf)] [Talk Video (19 minutes)]
[57] Yang Li, Saugata Ghose, Jongmoo Choi, Jin Sun, Hui Wang, and Onur Mutlu, "Utility-Based Hybrid Memory Management" Proceedings of the 19th IEEE Cluster Conference (CLUSTER), Honolulu, Hawaii, USA, September 2017. [Slides (pptx)(pdf)]
[58] Justin Meza, Jichuan Chang, HanBin Yoon, Onur Mutlu, and Parthasarathy Ranganathan, "Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management" IEEE Computer Architecture Letters (CAL), February 2012.
[59] Emre Kultursay, Mahmut Kandemir, Anand Sivasubramaniam, and Onur Mutlu, "Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative" Proceedings of the 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, TX, April 2013. Slides (pptx)(pdf)
[60] Nika Mansouri Ghiasi, Mohammad Sadrosadati, Geraldo F. Oliveira, Konstantinos Kanellopoulos, Rachata Ausavarungnirun, Juan Gómez Luna, Aditya Manglik, João Ferreira, Jeremie S. Kim, Christina Giannoula, Nandita Vijaykumar, Jisung Park, and Onur Mutlu, "RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory", 2022. https://arxiv.org/abs/2210.08508
[61] Maciej Besta, Raghavendra Kanakagiri, Grzegorz Kwasniewski, Rachata Ausavarungnirun, Jakub Beránek, Konstantinos Kanellopoulos, Kacper Janda, Zur Vonarburg-Shmaria, Lukas Gianinazzi, Ioana Stefan, Juan Gómez-Luna, Marcin Copik, Lukas Kapp-Schwoerer, Salvatore Di Girolamo, Nils Blach, Marek Konieczny, Onur Mutlu, and Torsten Hoefler, "SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems" Proceedings of the 54th International Symposium on Microarchitecture (MICRO), Virtual, October 2021. [Slides (pdf)] [Talk Video (22 minutes)] [Lightning Talk Video (1.5 minutes)] [Full arXiv version]
[62] B. Hyun, T. Kim, D. Lee, and M. Rhu, "Pathfinding Future PIM Architectures by Demystifying a Commercial PIM Technology" In 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), Edinburgh, United Kingdom, 2024, pp. 263-279. doi: 10.1109/HPCA57654.2024.00029