High Performance Computing Options - Part 1
Nilotpal Das
Information Technology leader, TOGAF & Zachman Certified Enterprise Architect
One of the areas where I really need to develop myself is High Performance Computing. I read about it a lot and sometimes also write about it, but it's really difficult to remember all of it, primarily because the best way of learning something is to work on it regularly. And considering my area of work doesn't involve a lot of HPC, it's difficult to retain. Yeah, I am kind of stupid that way. Now I understand the basics of GPUs, TPUs, DPUs, etc., so I am not entirely unschooled, but there was a very important point that Kapil, a colleague I have come to respect a lot, brought out in the year-end workshop that we did this week.
First, Raghu, another close friend and colleague, asked a very important question: “When it comes to HPC, what’s the value that we are driving?” At the moment the business just tells us what they need and we just deliver. But then Kapil said something profound that stuck with me. He said we must speak their language. When they ask for specific hardware, we should be able to ask them what they really need it for and, based on a deeper understanding, provide them with the appropriate options. For this we definitely need to be informed about the best options for them.
I admit, at the moment this hasn’t happened with me a lot, considering I manage the solution requirements coming from functions that are not doing a lot of HPC-relevant work. There is HPC work, just not in my domain. But I am also running a cloud acceleration program that cuts across all domains. As part of this program, we are working on a PoC in the Accelerated Drug Development space. We are currently in the PoC phase, so no HPC hardware is needed at the moment. However, it is only a matter of time before requests for HPC start coming to us, and we don’t want to be caught with our pants down when they do.
So I am taking Kapil’s suggestion very seriously: not only am I going to start studying HPC myself, I am also going to expect my team to be better prepared for the “not so far away future” of being an AI Enabled Company. So I did some research, and here is some basic foundational material from which I will start my study. I guess this is going to be a series, as everything cannot be covered in one article, lest it become too long. So this is Part 1.
Latest HPC Hardware.
Quantum Processors: Yup, let’s start with Quantum Computing. IBM’s Quantum Heron processors are among the most advanced, capable of running complex quantum circuits with up to 5,000 two-qubit gate operations. These are ideal for scientific research in fields like materials science, chemistry and high-energy physics.
To give you a general idea of how fast quantum processors can be: in 2019, Google demonstrated that its quantum computer, Sycamore, could solve a specific problem in 200 seconds. Google has since estimated that a comparable problem would take the world’s fastest supercomputer, Frontier, approximately 47 years to solve!
IBM offers access to its quantum processors through the IBM Quantum Experience. One can sign up for the Open Plan, which provides free access to utility-scale quantum computers for up to 10 minutes of runtime per month; Pay-As-You-Go and Premium Plans are available for more extensive access and technical support. (A minimal Python sketch of submitting a small circuit this way follows after this list.)

NVIDIA H200 Tensor Core GPUs: These are optimized for demanding HPC and AI workloads, providing significant performance improvements for tasks like training large language models and running complex simulations. NVIDIA also offers an enterprise software suite, NVIDIA AI Enterprise, which includes NVIDIA NIM and CUDA-X microservices (more about this later); these provide an optimized runtime and easy-to-use building blocks for generative AI development. It comes with enterprise-grade security and support, and offers deployment flexibility: because it is standards-based and containerized, it can run in the cloud, in data centres and on workstations. (A rough sketch of calling a NIM endpoint follows after this list.)

Amazon EC2 P5: Powered by NVIDIA H100 Tensor Core GPUs, these instances provide up to 8 H100 GPUs with 640 GB of high-bandwidth GPU memory. The best use cases for P5 are generative AI applications, training and deploying LLMs and diffusion models for tasks like question answering, code generation, video and image generation and speech recognition. It is also suitable for demanding HPC tasks like pharmaceutical discovery, financial modelling, etc. (A minimal launch sketch follows after this list.)

NVIDIA BlueField DPU Series: DPUs are designed to offload data processing tasks from the CPU, managing data movement, security and network operations. They provide specialized acceleration for data-centric tasks and ensure efficient data flow. I have chosen this among many others because it is a bit of a wild card that could make things interesting.

Google Cloud A3 Ultra VMs: Powered by NVIDIA H200 GPUs and optimized for AI, these scale easily to handle large datasets and complex models. Technically, this is the second HPC option I have chosen that is powered by H200 GPUs. That’s intentional; I picked this one at the last moment for a very specific reason, which we will talk about in Part 2. This is another wild card.
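Since the Open Plan above is free, the easiest way to get a feel for it is to submit a tiny circuit yourself. Here is a minimal sketch using the qiskit and qiskit-ibm-runtime Python packages; the token placeholder, the Bell-pair circuit and the choice of the least-busy backend are my own assumptions for illustration, and the exact API may shift slightly between Qiskit Runtime versions.

```python
# Minimal sketch: run a 2-qubit Bell circuit on an IBM Quantum backend (Open Plan).
# Assumes `pip install qiskit qiskit-ibm-runtime` and a (placeholder) API token.
from qiskit import QuantumCircuit, transpile
from qiskit_ibm_runtime import QiskitRuntimeService, SamplerV2 as Sampler

# Build a Bell pair: H on qubit 0, CNOT 0 -> 1, then measure both qubits.
qc = QuantumCircuit(2, 2)
qc.h(0)
qc.cx(0, 1)
qc.measure([0, 1], [0, 1])

# "YOUR_IBM_QUANTUM_TOKEN" is a placeholder; channel name per current IBM docs.
service = QiskitRuntimeService(channel="ibm_quantum", token="YOUR_IBM_QUANTUM_TOKEN")
backend = service.least_busy(operational=True, simulator=False)

# Transpile for the selected hardware and submit via the Sampler primitive.
isa_circuit = transpile(qc, backend=backend)
job = Sampler(backend).run([isa_circuit])

# "c" is the default classical register name created by QuantumCircuit(2, 2).
counts = job.result()[0].data.c.get_counts()
print(backend.name, counts)  # expect mostly '00' and '11' outcomes
```

Even this toy job counts against the 10 minutes of monthly runtime, so it is worth testing circuits on a local simulator first.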
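On the NVIDIA AI Enterprise / NIM point, my understanding is that a NIM microservice is essentially a container that exposes an OpenAI-compatible HTTP API, so application code stays simple. Below is a rough sketch of what calling a locally deployed NIM might look like; the endpoint URL, port and model name (meta/llama-3.1-8b-instruct) are illustrative assumptions and would depend on which NIM you actually deploy.

```python
# Rough sketch: query a locally running NVIDIA NIM microservice.
# Assumes a NIM container is already up and serving an OpenAI-compatible API
# on http://localhost:8000 (URL, port and model name are illustrative guesses).
import json
import urllib.request

payload = {
    "model": "meta/llama-3.1-8b-instruct",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "Summarize why DPUs matter for HPC in one sentence."}
    ],
    "max_tokens": 128,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# OpenAI-compatible responses put the generated text under choices[0].message.content
print(body["choices"][0]["message"]["content"])
```

The appeal, as I understand it, is that the same client code works whether the container runs on a workstation, in a data centre or in the cloud.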
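And for the Amazon EC2 P5 instances, provisioning is just a normal EC2 API call once you have the quota; the instance type is p5.48xlarge. A minimal sketch with boto3 is below; the region, AMI ID and key-pair name are placeholders I made up, and in practice you would likely launch into a capacity reservation or placement group rather than on-demand like this.

```python
# Minimal sketch: launch a single EC2 P5 instance (8x NVIDIA H100) with boto3.
# Assumes AWS credentials are configured; AMI ID, key pair and region are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder: pick a Deep Learning AMI in your region
    InstanceType="p5.48xlarge",        # 8x H100 GPUs, 640 GB high-bandwidth GPU memory
    KeyName="my-keypair",              # placeholder key pair name
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "purpose", "Value": "llm-training-poc"}],
    }],
)

instance_id = response["Instances"][0]["InstanceId"]
print("Launched:", instance_id)
```

At P5 prices, remembering to terminate the instance afterwards matters as much as knowing how to launch it.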
Now, obviously, there are many more varieties of HPC hardware and I cannot cover all of them, but I have taken what I thought would be relevant and also an interesting mental exercise. In the next article, I will do a comparative analysis of these, thinking about each one from different perspectives (effectiveness, cost, ROI, etc.) to build a deeper understanding of what can be used where, and perhaps what we should invest more in depending on our use cases.
I am just studying these for my own understanding and sharing what I find fascinating. Another way of remembering all these model numbers, and each model’s capacity and best use cases, is to talk about them, so I am talking about them through my articles. But please let me know if you find this useful and whether I should continue to Parts 2, 3 and beyond. Or is this too basic? Because there is no way I can know.
It could be the Dunning-Kruger effect, a cognitive bias in which people wrongly overestimate their knowledge or ability in a specific area.
Basically, what I am trying to say is that stupid people never know that they are stupid, because they are too stupid to know it. So if I am stupid, please tell me, otherwise I will never know. :-)