How to dramatically reduce the time from architecture spec to tapeout?
Ronen Laviv
EDA and Cloud consultant | Driving digital transformation | Mid size and Enterprise companies | Business development for Greenfield Companies | Nurturing growth with established customers | AI Transformation
Nowadays most chip designers are coding in Register-Transfer Level (RTL). But, there’s another way, and that’s high-level synthesis (HLS). We’ve had feedback indicating that HLS can shorten the time from spec to tapeout by 50% on average. In this article, we’ll scratch the surface of the HLS technology that allows that.
Let’s start with some history
In the 70’s, designs were done at the transistor level. Compared with today’s design capabilities, that’s the stone age…
In the 80’s, logic gates were introduced, and designs were built using gates. This was a huge leap from the 70’s, and we’re using it even today as a byproduct, calling it a netlist (i.e. gates connected with wires). But netlists are far from being abstract or easy to read, code and debug. Designs done today with millions or billions of gates require a more high-level approach.
In the 90’s, RTL design in hardware description languages such as Verilog and VHDL was introduced. RTL code is more readable and easier to write and debug. Synthesis tools take the RTL code and map it into a netlist (so the process doesn’t break the old way of doing things but is much more robust).
Around the beginning of the century, HLS was introduced, which allows designers to code in C and IEEE 1666 SystemC, generating RTL automatically from the C code.
Designing a chip in a nutshell:
Before talking about the benefits of HLS, let’s quickly touch on a few of the steps required for designing a chip:
First, the product group defines the needs. Then, architects write the specifications for the chip. Next, front-end designers code according to the specifications, and verification engineers test the code. After that, back-end designers turn the RTL into gates (the netlist) and lay them out in physical places. Between each of these steps, multiple checks are done for timing, power, leakage, aging and many other things. It’s an iterative process until a GDSII file is ready to send to a fab for tapeout.
Challenges with Today’s Chip-Making Flow:
There are certainly some challenges that arise with the chip-making flow. Some of the challenges are as follows:
Architecture: The architecture phase is the critical phase. Bad definitions can greatly hurt the power, performance, and area (PPA) of the end result. No matter how good the designers are, bad definitions result in a mediocre chip. Many architects are using “advanced tools” such as PowerPoint, Excel, Visio and Acrobat Writer. Unfortunately, those tools don’t let them test performance or decide on the optimal building blocks, such as pipeline depth.
Front-end design: Coding RTL requires time and effort. Once the code is ready, infrastructure changes are time-consuming and prone to human error. Also, the longer the code is, the higher the probability of human errors and misunderstandings of the spec. Cleaning up the design at this phase greatly reduces the amount of work required later to catch bugs and fix them.
Implementation: Synthesizing RTL design into gates is the beginning of the journey that allows the timing checks. As soon as the design turns into gates and wires, one can see if it can really work. With particularly challenging designs, this can mean extra iterations to fix the RTL code and enable meeting timing.
Verification: The development of the testing infrastructure starts in conjunction with the design and continues until the tapeout date and beyond. Most companies are testing the RTL code, and some are also testing the netlist and the netlist with timing information. The longer the RTL code is, the harder it is to track and debug.
The High-Level Solution in a Nutshell
I’ll talk about the Cadence high-level flow, Stratus HLS. It accepts C++/IEEE 1666 SystemC code and generates correct RTL. It’s as simple as that. It can create the netlist as well.
Using implementation tools under the hood allows debugging implementation issues even before the RTL gets generated.
The big difference in coding is the separation of the intent, “the what”, from the implementation, “the how”. So first, you just write the algorithms and test them, and then you add the constraints that specify the microarchitecture, i.e., pipeline, no pipeline, memories, flops, etc.
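To make the “what” versus “how” split concrete, here is a minimal sketch in plain C++ (not SystemC, and not Stratus syntax). The algorithm is written once, and the microarchitecture choice lives in a separate structure. The names `dot_product`, `MicroArch` and `estimated_cycles` are hypothetical, and the cycle estimate is a toy model under the assumption of one loop iteration per multiplier per cycle, not how any HLS tool actually schedules a design.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// The "what": a pure algorithm with no notion of clocks or registers.
constexpr std::size_t kTaps = 8;

std::int32_t dot_product(const std::array<std::int16_t, kTaps>& a,
                         const std::array<std::int16_t, kTaps>& b) {
    std::int32_t acc = 0;
    for (std::size_t i = 0; i < kTaps; ++i) {
        acc += static_cast<std::int32_t>(a[i]) * b[i];
    }
    return acc;
}

// The "how": a hypothetical constraint kept apart from the algorithm.
struct MicroArch {
    unsigned unroll;  // loop iterations (multipliers) executed per cycle
};

// Toy estimate: cycles needed to run the loop for a given unroll factor.
unsigned estimated_cycles(const MicroArch& m) {
    assert(m.unroll > 0);
    return static_cast<unsigned>((kTaps + m.unroll - 1) / m.unroll);
}
```

In this model the same `dot_product` serves an area-lean choice such as `MicroArch{1}` (one multiplier, eight cycles) and a faster one such as `MicroArch{4}` (four multipliers, two cycles); only the constraint changes.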
In addition, Cadence has Rapid Adoption Kits (RAKs). There are many ready-made RAKs, but the one that got a lot of traction recently was the TensorFlow-to-RTL kit.
How can HLS assist?
Every company defines its own methodology. Let’s touch on one example to walk through the benefits, and you can map this example to your own project/company.
Architects can take SystemC code and actually test their assumptions. When they are happy, those files become the golden models for designers and verification folks. A couple of examples: imaging using 18-bit fixed point vs. 14-bit floating point, or a pipeline depth of five compared to a depth of two. Such choices can have a great impact on power, performance and area, and being able to test them quickly both shortens the design and definition cycles and leads to a better product.
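As an illustration of the kind of experiment an architect might run on a number format, the sketch below measures the worst-case rounding error of a fixed-point format with a given number of fractional bits. It is plain C++ with hypothetical helper names (`quantize`, `max_quantization_error`), it covers only the fixed-point side of the comparison, and it samples values in [0, 1] as an assumption about the signal range.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Round x to the nearest value representable with `frac_bits` fractional bits.
double quantize(double x, int frac_bits) {
    assert(frac_bits >= 0);
    const double scale = std::ldexp(1.0, frac_bits);  // 2^frac_bits
    return std::round(x * scale) / scale;
}

// Worst-case rounding error over evenly spaced samples in [0, 1].
double max_quantization_error(int frac_bits, int samples = 1000) {
    double worst = 0.0;
    for (int i = 0; i <= samples; ++i) {
        const double x = static_cast<double>(i) / samples;
        worst = std::max(worst, std::fabs(x - quantize(x, frac_bits)));
    }
    return worst;
}
```

Round-to-nearest keeps the error within half of the least significant bit, so each extra fractional bit halves the bound. Comparing `max_quantization_error` for two candidate widths gives a quick, quantitative input to the precision-versus-area decision before any RTL exists.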
Because the design intent (again, “the what”) and the implementation (again, “the how”) are separated in HLS, multiple implementations can be created from the same golden code. Both architects and designers can run multiple iterations with a click of a button to choose how to map the same C code into an optimal RTL implementation, by defining constraints and mappings for memories, flops, pipeline depths and many other parameters. This is simply not feasible when hand-coding in RTL.
Designers can add, right on top of the golden code, the extensions required to describe the next level of detail. Having designs coded at the intent level produces RTL, and later gates, that are correct-by-construction, reducing the number of bugs dramatically.
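The idea of sweeping one parameter over the same golden code can be sketched with a toy pipeline model. This is an idealized textbook approximation, not what Stratus or any other tool computes: it assumes the logic delay splits evenly across stages and that every stage pays a fixed register overhead. All names (`PipelineEstimate`, `estimate`, `shallowest_meeting`) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct PipelineEstimate {
    unsigned depth;            // number of pipeline stages
    std::uint64_t period_ps;   // estimated clock period
    std::uint64_t latency_ps;  // time for one result to traverse all stages
};

// Idealized model: logic splits evenly across stages, and each stage
// pays a fixed register overhead. Real designs rarely split this cleanly.
PipelineEstimate estimate(std::uint64_t logic_ps, std::uint64_t reg_overhead_ps,
                          unsigned depth) {
    assert(depth > 0);
    const std::uint64_t per_stage = (logic_ps + depth - 1) / depth;  // ceiling
    const std::uint64_t period = per_stage + reg_overhead_ps;
    return {depth, period, period * depth};
}

// Return the shallowest depth in `candidates` that meets the target period,
// or 0 if none does.
unsigned shallowest_meeting(std::uint64_t logic_ps, std::uint64_t reg_overhead_ps,
                            std::uint64_t target_period_ps,
                            const std::vector<unsigned>& candidates) {
    unsigned best = 0;
    for (unsigned d : candidates) {
        if (estimate(logic_ps, reg_overhead_ps, d).period_ps <= target_period_ps &&
            (best == 0 || d < best)) {
            best = d;
        }
    }
    return best;
}
```

With made-up inputs of 10,000 ps of logic and 100 ps of register overhead, this model gives a 5,100 ps period at a depth of two and a 2,100 ps period at a depth of five, at the cost of slightly higher total latency. An HLS tool runs this kind of comparison on the real design rather than on a formula.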
Testing the C code is MUCH faster compared to testing on RTL. There’s a great methodology in place to define what should be tested within each level of abstraction. The type of issues one has to debug is reduced mainly to a timing issue as most of the other bugs are solved with the correct-by-construction design.
Finally, the tools today have synthesis and power engines under the hood. This means that synthesis can converge faster because timing issues have already been addressed at the C-to-RTL level. Designers can actually test multiple design styles with a click of a button rather than “guesstimating”.
A quick glance into the cockpit shows the amount of information you can get on your design, before you decide to finalize an RTL version.
In the above image, the upper windows show how the C code is linked to the graph view of the design. You can easily see the design flow and loops at a glance and then dig into the code.
On the bottom right, you can see the estimated size of the various resources, the level of resource-sharing and how much margin each resource would have for timing closure after synthesis.
The bottom right window compares histograms for multiple different runs, enabling engineers to choose the best microarchitecture.
This is the tip of the iceberg, but with so much information prior to synthesis, designers can optimize their design for synthesis, power and area starting from the C-level.
Barriers for adoption:
Around the beginning of the century, HLS technology was not mature enough.
The primary flaws used to be a lack of a good ECO process and a lack of a good way to describe control-oriented designs.
ECOs matter because changes are required not only in the initial design phase, but also through bug hunting, implementation and the iterations leading up to tapeout. Those flaws no longer apply: there is now a mature, proven ECO flow. The same goes for control-oriented designs, which matters because most designs mix control with datapath.
The next barrier is a political one within organizations. Who pulls the strings? Architects? Designers? Project managers? Verification engineers? Implementation engineers? Whose voice is heard the most in your company?
Pieces of the methodology can be implemented in companies that have a culture of openness, collaboration and a unified goal to improve. But, for the ultimate result, it sometimes requires breaking walls between technical departments. The primary technology user is the designer, but depending on how the technology is adopted, the cheese might be moving for architects as well as verification engineers.
The next barrier is the ramp-up phase. From my experience, ANY company that was after faster time-to-market and engaged with HLS technology ended up adopting it. When we compared coding RTL from scratch with learning the technology and coding in IEEE 1666 SystemC, we noticed that the time to code and verify the first block is about the same whether a designer codes it in RTL or a peer goes down the HLS path for the first time.
But, from that point on, the coding of new blocks and re-use is significantly more productive using HLS, and, in most cases, better results are achieved utilizing this technology.
Note, not everyone is open to allowing a tool to make design decisions. Old-school designers are fanatical about coding every bit, sometimes with design pessimism. When you put such a designer in front of high-level C coding, you might still get pessimistic coding of every bit… using C instead of RTL, but that wouldn’t move the needle. It’s important to be open-minded and trusting because the more freedom the tool has to make decisions, the better the result you can get. So one can have the tools, but how to use them is a different story...
The last item on my short list is the design type. Most designs have both data-flow parts and control parts. The HLS technology can deal with both aspects of the design, but the more freedom it has, the better the optimizations it can make, far beyond what the average engineer could think of. The more data-flow-oriented your design is, the less specific you need to be when coding and the easier it is to get optimizations done.
The above chart shows a recent breakdown of design types that successfully used HLS. Notice that it is a mixture of designs with both datapath and control characteristics.
To Sum It All Up
The HLS technology can significantly shorten the time to tapeout. You’ll hear buzz phrases like 10X improved productivity and 50% shorter time to product, which all have solid use cases behind them, but in order to really understand the value for you, you might have to simply try it. Bear in mind, it takes time and effort, but you can achieve great rewards.
CAE
2 years ago: It's a good article. Is there any solution for checking the spec's integrity before RTL or SystemC?
Senior Product Engineer at Cadence Design Systems
4 years ago: Good article Ronen. I'd like to defend the old-school hardware guy a bit. I see HLS as the automation of RTL design, producing better quality RTL than the one written by hand. Everyone who considers the adoption of HLS methodology must understand that the tool is not a magic box that always gives users the right micro-architecture and QoR within specs. It takes some old-school hardware design / architecture skills and knowledge to achieve good results. It is very naive to think that a useful outcome is accessible to engineers that don't know or don't care about established RTL design methodologies. The old-school RTL designer may find HLS very useful precisely because it allows them to look after even small details in the design. The difference is that the detailed optimization is no longer a manual process - it is automated. That offers the time and space to produce RTL of excellent quality.
Great article Ronen. HLS has come a long way in the last 2 decades and I think many designers out there would be surprised at the number of commercial products that they (and everyone in their family) use every day that rely heavily on HLS for implementation in silicon. Hopefully, your article helps educate people and make that impact more obvious.
Sr. R&D Engineer at Synopsys | BTech ECE @ MEC
4 years ago: Ronen Laviv, thank you for putting up such a wonderful article on the #HLS design flow. As a student I have had my fair share of experience with the #Vivado #HLS environment. I thus would like to add a few points too. I have been really impressed by the ease with which I have been able to code complex designs on #FPGAs. HLS design flows are definitely useful but I still feel that they lack the level of granularity/control that #verilog or #systemverilog provides. Your thoughts on this please?
UVM Guru (10+ yrs) | PCI Express Jedi (18+ yrs) | Taming cutting-edge SoC design and verification problems #verification #engineer
4 years ago: I worked with HLS and Stratus for a couple of years. The stuff I could do with a few thousand lines of SystemC was just stunning. I especially liked the part where I could simulate with C++ "firmware" and AXI BFMs, then with a script, move that same firmware to my FPGA board to run on an MCU. There was a pretty good learning curve. There were tradeoffs. No matter how you slice it, on a large project you'll likely be using UVM for DV and having to manage a mixed verification flow and that's going to be the bulk of your time. The thing that really would help though -- a graphical tool for creating the classes and connecting them that spit out a syntax-correct framework so I could focus on the desired behavior rather than the plumbing. That or an AI tool to assist with the basic plumbing. C++ is not the world's greatest language for hardware design. No argument, but there is nothing anyone can do about it right now. Get over it -- it's what we got. That horse left the starting gate 20 years ago and I don't have 20 years left in my career to wait for something better.