FPGA Congestion
Sampath VP
ASIC/FPGA Design Professional | SoC Architecture | Technology Evangelist | IEEE Reviewer
Project Everest devices, which will deploy billions of transistors through the use of a 7nm process, include a network-on-chip that will help reduce congestion across the device. Plunify’s algorithms have found cases where moving logic out of hardwired blocks can improve overall timing. This may also reduce the congestion caused by the need to dedicate routing to connect to the fixed-location cores. A common technique for reducing congestion overall is to focus on the Rent exponent, a measure of how the number of external connections a block needs grows with the amount of logic inside it.
Tool reports provide connectivity measurements for each block, and the optimisation tools can focus effort on elements that tend to increase the Rent exponent. These strategies reduce congestion by selectively reducing the utilisation of structures that tend to increase both the Rent exponent and congestion.
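To make the Rent exponent concrete, here is a minimal sketch that estimates it from block measurements. It assumes the usual form of Rent's rule, T = t * g^p, where g is the number of logic elements in a block, T is the number of external connections, t is the average number of terminals per element, and p is the Rent exponent. The function name fit_rent and the sample data are hypothetical and are not taken from any tool report.

```python
import math


def fit_rent(samples):
    """Fit Rent's rule T = t * g**p to (g, T) pairs.

    Taking logs gives log T = log t + p * log g, so an ordinary
    least-squares line through (log g, log T) yields p as the slope
    and log t as the intercept.
    """
    xs = [math.log(g) for g, _ in samples]
    ys = [math.log(terminals) for _, terminals in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx
    t = math.exp(mean_y - p * mean_x)
    return t, p


# Hypothetical measurements, generated from t = 4 and p = 0.6 so that
# the fit can be checked by eye.
samples = [(g, 4.0 * g ** 0.6) for g in (16, 64, 256, 1024)]
t, p = fit_rent(samples)
print(f"t = {t:.2f}, p = {p:.2f}")
```

A block whose fitted p is noticeably higher than that of its neighbours needs more external wiring per unit of logic, which makes it a natural place to look first when chasing congestion.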
During the training phase, we run the complete C-to-FPGA flow once on several HLS-based applications to obtain the routing congestion metrics. The C/C++ specifications of the designs are synthesized into RTL models through the HLS flow, and the RTL descriptions are then run through the implementation flow to generate the congestion metrics. With the trained model, the highly congested regions in the source code of the target design can be detected during the prediction phase, and users can resolve congestion issues in the HLS flow without running the time-consuming RTL implementation flow.
Designs with high utilization are more susceptible to congestion. Generally, a design that uses more than 80% of the Slice LUTs becomes very difficult to route while meeting timing. The exact percentage at which this difficulty appears is highly design dependent and is affected by factors such as the number of control sets (clock, reset, and enable) in the design and the presence of high-fanout nets. Over-utilization of other site types such as FFs, LUTRAMs, block RAMs, and DSP sites can also lead to congestion.
One suggestion to overcome such congestion would be to balance the utilization of different types of sites. For instance, if many LUTRAMs are causing an over-utilization of LUTs, then moving some of these to block RAM sites could help.
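The utilization check described above can be sketched as a small script run over numbers copied from a utilization report. The 80% figure comes from the text and applies to Slice LUTs. Using the same threshold for every site type is an assumption made here for simplicity, and the function name and resource counts are hypothetical.

```python
def over_utilized(used, available, threshold=0.80):
    """Return {site_type: utilization} for sites above `threshold`.

    `used` and `available` map a site type name to a count.
    NOTE: one shared threshold for all site types is an assumption;
    the 80% guideline in the text is stated for Slice LUTs only.
    """
    hot = {}
    for site, count in used.items():
        fraction = count / available[site]
        if fraction > threshold:
            hot[site] = fraction
    return hot


# Hypothetical device capacity and design usage.
available = {"LUT": 100_000, "FF": 200_000, "BRAM": 300, "DSP": 600}
used = {"LUT": 86_000, "FF": 90_000, "BRAM": 60, "DSP": 120}

hot = over_utilized(used, available)
for site, fraction in hot.items():
    print(f"{site}: {fraction:.0%} utilized, consider rebalancing")
```

In this made-up example only the LUTs are flagged while block RAM sits at 20%, which is the situation where moving LUTRAM content into block RAM is worth trying.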
report_high_fanout_nets - Finding high fanout nets can be crucial in fighting congestion. Specifically, non-clock control signals that have a high fanout can cause congestion. You can make a list of high fanout nets with synchronous drivers and use the following command to relieve congested areas.
phys_opt_design -force_replication_on_nets [get_nets <net_names>]
If a high fanout net is driven from a LUT and cannot be replicated with phys_opt_design, then the driver can either be replicated manually in the RTL, or a global buffer (BUFG) can be added if the added delay is acceptable.
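As a rough illustration of the selection step, the sketch below filters a list of nets down to those with high fanout and a synchronous driver, then builds the phys_opt_design call as a string to paste into the Tcl console. The tuple layout, the "FF" and "LUT" labels, the fanout cutoff of 1000, and the net names are all assumptions for this example. They do not reproduce the format of the report_high_fanout_nets output.

```python
def replication_command(nets, min_fanout=1000):
    """Build a phys_opt_design call for high-fanout, flop-driven nets.

    `nets` is a list of (name, fanout, driver) tuples, where driver is
    "FF" or "LUT" (labels chosen for this sketch). Returns the Tcl
    command string, or None when no net qualifies.
    """
    picked = [
        name
        for name, fanout, driver in nets
        if fanout >= min_fanout and driver == "FF"
    ]
    if not picked:
        return None
    net_list = " ".join(picked)
    return (
        "phys_opt_design -force_replication_on_nets "
        "[get_nets {" + net_list + "}]"
    )


# Hypothetical nets transcribed by hand from a fanout report.
nets = [
    ("ctrl/rst_sync", 4200, "FF"),
    ("core/enable", 1500, "FF"),
    ("dec/sel", 2300, "LUT"),   # LUT-driven: handle in RTL or with a BUFG
    ("core/valid", 40, "FF"),   # fanout too low to matter
]
cmd = replication_command(nets)
print(cmd)
```

The LUT-driven net is deliberately left out of the command, since the text treats that case separately.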
report_design_analysis - The report_design_analysis -complexity command can also be used to gauge how complex a design is and how susceptible it is to congestion.
report_design_analysis -help gives additional information on how to use this. From the IDE, navigate to
Tools -> Report -> Report Design Analysis.
Example command:
report_design_analysis -congestion -complexity -hierarchical_depth 10
This reports the complexity (Rent exponent) of the design down to 10 levels of hierarchy. The -congestion option also lists the most heavily utilized routing tiles.
report_qor_suggestions - Running the report_qor_suggestions command on a routed design can give valuable feedback on constraint, Tcl, and design changes that can help with congestion problems. The focus of the command is QOR improvement on timing-critical paths, but it also makes congestion-specific suggestions when specific congestion scenarios are detected.
Logic related to Congestion - Find the logic that occupies the congested tiles by building a schematic. Then check whether this logic connects to high-fanout nets that contribute directly to the congestion.
Local vs Global Congestion - Congestion can be local to a certain region of the device, even when the overall device utilization is low. In this case, certain restrictions such as I/O connections or area constraints such as pblocks can cause the congestion and should be checked. Try loosening or removing pblock constraints to see how the results differ.
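The difference between local and global pressure can be shown with a toy calculation. The sketch below uses per-region logic utilization as a stand-in for congestion, which is a simplification made for this example, since real congestion is measured on routing resources. The region names, counts, and the 80% cutoff are hypothetical.

```python
def congestion_summary(regions, threshold=0.80):
    """Compare overall utilization with per-region utilization.

    `regions` maps a region name to (used, available) site counts.
    Returns (overall_fraction, list_of_regions_above_threshold).
    NOTE: logic utilization is only a proxy for routing congestion.
    """
    total_used = sum(used for used, _ in regions.values())
    total_available = sum(avail for _, avail in regions.values())
    overall = total_used / total_available
    hot = [
        name
        for name, (used, avail) in regions.items()
        if used / avail > threshold
    ]
    return overall, hot


# Hypothetical design: a tight pblock packs most logic into one region.
regions = {
    "X0Y0": (9_000, 10_000),
    "X1Y0": (2_000, 10_000),
    "X0Y1": (3_000, 10_000),
    "X1Y1": (2_000, 10_000),
}
overall, hot = congestion_summary(regions)
print(f"overall: {overall:.0%}, hot regions: {hot}")
```

Here the device as a whole is well under half full, yet one region is at 90%. That is the pattern where loosening or removing a pblock is worth trying before touching the design itself.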