FPGA Congestion

Xilinx's Project Everest devices, which will deploy billions of transistors on a 7nm process, will use a network-on-chip to help reduce congestion across the device. Plunify's algorithms have found cases where moving logic out of hardwired blocks can improve overall timing. This may also reduce the congestion caused by the need to dedicate routing to connect to the fixed-location cores. A common technique for reducing congestion overall is to focus on the Rent exponent, a measure of the number of connections each block within the design needs.

Tool reports provide connectivity measurements for each block, and the optimisation tools can focus effort on elements that tend to increase the Rent exponent. These strategies reduce congestion by selectively reducing the utilisation of structures that tend to increase the Rent exponent and congestion.
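To make the Rent exponent concrete: Rent's rule relates the number of external connections T of a block to the number of logic elements g it contains as T = t * g^p, where t is the average number of connections per element and p is the Rent exponent. The sketch below estimates p with a least-squares fit on log-log data. The function name fit_rent and the sample partition data are hypothetical and are not taken from any tool report.

```python
import math


def fit_rent(blocks):
    """Fit log(T) = log(t) + p * log(g) by ordinary least squares.

    blocks: iterable of (element_count, terminal_count) pairs, both > 0.
    Returns (t, p): connections per element and the Rent exponent.
    """
    blocks = list(blocks)
    if len(blocks) < 2:
        raise ValueError("need at least two blocks to fit a line")
    xs = [math.log(g) for g, _ in blocks]
    ys = [math.log(terms) for _, terms in blocks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("blocks must not all have the same size")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx                       # slope of the log-log line
    t = math.exp(mean_y - p * mean_x)   # intercept, back in linear units
    return t, p


# Hypothetical data generated from T = 4 * g**0.6, so the fit should
# recover t close to 4 and p close to 0.6.
sample = [(g, 4.0 * g ** 0.6) for g in (16, 64, 256, 1024)]
t, p = fit_rent(sample)
print(f"t = {t:.3f}, Rent exponent p = {p:.3f}")
```

A higher fitted p means that connections grow faster as blocks get larger, which is why structures that raise the exponent are natural targets when fighting congestion.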

During the training phase, we run the complete C-to-FPGA flow once on several HLS-based applications to obtain the routing congestion metrics. The C/C++ specifications of the designs are synthesized into RTL models through the HLS flow, and the RTL descriptions are then run through the implementation flow to generate the congestion metrics. With the trained model, the highly congested regions in the source code of the target design can be detected during the prediction phase, and users can resolve congestion issues in the HLS flow without running the time-consuming RTL implementation flow.

Designs that have high utilization are more susceptible to congestion. Generally, a design that is over 80% utilized (Slice LUTs) becomes very difficult to route while still meeting timing. The actual percentage at which this difficulty appears is highly design dependent, and is affected by factors such as the number of control sets (clock, reset, and enable) in the design and the presence of high fanout nets. Over-utilization of other site types such as FFs, LUTRAMs, block RAMs, and DSP sites can also lead to congestion.

One suggestion to overcome such congestion would be to balance the utilization of different types of sites. For instance, if many LUTRAMs are causing an over-utilization of LUTs, then moving some of these to block RAM sites could help. 
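The balancing idea can be checked with simple arithmetic on the utilization numbers. The sketch below computes per-site utilization and flags site types above a threshold. The 80% figure comes from the Slice LUT guidance above; applying the same threshold to every site type is an assumption made for illustration, and the resource counts are hypothetical rather than taken from a real device or report.

```python
LUT_THRESHOLD = 0.80  # from the text: above ~80% Slice LUT use, routing gets hard


def utilization(used, available):
    """Return {site_type: used / available} for every site type in `used`."""
    return {site: used[site] / available[site] for site in used}


def over_threshold(used, available, threshold=LUT_THRESHOLD):
    """Return the site types whose utilization exceeds `threshold`, sorted."""
    ratios = utilization(used, available)
    return sorted(site for site, ratio in ratios.items() if ratio > threshold)


# Hypothetical resource counts for illustration only.
available = {"LUT": 200_000, "FF": 400_000, "BRAM": 600, "DSP": 1_000}
used = {"LUT": 170_000, "FF": 180_000, "BRAM": 120, "DSP": 300}

for site, ratio in utilization(used, available).items():
    print(f"{site:5s} {ratio:6.1%}")
print("over threshold:", over_threshold(used, available))
```

In this made-up example the LUTs are at 85% while block RAM sits at 20%, which is the kind of imbalance where moving some LUTRAM-based memories into block RAM is worth trying.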

report_high_fanout_nets - Finding high fanout nets can be crucial in fighting congestion. Specifically, non-clock control signals that have a high fanout can cause congestion. You can make a list of high fanout nets with synchronous drivers and use the following command to relieve congested areas.

phys_opt_design -force_replication_on_nets [get_nets <net_names>]

If a high fanout net is driven from a LUT and cannot be replicated with phys_opt_design, then this can either be manually replicated in the RTL, or a global buffer (BUFG) can be added if the added delay is acceptable.
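The selection step described above can be sketched as a small helper that takes a list of nets, for example one transcribed by hand from a report_high_fanout_nets run, and splits it into nets to hand to phys_opt_design and LUT-driven nets that need manual attention. The tuple layout, the min_fanout value of 1000, the "FF" and "LUT" driver labels, and the net names are all assumptions made for illustration; only the phys_opt_design -force_replication_on_nets option and get_nets come from Vivado.

```python
def plan_replication(nets, min_fanout=1000):
    """Split high fanout nets by driver type.

    nets: iterable of (net_name, fanout, driver_type) tuples, where
          driver_type is "FF" for a synchronous driver or "LUT".
    Returns (command, manual): a Tcl command string for the register-driven
    nets (or None if there are none) and a list of LUT-driven net names to
    replicate in RTL or buffer by hand.
    """
    high = sorted((n for n in nets if n[1] >= min_fanout),
                  key=lambda n: n[1], reverse=True)
    replicate = [name for name, _, driver in high if driver == "FF"]
    manual = [name for name, _, driver in high if driver == "LUT"]
    command = None
    if replicate:
        command = ("phys_opt_design -force_replication_on_nets "
                   "[get_nets {" + " ".join(replicate) + "}]")
    return command, manual


# Hypothetical nets for illustration only.
nets = [
    ("rst_sync", 4200, "FF"),
    ("sel_decode", 2500, "LUT"),
    ("ce_stage2", 1800, "FF"),
    ("data_valid", 120, "FF"),
]
command, manual = plan_replication(nets)
print(command)
print("handle manually:", manual)
```

Keeping the LUT-driven nets in a separate list mirrors the advice above: those are the ones to replicate in the RTL or to drive through a BUFG if the added delay is acceptable.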

report_design_analysis - The report_design_analysis -complexity command can also be used to check whether a design is complex enough to be susceptible to congestion.

report_design_analysis -help gives additional information on how to use this command. From the IDE, navigate to:

Tools -> Report -> Report Design Analysis

Example command:

report_design_analysis -congestion -complexity -hierarchical_depth 10

This analyzes the design complexity (the Rent exponent) and reports it for 10 levels of hierarchy. The -congestion option also lists the most heavily utilized routing tiles.

report_qor_suggestions - Running the report_qor_suggestions command on a routed design can give valuable feedback on constraint, Tcl, and design changes that can help with congestion problems. The focus of the command is QoR improvement on timing-critical paths, but it also makes congestion-specific suggestions when it detects particular congestion scenarios.

Logic related to Congestion - Find the logic that occupies the congested tiles by building a schematic. Check this logic for high-fanout nets that are directly related to the congestion.

Local vs Global Congestion - Congestion can be local to a certain region of the device, even when the overall device utilization is low. In this case, certain restrictions such as I/O connections or area constraints such as pblocks can cause the congestion and should be checked. Try loosening or removing pblock constraints to see how the results differ.

