RTL Coding in FPGA

Module designers shall have a detailed view of the design, down to the function or major component level, so that estimates are near-accurate. At the end of this phase, the exact FPGA part shall be finalized from the chosen family. The following critical aspects need to be considered during the RTL coding phase:

1. Logic delay: Although a logic delay of around 50% may be adequate, it is desirable to keep high-speed paths in the design below that, at roughly 20-30%. FPGAs usually have abundant resources such as flip-flops (normally one flip-flop per look-up table), RAMs, and multipliers. Wherever throughput is not affected, additional pipeline stages can be introduced judiciously, keeping routing congestion in mind.

2. Device mapping efficiency: The RTL code shall enable the best FPGA mapping by exploiting the device architecture. If a 4:1 MUX is coded as a single entity, it maps well into one slice with 2 LUTs and an F5 MUX. If instead the 4:1 MUX is built with a pipeline stage after the 2:1 MUXes, it cannot be mapped to the F5 MUX and an additional slice is needed. Another example: a long register-based shift register can be mapped to the SRL configuration of a LUT, provided that none of its registers needs a reset.

3. Fan-out: Although synthesis tools can control fan-out automatically, manual control is needed, especially for signals that interface to hard macros. The tools treat all signals in the same manner, and hard macros are often black boxes to them.

4. Vendor-specific structures and instantiations: Create a hierarchy around them so that the design can be migrated from one technology to another.

5. Macro interface: All the inputs and outputs of macros shall be registered, because macros have fixed locations.

6. Gated clocks: Avoid gated clocks and use clock enables instead.

7. Critical logic: Place critical logic in a separate hierarchy.

8. Critical paths: Make sure that critical paths do not cross the hierarchy boundary of a block by registering all of the block's outputs.

9. Tri-state buffers: For low-speed paths, it is desirable to use tri-state buffers to save logic cells.

10. Unused hard macros: Unused RAMs can be used as register sets or to map state machines coded as look-up tables. This also avoids large multiplexers in the read path. Likewise, unused multipliers can be used as long shifters.

11. False and multi-cycle paths: False and multi-cycle paths shall not be pipelined. They shall be identified during design and passed on to the synthesis tool.

12. Trial synthesis and P&R: Each module-level designer shall perform individual module-level synthesis and P&R of the design with the given floorplan, and optimize the RTL code while it is being developed. If the IO requirement of a module exceeds the physical IOs of the device, dummy logic can be added to demultiplex few pins to more pins and/or multiplex more pins to few pins, using shift register structures and/or OR-gate structures as shown in Figure 2. As also shown in that figure, insert additional flip-flops on the interfaces between the selected module and other modules, leaving the actual IO interfaces unchanged. This eliminates skewed timing results caused by the dummy logic and connections. Black-box timing information shall also be used during synthesis to avoid skewed timing results.

13. Module-level floorplanning: Within the given floorplan area, it is often desirable to do sub-module-level floorplanning. In many cases only the critical parts of the design need to be floorplanned at this level. The timing-critical sub-modules being floorplanned also need to be synthesized as individual compiles, which prevents hierarchy loss (as shown in Figure 3) and the inefficient placement that results from it.

14. Logic compression: Although maximum packing of unrelated logic is preferred from an area standpoint (for example, using COMPRESSION with the Xilinx flow), it has an adverse impact on timing. The packing level for unrelated logic shall therefore be set according to the timing criticality of each sub-module.

15. IO allocation: The IO locations of each module shall be fixed based on the pin sequence of the IO ring on the die rather than the pin sequence on the package.
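
The shift register case in item 2 can be sketched in Verilog as follows. This is a minimal illustration, not a vendor reference design: the module name, port names, and the DEPTH parameter are assumptions made for the example. The point is the absence of a reset, which is what leaves the synthesis tool free to map the chain to the SRL configuration of a LUT. Whether it actually does so depends on the tool, its settings, and the target device.

```verilog
// Hypothetical example: reset-less shift register written so that a
// synthesis tool may map it to LUT-based shift register (SRL) resources.
// Module, port, and parameter names are illustrative assumptions.
module srl_delay #(
    parameter DEPTH = 16              // number of stages, must be >= 2
) (
    input  wire clk,
    input  wire ce,                   // clock enable, not a gated clock
    input  wire din,
    output wire dout
);
    reg [DEPTH-1:0] shift;

    // No reset branch on purpose: a reset on these registers would
    // force the tool to use discrete flip-flops instead of an SRL.
    always @(posedge clk) begin
        if (ce)
            shift <= {shift[DEPTH-2:0], din};
    end

    assign dout = shift[DEPTH-1];
endmodule
```

Only the last stage is read here. Tapping several intermediate stages in parallel may also prevent SRL mapping, so it is worth checking the synthesis report.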
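
The 4:1 MUX case in item 2 can be illustrated the same way. The sketch below codes the multiplexer as a single combinational entity, with hypothetical module and port names. It only shows the coding style the item recommends; the actual mapping to 2 LUTs and an F5 MUX depends on the device family and the tool.

```verilog
// Hypothetical example: 4:1 MUX coded as one entity, with no register
// between the first-level 2:1 selections and the final selection.
module mux4 #(
    parameter WIDTH = 1
) (
    input  wire [WIDTH-1:0] a, b, c, d,
    input  wire [1:0]       sel,
    output reg  [WIDTH-1:0] y
);
    always @(*) begin
        case (sel)
            2'b00:   y = a;
            2'b01:   y = b;
            2'b10:   y = c;
            default: y = d;
        endcase
    end
endmodule
```

If pipelining is needed, registering the output y rather than the intermediate 2:1 results keeps the multiplexer itself intact.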
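
For item 6, a register with a clock enable might look like the sketch below. The names are again assumptions made for illustration. The commented-out lines show the gated-clock style that the item advises against.

```verilog
// Hypothetical example: clock enable instead of a gated clock.
module ce_register #(
    parameter WIDTH = 8
) (
    input  wire             clk,
    input  wire             en,
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    // Style to avoid (gated clock):
    //   wire gclk = clk & en;
    //   always @(posedge gclk) q <= d;

    // Preferred: the register stays on the global clock and uses the
    // flip-flop's clock-enable input to hold its value.
    always @(posedge clk) begin
        if (en)
            q <= d;
    end
endmodule
```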
