Bespoke processors

Bespoke processors are used for applications with ultra-low area and power constraints. Low-power processors are widely-used and are expected to power a large number of emerging applications. Such processors tends to be simple, run relatively simple applications, and do not support non-determinism which makes symbolic simulation-based technique a good fit for such processors.

Bespoke processor design that tailors a general purpose processor IP to a target application by removing all gates from the design that can never be used by the application. A bespoke processor, tailored to a target application, must be functionally-equivalent to the original processor when executing the application. A bespoke implementation of a processor design should retain all the gates from the original processor design that might be needed to execute the application. Any gate that could be toggled by the application and propagate its toggle to a state element or output port performs a necessary function and must be retained to maintain functional equivalence.

Conversely, any gate that can never be toggled by the application can safely be removed, as long as each fan-out location for the gate is fed with the gate’s constant output value for the application. Removing constant gates for an application could result in significant area and power savings and, unlike conventional energy saving techniques, will introduce no performance degradation.

 

Bespoke processors can achieve significantly lower area and power than their general purpose counterparts without any performance degradation since removed gates are never used by an application. In addition, gate removal can expose additional timing slack that can be exploited to increase area and power savings or performance of a bespoke design. Bespoke processor design reduces area and power by 62% and50%, on average, while exploiting exposed timing slack improves average power savings to 65%.

Area and power-constrained microprocessors and microcontrollers are the most abundant type of processor produced and used today, with projected deployment growing exponentially in the near future. This explosive growth is fueled by the emerging area- and power-constrained applications, such as the internet-of things, wearables, implantable, and sensor networks.

The microprocessors and microcontrollers used in these applications are designed to include a wide variety of functionalities in order to support a large number of diverse applications with different requirements. On the other hand, the embedded systems designed for these applications typically consist of one application or a small number of applications, running over and over on a general purpose processor for the lifetime of the system. Given that a particular application may only use a small subset of the functionalities provided by a general purpose processor, there may be a considerable amount of logic in a general purpose processor that is not used byan application.

Cost concerns drive many of the above applications to use general-purpose microprocessors and microcontrollers instead of much more area- and power-efficient ASICs, since, among other benefits, development cost of microprocessor IP cores can be amortized by the IP core licensor over a large number of chip makers and licensees. In fact, ultra-low-area- and power-constrained microprocessors and microcontrollers powering these applications are already the most widely used type of processing hardware in terms of production and usage in spite of their well-known inefficiency compared to ASIC and FPGA-based solutions. Given this mismatch between the extreme area and power constraints of emerging applications and the relative inefficiency of general-purpose microprocessors and microcontrollers compared to their ASIC counterparts, there exists a considerable opportunity to make microprocessor-based solutions for these applications much more area- and power-efficient. One big source of area inefficiency in a microprocessor is that a general purpose microprocessor is designed to target an arbitrary application and thus contains many more gates than what a specific application needs. Also, these unused gates continue to consume power, resulting in significant power inefficiency. While adaptive power management techniques (e.g., power gating help to reduce power consumed by unused gates, the effectiveness of such techniques is limited due to the coarse granularity at which they must be applied, as well as significant implementation overheads such as domain isolation and state retention. These techniques also worsen area inefficiency.


要查看或添加评论,请登录

Sampath VP的更多文章

  • Deeplearning what does it fits?

    Deeplearning what does it fits?

    There is a huge enthusiasm for cognitive computing , artificial intelligence , machine learning , deep learning and…

  • Switch to Stand out

    Switch to Stand out

    Bharath Semiconductor Society which emphasis on the ESDM, Semiconductor, entrepreneurship,MSMEs and academics. Since…

    1 条评论
  • ASIC RTL vs FPGA RTL

    ASIC RTL vs FPGA RTL

    The biggest difference between RTL design for ASIC and RTL design for FPGA is that ASICs are custom-designed integrated…

  • DV TALK 31ST AUGUST DONT MISS IT!

    DV TALK 31ST AUGUST DONT MISS IT!

    DV TALK Greetings from Bharath Semiconductor society.Bharath Semiconductor Society of India was established in 2022 as…

    3 条评论
  • RTL Coding in FPGA

    RTL Coding in FPGA

    Module designers shall have detailed view of the design down to function/major component level for near-accurate…

  • Transaction layer of PCIe

    Transaction layer of PCIe

    Transaction layer Transaction layer’s primary responsibility is to create PCI Express request and completion…

  • Deep learning designs

    Deep learning designs

    DL designs for training can be a large size due to the shear amount of high precision MACs, memory routing, and…

  • PCIe Equalization phases

    PCIe Equalization phases

    Equalization is a critical aspect of PCIe technology that ensures the integrity of data transmission in increasingly…

  • PCIe Equalization

    PCIe Equalization

    · PCIe 3.0: Gen 3 introduced static equalization, primarily performed by the transmitter using 128/130 encoding.

  • PCIe Enumeration

    PCIe Enumeration

    PCIe enumeration is the process of detecting the devices connected to the PCIe bus. switches and endpoint devices are…

社区洞察