Processor Architectures hitting the Glass Ceiling

Ten years ago, processor development ran into the frequency barrier and, almost simultaneously, superscalar design reached its limits. Since then, both the clock frequency and the instruction-level parallelism (ILP) of processor cores have stalled.

 

 

 

 

Over the past decade, exploiting thread-level parallelism by increasing the number of processor cores became the driving factor in processor development, only to face another barrier: Amdahl’s law limits the benefit of thread parallelism for many applications.
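As a rough illustration of that limit, Amdahl's law bounds the speedup on N cores by 1 / ((1 - p) + p / N), where p is the fraction of run time that can be parallelized. The sketch below simply evaluates that formula; the function name and the sample values of p and N are assumptions chosen for illustration, not measurements of any particular processor.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the run time scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of core count;
    # only the parallel part is divided among the cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Illustrative values only: a program that is 90% parallelizable.
    for n in (1, 2, 4, 16, 64, 1024):
        print(f"{n:5d} cores -> at most {amdahl_speedup(0.9, n):.2f}x")
```

With 90% of the work parallelizable, 16 cores yield at most about 6.4x, and no number of cores can exceed 10x, because the remaining 10% stays serial.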

 

 

 

Hyperion’s novel Architecture

Hyperion-Core's novel architecture is a milestone in processor development. It introduces Programmable Loop Acceleration (LA/PLA) in combination with an entirely new method for Out-of-Order processing. Hyperion-Core outperforms existing processors in both processing performance and power dissipation. At the same time, Hyperion-Core implements a slim and efficient RISC-type Instruction Set Architecture and is fully programmable using high-level languages such as C/C++.

 

The Hyperion-Core can operate in different execution modes. At runtime, the mode best suited to the algorithm about to be executed is selected dynamically. At present, the following modes are available: RISC/VLIW, Loop-Acceleration, Asynchronous and Out-of-Order.

 

 

RISC/VLIW-Mode

The Hyperion-Core operates like a classic RISC processor. Instructions are fetched, issued and executed in order by a number of Execution Units.

 

 

Loop-Acceleration-Mode

On average, about 50% of an application's processing time is spent in software loops:

 

 

In Loop-Acceleration-Mode, the Hyperion-Core operates during the first iteration much as it does in RISC/VLIW-Mode: instructions are fetched, issued and executed in order. The instructions are, however, issued to an array of Execution Units. From the second iteration on, the previously issued instructions remain static in the array. Execution Units are directly interconnected within the array, bypassing the register file. Blocks of data can be streamed through the array in a fully pipelined fashion. The processor operates at maximum Instruction Level Parallelism (ILP) in conjunction with minimal power dissipation.
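The effect on the instruction front end can be sketched with a toy model. Everything in it is a hypothetical simplification: `LoopArray`, `run_in_order` and the single-input operations are invented for illustration, issue and execution are separated for clarity (the text above has the first iteration execute while it is issued), and the model is sequential, so it counts issue events but does not model pipelining or timing.

```python
from typing import Callable, List, Sequence, Tuple

# One single-input operation held by one Execution Unit (hypothetical model).
Op = Callable[[int], int]


class LoopArray:
    """Toy model of an array of Execution Units that keeps a loop body static."""

    def __init__(self) -> None:
        self.units: List[Op] = []
        self.issue_count = 0

    def issue(self, body: Sequence[Op]) -> None:
        """First iteration: place each instruction of the loop body into a unit."""
        self.units = list(body)
        self.issue_count += len(body)

    def stream(self, data: Sequence[int]) -> List[int]:
        """Later iterations: push data through the static units, issuing nothing."""
        results = []
        for value in data:
            for op in self.units:  # unit-to-unit forwarding, no register file in between
                value = op(value)
            results.append(value)
        return results


def run_in_order(body: Sequence[Op], data: Sequence[int]) -> Tuple[List[int], int]:
    """Baseline: every iteration fetches and issues every instruction again."""
    issued = 0
    results = []
    for value in data:
        for op in body:
            issued += 1
            value = op(value)
        results.append(value)
    return results, issued


if __name__ == "__main__":
    body = [lambda x: x * 3, lambda x: x + 1]  # illustrative two-instruction loop body
    data = list(range(100))

    baseline_results, baseline_issued = run_in_order(body, data)

    array = LoopArray()
    array.issue(body)
    array_results = array.stream(data)

    print("same results:", baseline_results == array_results)
    print("issued in order:", baseline_issued)
    print("issued to array:", array.issue_count)
```

In this model the in-order baseline issues one instruction per operation per iteration, while the array issues the loop body once and then only moves data.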

 

The figures show the power dissipation of standard processors:

 


 

While operating as a Loop Accelerator, the Hyperion-Core stops fetching, decoding and issuing each individual instruction. Instead, instructions that are iterated remain temporarily static and are therefore not issued again. The units controlling the instruction flow are switched off and consume no energy:

 

 

 

 

Asynchronous-Mode

Many algorithms for the Internet-of-Things are small and operate only sporadically. Examples are heating control or heartbeat measurement. As in Loop-Acceleration-Mode, those algorithms are issued once to the array of Execution Units and then remain static. The clock frequency is reduced to a minimum, and pipeline stages are switched off. The Hyperion-Core becomes an ultra-low-power virtual fixed-function chip, while retaining the capability to switch back an instant later into a “normal” operating mode for communication with a host or for maintaining a user interface.

 

 

Out-of-Order-Mode

The Hyperion-Core Architecture takes an entirely new approach to Out-of-Order processing. Instead of carrying the overhead of state-of-the-art designs for Reservation Stations, Reorder Buffer and the like, Hyperion’s array of Execution Units unifies all of these functions: instructions are fetched and issued to the array in order. Issued instructions remain static until all of their operands are available within the array; they are then executed out of order. Depending on the array size, a significantly larger Instruction Window is achievable and, at the same time, the theoretical Instruction Level Parallelism increases to the number of Execution Units in the array that have their operands available.
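The firing rule described above (issue in order, execute as soon as operands are available) can be sketched as a small dataflow scheduler. This is a minimal model under stated assumptions, not the actual hardware mechanism: `Instr` and `schedule` are hypothetical names, every instruction takes one cycle, each destination is written exactly once, and the array is assumed large enough to hold the whole program.

```python
from typing import List, NamedTuple, Set, Tuple


class Instr(NamedTuple):
    dest: str               # name of the value this instruction produces
    srcs: Tuple[str, ...]   # names of the operands it waits for


def schedule(program: List[Instr], ready: Set[str]) -> List[List[str]]:
    """Return, for each cycle, the destinations of the instructions that fire."""
    available = set(ready)
    waiting = list(program)  # issued in order, then held static in the array
    cycles: List[List[str]] = []
    while waiting:
        # An instruction fires once all of its operands are available.
        fired = [i for i in waiting if all(s in available for s in i.srcs)]
        if not fired:
            raise ValueError("an operand never becomes available")
        cycles.append([i.dest for i in fired])
        # Results become visible to dependent instructions in the next cycle.
        available.update(i.dest for i in fired)
        waiting = [i for i in waiting if i not in fired]
    return cycles


if __name__ == "__main__":
    # Illustrative program: two independent chains that join at the end.
    program = [
        Instr("a", ("x",)),
        Instr("b", ("a",)),
        Instr("c", ("y",)),
        Instr("d", ("c",)),
        Instr("e", ("b", "d")),
    ]
    for number, fired in enumerate(schedule(program, {"x", "y"}), start=1):
        print(f"cycle {number}: {fired}")
```

In the example, `c` fires in the first cycle alongside `a` even though it was issued after `b`, so the five instructions complete in three cycles rather than the five a strictly in-order, single-issue model would need.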

 

Hyperion-Core: Design and Software optimized

 

 

Disclaimer: Intel, AMD, MIPS, DEC and ARM are trademarks of the respective company or trademark holder.