The demise of Moore’s Law and Dennard Scaling has revived interest in specialized computer architectures and accelerators.
Verification and testing of this hardware rely heavily on cycle-accurate simulation of register-transfer-level (RTL) designs.
The best software RTL simulators can simulate designs at 1–1000 kHz, i.e., more than three orders of magnitude slower than hardware.
Faster simulation can increase productivity by speeding design iterations and permitting more exhaustive exploration.
One possibility is to use parallelism, since RTL exposes considerable fine-grained concurrency.
However, state-of-the-art RTL simulators generally perform best when single-threaded, since modern processors cannot effectively exploit fine-grained parallelism.
At VLSC, we have designed Manticore, an architecture built from the ground up to exploit the fine-grained parallelism of RTL simulation. Manticore's physical design is optimized for datacenter FPGAs, and its specialized compiler optimizes and parallelizes RTL code across a grid of a few hundred simple processors.
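To make the idea of bulk-synchronous RTL simulation concrete, here is a minimal Python sketch. It is not Manticore's compiler or hardware. The toy design (`count`, `acc`, `parity`), the round-robin `partition` helper, and the `simulate` function are all hypothetical names chosen for illustration. The sketch shows one simulated clock cycle as a compute phase, in which each "core" evaluates its registers from the previous cycle's state only, followed by a synchronization step that commits all updates at once.

```python
from typing import Callable, Dict, List

State = Dict[str, int]

# Hypothetical toy design: each register's next value is a pure function
# of the current state, as in synchronous RTL.
NEXT_STATE: Dict[str, Callable[[State], int]] = {
    "count": lambda s: (s["count"] + 1) & 0xFF,
    "acc": lambda s: (s["acc"] + s["count"]) & 0xFFFF,
    "parity": lambda s: s["parity"] ^ (s["count"] & 1),
}


def partition(names: List[str], num_cores: int) -> List[List[str]]:
    """Assign registers to cores round-robin (an assumed, naive policy)."""
    return [names[i::num_cores] for i in range(num_cores)]


def simulate(init: State, cycles: int, num_cores: int) -> State:
    """Simulate `cycles` clock cycles in a bulk-synchronous style."""
    state = dict(init)
    parts = partition(sorted(NEXT_STATE), num_cores)
    for _ in range(cycles):
        # Compute phase: every core reads only the old state, so the
        # per-core work is independent and could run in parallel.
        updates = [{r: NEXT_STATE[r](state) for r in part} for part in parts]
        # Synchronization phase: commit all register updates together,
        # which models the clock edge.
        new_state: State = {}
        for update in updates:
            new_state.update(update)
        state = new_state
    return state


if __name__ == "__main__":
    init = {"count": 0, "acc": 0, "parity": 0}
    print(simulate(init, cycles=4, num_cores=2))
```

Because each core reads only the committed state of the previous cycle, the result does not depend on how registers are split across cores. That property is what allows a schedule to be fixed ahead of time in this sketch.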
See our pre-print for more:
Mahyar Emami, Sahand Kashani, Keisuke Kamahori, Mohammad Sepehr Pourghannad, Ritik Raj, and James R. Larus. “Manticore: Hardware-Accelerated RTL Simulation with Static Bulk-Synchronous Parallelism.” arXiv e-prints (2023): arXiv-2301.