The demands of future computing, as well as the challenges of nanometer-era VLSI design, will require new design techniques and design styles that are simultaneously high-performance, energy-efficient, and robust to noise and process variation. One emerging problem concerns the communication mechanisms among the growing number of blocks, or cores, that can be integrated onto a single chip. The bus-based systems and point-to-point interconnection strategies in use today cannot easily be scaled to accommodate the large numbers of cores projected for the near future. Network-on-chip (NoC) interconnect infrastructures are one of the key technologies that will enable the emergence of many-core processors and systems-on-chip with increased computing power and energy efficiency. This dissertation focuses on testing, yield improvement, and fault tolerance of such NoC infrastructures.
The motivation for the work is that, with technology scaling into the nanometer range, defect densities will become a serious challenge for the fabrication of integrated circuits containing billions of transistors. Manufacturing these systems in high volumes is possible only if their cost is low, and test cost is one of the main components of total chip cost. However, relying on post-manufacturing test alone to guarantee that ICs operate correctly will not suffice, for two reasons: first, the increased fabrication problems expected to characterize upcoming technology nodes will adversely affect manufacturing yield; second, post-fabrication faults may develop due to electromigration, thermal effects, and other mechanisms. Therefore, solutions must be developed to tolerate faults in the NoC infrastructure as well as in the functional cores.
In this dissertation, a fast, efficient test method is developed for NoCs that exploits their inherent parallelism to reduce test time by transporting test data on multiple paths and testing multiple NoC components concurrently. Compared to current NoC test methods, test time improves by a factor of 2 to 34, depending on the NoC architecture and test transport protocol. This test mechanism is subsequently used to detect permanent faults on NoC links, which are then repaired by an on-chip mechanism that replaces the faulty signal lines with fault-free ones, thereby increasing yield while maintaining the same wire delay characteristics. The solution described in this dissertation significantly improves the achievable yield of NoC inter-switch channels, from a 4% improvement for an 8-bit wide channel to a 71% improvement for a 128-bit wide channel. The direct benefit is improved fault tolerance, yield, and long-term reliability of NoC-based multicore systems.
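To illustrate why replacing faulty signal lines with spare ones helps wide channels most, the following is a minimal sketch under a deliberately simple model that is not taken from this dissertation. It assumes each wire is independently faulty with probability `p_fault`, and that a channel of `width` data wires plus `spares` redundant wires is usable whenever at most `spares` wires are faulty. The function name, the parameters, and the fault probability used below are all hypothetical, and the numbers it prints are not the yield figures reported above.

```python
from math import comb


def channel_yield(width, spares, p_fault):
    """Probability that a channel is usable under an assumed binomial model.

    Assumption (illustrative only): wires fail independently with
    probability p_fault, and any faulty wire can be replaced by any spare.
    """
    total = width + spares
    # The channel works if the number of faulty wires k does not exceed
    # the number of spares available to replace them.
    return sum(
        comb(total, k) * p_fault**k * (1 - p_fault) ** (total - k)
        for k in range(spares + 1)
    )


if __name__ == "__main__":
    p = 0.01  # hypothetical per-wire fault probability
    for width in (8, 32, 128):
        base = channel_yield(width, 0, p)
        repaired = channel_yield(width, 2, p)
        print(f"width={width:3d}  no spares={base:.4f}  2 spares={repaired:.4f}")
```

Under these assumptions, the yield of an unrepaired channel falls off geometrically with its width, so the same small number of spares recovers proportionally more yield for wide channels than for narrow ones. This is consistent in direction with the trend stated above, although the actual repair mechanism and fault model may differ.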