Communication delay analysis of fault-tolerant pipelined circuit switching in torus

Authors:

Highlights:

Abstract

Large-scale parallel systems, Multiprocessor Systems-on-Chip (MP-SoCs), multicomputers, and cluster computers are often composed of hundreds or thousands of components (such as routers, channels, and connectors) that collectively exhibit higher failure rates than those of ordinary systems. One of the most important issues in the design of such systems is the development of efficient fault-tolerant mechanisms that provide high throughput and low communication latency, so that these systems keep running in a degraded mode until the faulty components are repaired. Pipelined Circuit Switching (PCS) has been suggested as an efficient switching method for supporting inter-processor communication in such networks because it can meet both performance and fault-tolerance demands. This paper presents a new mathematical model that investigates the effects of failures and captures the mean message latency in torus networks using PCS in the presence of faulty components. Simulation experiments confirm that the analytical model exhibits a good degree of accuracy under different working conditions.

Keywords: Large-scale parallel systems, Fault-tolerance, PCS, Torus, Adaptive routing, Virtual channels, Message latency, Queuing theory, Performance evaluation

Review history: Received 3 October 2005, Revised 11 March 2006, Available online 24 February 2007.

Official paper URL: https://doi.org/10.1016/j.jcss.2007.02.003