A performance evaluation of three supercomputers: Fujitsu VP-200, Hitachi S810/20, and Cray X-MP/24

Abstract

The Los Alamos benchmark set has been executed on two Japanese supercomputers, Fujitsu's VP-200 and Hitachi's S810/20, as well as on the American-made CRAY X-MP/24. The benchmark set is a collection of CPU-intensive codes that approach benchmarking at three levels: (1) timing of elementary vector operations, whose analysis provides basic data about the vector architecture and compiler; (2) timing of characteristic code excerpts (1000 to 3000 FORTRAN lines), which vary from linear equation solvers to physics simulations of particle transport and hydrodynamics; and (3) timing of characteristic real codes (10,000 to 20,000 lines of FORTRAN), some of which have been partitioned for execution on multiprocessor architectures. These level (3) codes were not used in this benchmark.
The time allotted to the evaluations was one week per machine. The codes were run in two modes: (1) original mode, in which only changes necessary to execute the codes correctly were allowed, and (2) "tuned" mode, in which minor FORTRAN changes or compiler directives were allowed. All the benchmarks, including those on the CRAY X-MP/24, were single-processor tests. No I/O or throughput measurements were made.
The results of the benchmarks can be analyzed in terms of scalar speed, raw vector speed, and, finally, overall performance. The scalar speeds of the VP-200 and X-MP/24 were roughly comparable. Scalar timings differed by at most 30 percent, and we concluded that differences of this magnitude could be overcome by compiler changes over a couple of versions. Hitachi's S810/20, however, was a factor of 2 slower in scalar performance. The vector speeds of the machines were examined at two ranges of vector length. The VP-200 and CRAY X-MP performed equally on short vectors (vector length 10), while the S810 was 50 percent slower. On long vectors (vector length 1000), the Japanese machines outperformed the X-MP by a factor of 2 or 3.
Final conclusions about the machines, however, are based on the overall performance of codes that contain a mixture of scalar and vector modes. In this regime, Amdahl's Law applies and raw vector speed is rarely the determining factor. On the majority of our benchmark set and, particularly, on the more important codes, the VP-200 and an X-MP/24 single processor performed comparably. The Hitachi S810/20 was generally a factor of 2 slower.
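The role of Amdahl's Law in mixed scalar and vector codes can be illustrated with a minimal sketch. The function name `amdahl_speedup` and the sample fractions and vector speedups below are illustrative assumptions, not measurements from this benchmark.

```python
def amdahl_speedup(vector_fraction: float, vector_speedup: float) -> float:
    """Overall speedup over all-scalar execution under Amdahl's Law.

    vector_fraction: share of the all-scalar run time that can be vectorized (0..1).
    vector_speedup:  how much faster the vectorized part runs (> 0).
    Both inputs are hypothetical values chosen by the caller.
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must lie in [0, 1]")
    if vector_speedup <= 0.0:
        raise ValueError("vector_speedup must be positive")
    scalar_part = 1.0 - vector_fraction            # runs at scalar speed
    vector_part = vector_fraction / vector_speedup  # runs at vector speed
    return 1.0 / (scalar_part + vector_part)


if __name__ == "__main__":
    # Illustrative sweep: compare a 10x and a 30x vector unit
    # across codes with different vectorizable fractions.
    for fraction in (0.5, 0.9, 0.99):
        for speedup in (10.0, 30.0):
            overall = amdahl_speedup(fraction, speedup)
            print(f"fraction={fraction:.2f} vector={speedup:4.0f}x overall={overall:6.2f}x")
```

Under these assumptions, a code with half of its run time vectorizable gains an overall speedup of about 1.8 with a 10x vector unit and about 1.9 with a 30x vector unit, which is why a factor of 2 or 3 in raw long-vector speed need not change the ranking of the machines.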