China Tunes Neural Networks for Custom Supercomputer Chip
Teams in China working on the top performing supercomputer in the world, the Sunway TaihuLight machine with its custom processor, have shown that their optimizations for theSW26010 architecture on deep learning models have yielded a 1.91-9.75X speedup over a GPU accelerated model using the Nvidia Tesla K40m in a test convolutional neural network run with over 100 parameter configurations.
Efforts on this system show that high performance deep learning is possible at scale on a CPU-only architecture. The Sunway TaihuLight machine is based on the 260-core Sunway SW26010, which we detailed here from both a chip and systems perspective. The …
Read more