Demystifying parallel and distributed deep learning: An in-depth concurrency analysis
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …
There's plenty of room at the Top: What will drive computer performance after Moore's law?
Background: Improvements in computing power can claim a large share of the credit for
many of the things that we take for granted in our modern lives: cellphones that are more …
[BOOK] Parallel computer architecture: a hardware/software approach
The most exciting development in parallel computer architecture is the convergence of
traditionally disparate approaches on a common machine structure. This book explains the …
Domain-specific hardware accelerators
Communications of the ACM, July 2020, Vol. 63, No. 7, p. 48 (contributed article).
From the simple embedded processor in your …
Optimization of collective communication operations in MPICH
R Thakur, R Rabenseifner… - The International Journal …, 2005 - journals.sagepub.com
We describe our work on improving the performance of collective communication operations
in MPICH for clusters connected by switched networks. For each collective operation, we …
[PDF] Reining in the outliers in Map-Reduce clusters using Mantri
G Ananthanarayanan, S Kandula… - … USENIX Symposium on …, 2010 - usenix.org
Experience from an operational Map-Reduce cluster reveals that outliers significantly
prolong job completion. The causes for outliers include run-time contention for processor …
[BOOK] Patterns for parallel programming
TG Mattson, B Sanders, B Massingill - 2004 - books.google.com
The Parallel Programming Guide for Every Software Developer. From grids and clusters to
next-generation game consoles, parallel computing is going mainstream. Innovations such …
Learning detailed face reconstruction from a single image
Reconstructing the detailed geometric structure of a face from a given image is a key to
many computer vision and graphics applications, such as motion capture and reenactment …
Versatile, scalable, and accurate simulation of distributed applications and platforms
The study of parallel and distributed applications and platforms, whether in the cluster, grid,
peer-to-peer, volunteer, or cloud computing domain, often mandates empirical evaluation of …
[BOOK] Data-intensive text processing with MapReduce
Our world is being revolutionized by data-driven methods: access to large amounts of data
has generated new insights and opened exciting new opportunities in commerce, science …