Demystifying parallel and distributed deep learning: An in-depth concurrency analysis

T Ben-Nun, T Hoefler - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …

There's plenty of room at the Top: What will drive computer performance after Moore's law?

CE Leiserson, NC Thompson, JS Emer, BC Kuszmaul… - Science, 2020 - science.org
BACKGROUND Improvements in computing power can claim a large share of the credit for
many of the things that we take for granted in our modern lives: cellphones that are more …

[BOOK] Parallel computer architecture: a hardware/software approach

D Culler, JP Singh, A Gupta - 1999 - books.google.com
The most exciting development in parallel computer architecture is the convergence of
traditionally disparate approaches on a common machine structure. This book explains the …

Domain-specific hardware accelerators

WJ Dally, Y Turakhia, S Han - Communications of the ACM, 2020 - dl.acm.org
Communications of the ACM, July 2020, Vol. 63, No. 7. From the simple embedded processor
in your …

Optimization of collective communication operations in MPICH

R Thakur, R Rabenseifner… - The International Journal …, 2005 - journals.sagepub.com
We describe our work on improving the performance of collective communication operations
in MPICH for clusters connected by switched networks. For each collective operation, we …

[PDF] Reining in the outliers in Map-Reduce clusters using Mantri

G Ananthanarayanan, S Kandula… - … USENIX Symposium on …, 2010 - usenix.org
Experience from an operational Map-Reduce cluster reveals that outliers significantly
prolong job completion. The causes for outliers include run-time contention for processor …

[BOOK] Patterns for parallel programming

TG Mattson, B Sanders, B Massingill - 2004 - books.google.com
The Parallel Programming Guide for Every Software Developer From grids and clusters to
next-generation game consoles, parallel computing is going mainstream. Innovations such …

Learning detailed face reconstruction from a single image

E Richardson, M Sela, R Or-El… - Proceedings of the …, 2017 - openaccess.thecvf.com
Reconstructing the detailed geometric structure of a face from a given image is a key to
many computer vision and graphics applications, such as motion capture and reenactment …

Versatile, scalable, and accurate simulation of distributed applications and platforms

H Casanova, A Giersch, A Legrand, M Quinson… - Journal of Parallel and …, 2014 - Elsevier
The study of parallel and distributed applications and platforms, whether in the cluster, grid,
peer-to-peer, volunteer, or cloud computing domain, often mandates empirical evaluation of …

[BOOK] Data-intensive text processing with MapReduce

J Lin, C Dyer - 2022 - books.google.com
Our world is being revolutionized by data-driven methods: access to large amounts of data
has generated new insights and opened exciting new opportunities in commerce, science …