CS403
PARALLEL ARCHITECTURES AND PROGRAMMING
Objectives
- To understand the fundamental principles and engineering trade-offs involved in designing modern parallel computers
- To develop the programming skills needed to implement parallel programs effectively on parallel architectures
Outcomes
- Ability to design parallel programs that enhance performance in parallel hardware environments
- Ability to design and implement parallel programs in modern environments such as CUDA, OpenMP, etc.
Unit – I
Introduction: The need for parallelism, Forms of parallelism (Flynn's taxonomy: SISD, SIMD, MISD, MIMD), Moore's Law and multicore processors, Fundamentals of Parallel Computers, Communication architecture, Message-passing architecture, Data-parallel architecture, Dataflow architecture, Systolic architecture, Performance Issues.
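A standard worked result under Performance Issues (an assumption here; the syllabus does not name it explicitly) is Amdahl's law: if a fraction p of a program's work is parallelizable and N processors are used, the achievable speedup is

    S(N) = 1 / ((1 - p) + p / N)

so, for example, p = 0.9 caps the speedup at 10x no matter how many processors are added.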
Unit – II
Large Cache Design: Shared vs. Private Caches, Centralized vs. Distributed Shared Caches, Snooping-based cache coherence protocols, Directory-based cache coherence protocols, Uniform Cache Access (UCA), Non-Uniform Cache Access (NUCA), D-NUCA, S-NUCA, Inclusion, Exclusion, Difference between database transactions and transactional memory, Software Transactional Memory (STM), Hardware Transactional Memory (HTM).
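To make the snooping-based coherence material concrete, here is a minimal C sketch of the per-line state machine of the textbook MSI protocol; the event names, the single-line model, and the sample trace are illustrative assumptions, not part of the syllabus.

```c
/* Minimal sketch of an MSI snooping coherence protocol for one cache
 * line: an illustrative software model, not a hardware implementation. */
#include <stdio.h>

typedef enum { INVALID, SHARED, MODIFIED } State;
typedef enum { PR_RD, PR_WR, BUS_RD, BUS_RDX } Event;

/* Next state of this cache's copy of the line, given one event. */
static State msi_next(State s, Event e) {
    switch (s) {
    case INVALID:
        if (e == PR_RD) return SHARED;    /* read miss: fetch, others may share */
        if (e == PR_WR) return MODIFIED;  /* write miss: issue BusRdX, take ownership */
        return INVALID;                   /* bus traffic does not affect an invalid line */
    case SHARED:
        if (e == PR_WR)  return MODIFIED; /* upgrade: invalidate other copies */
        if (e == BUS_RDX) return INVALID; /* another cache wants exclusive access */
        return SHARED;
    case MODIFIED:
        if (e == BUS_RD)  return SHARED;  /* flush dirty data, demote to shared */
        if (e == BUS_RDX) return INVALID; /* flush, then invalidate */
        return MODIFIED;
    }
    return s;
}

int main(void) {
    const char *names[] = { "INVALID", "SHARED", "MODIFIED" };
    Event trace[] = { PR_RD, PR_WR, BUS_RD, BUS_RDX };
    State s = INVALID;
    for (int i = 0; i < 4; i++) {
        s = msi_next(s, trace[i]);
        printf("after event %d -> %s\n", i, names[s]);
    }
    return 0;
}
```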
Unit – III
Graphics Processing Unit: GPUs as Parallel Computers, Architecture of a Modern GPU, Evolution of Graphics Pipelines, GPGPUs, Scalable GPUs, Architectural Characteristics of Future Systems, Implications of Technology and Architecture for Users, Vector Addition, Applications of GPUs.
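A minimal CUDA C sketch of the vector-addition example named above; the array size, the use of unified memory, and the launch configuration are assumptions chosen for brevity.

```cuda
// Minimal CUDA C sketch of vector addition (Unit III's running example).
#include <stdio.h>
#include <cuda_runtime.h>

__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // one element per thread
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);  // unified memory keeps the sketch short
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();                   // wait for the kernel to finish

    printf("c[0] = %f\n", c[0]);               // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```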
Unit – IV
Introduction to Parallel Programming: Strategies, Mechanisms, Performance Theory, Parallel Programming Patterns: Nesting Pattern, Parallel Control Patterns, Parallel Data Management, Map: Scaled Vector (SAXPY), Mandelbrot, Collectives: Reduce, Fusing Map and Reduce, Scan, Fusing Map and Scan, Data Reorganization: Gather, Scatter, Pack, Stencil and Recurrence, Fork-Join, Pipeline
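To illustrate the "fusing map and reduce" pattern from this unit, here is a sketch in C with OpenMP (OpenMP itself is covered in Unit V): the SAXPY map, the "scaled vector" example, and a sum reduction are combined into a single pass over the data. Sizes and values are illustrative assumptions.

```c
/* Sketch of fusing map (SAXPY) and reduce (sum) in one pass with OpenMP. */
#include <stdio.h>
#include <omp.h>

int main(void) {
    const int n = 1 << 20;
    const float a = 2.0f;
    static float x[1 << 20], y[1 << 20];  /* static: keep large arrays off the stack */
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 1.0f; }

    double sum = 0.0;
    /* Map and reduce fused in one parallel loop: each iteration applies
     * the SAXPY map, then feeds the result straight into the reduction,
     * avoiding a second pass over an intermediate array. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        float mapped = a * x[i] + y[i];  /* map step (SAXPY) */
        sum += mapped;                   /* reduce step */
    }
    printf("sum = %f (expect %f)\n", sum, 3.0 * n);
    return 0;
}
```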
Unit – V
Parallel Programming Languages: Distributed Memory Programming with MPI: The Trapezoidal Rule in MPI, I/O Handling, MPI Derived Datatypes, Collective Communication; Shared Memory Programming with Pthreads: Condition Variables, Read-Write Locks, Cache Handling; Shared Memory Programming with OpenMP: Parallel for Directives, Scheduling Loops, Thread Safety; CUDA: Parallel Programming in CUDA C, Thread Management, Constant Memory and Events, Graphics Interoperability, Atomics, Streams.
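A minimal sketch of the trapezoidal-rule example named above, in C with MPI, following the classic formulation in Pacheco's text; the integrand f(x) = x*x, the interval [0, 1], and the step count are assumptions, and the number of subintervals is assumed to divide evenly among the processes.

```c
/* Sketch of the MPI trapezoidal rule: each rank integrates its slice,
 * then a collective reduction (also a Unit V topic) sums the pieces. */
#include <stdio.h>
#include <mpi.h>

static double f(double x) { return x * x; }  /* assumed integrand */

/* Local composite trapezoidal rule over [a, b] with n subintervals. */
static double trap(double a, double b, int n, double h) {
    double sum = (f(a) + f(b)) / 2.0;
    for (int i = 1; i < n; i++) sum += f(a + i * h);
    return sum * h;
}

int main(int argc, char *argv[]) {
    int rank, size;
    const double a = 0.0, b = 1.0;  /* assumed interval */
    const int n = 1024;             /* total subintervals (assumed) */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double h = (b - a) / n;         /* step size is the same on every rank */
    int local_n = n / size;         /* assumes size divides n evenly */
    double local_a = a + rank * local_n * h;
    double local_b = local_a + local_n * h;
    double local_int = trap(local_a, local_b, local_n, h);

    double total = 0.0;
    MPI_Reduce(&local_int, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("integral of x^2 on [0,1] ~= %.6f (exact 1/3)\n", total);
    MPI_Finalize();
    return 0;
}
```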
TEXT BOOKS/REFERENCE BOOKS
- D. E. Culler, J. P. Singh, and A. Gupta, “Parallel Computer Architecture: A Hardware/Software Approach”, Morgan Kaufmann, 2004
- Rajeev Balasubramonian, Norman P. Jouppi, and Naveen Muralimanohar, “Multi-Core Cache Hierarchies”, Morgan & Claypool Publishers, 2011
- Peter S. Pacheco, “An Introduction to Parallel Programming”, Elsevier, 2011
- James R. Larus and Ravi Rajwar, “Transactional Memory”, Morgan & Claypool Publishers, 2007
- David B. Kirk, Wen-mei W. Hwu, “Programming Massively Parallel Processors: A Hands-on Approach”, 2010
- Barbara Chapman, F. Desprez, Gerhard R. Joubert, Alain Lichnewsky, Frans Peters, “Parallel Computing: From Multicores and GPU's to Petascale”, 2010
- Michael McCool, James Reinders, Arch Robison, “Structured Parallel Programming: Patterns for Efficient Computation”, 2012
- Jason Sanders, Edward Kandrot, “CUDA by Example: An Introduction to General-Purpose GPU Programming”, 2011