- The course discusses in detail the architectural mechanisms that exploit the parallelism available in programs at various granularities, and the programming tools used to work with such architectures.
- Introduction: Defining computer architecture, Flynn’s classification of computers, Metrics for performance measurement.
- Memory Hierarchy: Introduction, Advanced optimizations of cache performance, Memory technology and optimizations, Virtual memory and virtual machines, The design of memory hierarchy, Introduction to Pin instrumentation and Cachegrind, Case study — Memory hierarchies in Intel Core i7 and ARM Cortex-A8.
- Instruction Level Parallelism: Introduction, Compiler techniques for exposing ILP, Reducing branch costs and advanced branch prediction, Dynamic scheduling, Advanced techniques for instruction delivery and speculation, Limitations of ILP, Multithreading, Modeling branch predictors using Pin tool, Case study — Dynamic scheduling in Intel Core i7 and ARM Cortex-A8.
- Data-Level Parallelism: Vector architecture, SIMD instruction set extensions for multimedia, Graphics processing units, Detecting and enhancing loop-level parallelism, Case study — Tesla vs Core i7 processors.
- Thread Level Parallelism: Introduction, Shared memory multicore systems, Performance metrics for shared-memory multicore systems, Cache coherence protocols, Synchronization, Memory consistency, Multithreaded programming using OpenMP, Case study — Intel Skylake and IBM Power8.
- Data Level Parallelism: Introduction, Vector architecture, SIMD instruction set extensions for multimedia, Graphics processing units, GPU memory hierarchy, Detecting and enhancing loop-level parallelism, CUDA programming, Case study — Nvidia Maxwell.
Students are expected to have completed courses on Computer Organization and Operating Systems. A basic understanding of pipelining, caching, and OS principles is assumed.
- J.L. Hennessy and D.A. Patterson. Computer Architecture: A Quantitative Approach. 5th Edition, Morgan Kaufmann Publishers, 2012.
- J.P. Shen and M.H. Lipasti. Modern Processor Design: Fundamentals of Superscalar Processors. McGraw-Hill Publishers, 2005.
- D.B. Kirk and W.W. Hwu. Programming Massively Parallel Processors. 2nd Edition, Morgan Kaufmann Publishers, 2012.
- Pin – A Dynamic Binary Instrumentation Tool.
- Cachegrind: A Cache and Branch-Prediction Profiler.
- Programming Assignments (35%)
- Mid Semester Exam (20%)
- End Semester Exam (40%)
- Class participation (5%)
Lecture Schedule: J-slot
- Lectures on Mondays 1650hrs – 1800hrs, and Wednesdays 1400hrs – 1515hrs
- Tutorials on Thursdays 1525hrs – 1640hrs, held as needed
Last modified: Apr 19th, 2016