Intel Xeon Phi coprocessor high-performance programming

Authors Jim Jeffers and James Reinders spent two years helping educate customers about the prototype and pre-production hardware before Intel introduced the first Intel Xeon Phi coprocessor. They have distilled their own experiences coupled with insights from many expert customers, Intel Field Engin...

Bibliographic Details
Main Author: Jeffers, Jim (Computer engineer)
Other Authors: Reinders, James
Format: Electronic eBook
Language: English
Published: Waltham, MA : Morgan Kaufmann/Elsevier, ©2013.
Table of Contents:
  • Front Cover; Intel® Xeon Phi™ Coprocessor High-Performance Programming; Copyright Page; Contents; Foreword; Preface; Organization; Lots-of-cores.com; Acknowledgements; 1 Introduction; Trend: more parallelism; Why Intel® Xeon Phi™ coprocessors are needed; Platforms with coprocessors; The first Intel® Xeon Phi™ coprocessor; Keeping the "Ninja Gap" under control; Transforming-and-tuning double advantage; When to use an Intel® Xeon Phi™ coprocessor; Maximizing performance on processors first; Why scaling past one hundred threads is so important; Maximizing parallel program performance
  • Measuring readiness for highly parallel execution; What about GPUs?; Beyond the ease of porting to increased performance; Transformation for performance; Hyper-threading versus multithreading; Coprocessor major usage model: MPI versus offload; Compiler and programming models; Cache optimizations; Examples, then details; For more information; 2 High Performance Closed Track Test Drive!; Looking under the hood: coprocessor specifications; Starting the car: communicating with the coprocessor; Taking it out easy: running our first code; Starting to accelerate: running more than one thread
  • Pedal to the metal: hitting full speed using all cores; Easing in to the first curve: accessing memory bandwidth; High speed banked curved: maximizing memory bandwidth; Back to the pit: a summary; 3 A Friendly Country Road Race; Preparing for our country road trip: chapter focus; Getting a feel for the road: the 9-point stencil algorithm; At the starting line: the baseline 9-point stencil implementation; Rough road ahead: running the baseline stencil code; Cobblestone street ride: vectors but not yet scaling; Open road all-out race: vectors plus scaling
  • Some grease and wrenches!: a bit of tuning; Adjusting the "Alignment"; Using streaming stores; Using huge 2-MB memory pages; Summary; For more information; 4 Driving Around Town: Optimizing A Real-World Code Example; Choosing the direction: the basic diffusion calculation; Turn ahead: accounting for boundary effects; Finding a wide boulevard: scaling the code; Thunder road: ensuring vectorization; Peeling out: peeling code from the inner loop; Trying higher octane fuel: improving speed using data locality and tiling; High speed driver certificate: summary of our high speed tour
  • 5 Lots of Data (Vectors); Why vectorize?; How to vectorize; Five approaches to achieving vectorization; Six step vectorization methodology; Step 1. Measure baseline release build performance; Step 2. Determine hotspots using Intel® VTune™ Amplifier XE; Step 3. Determine loop candidates using Intel Compiler vec-report; Step 4. Get advice using the Intel Compiler GAP report and toolkit resources; Step 5. Implement GAP advice and other suggestions (such as using elemental functions and/or array notations); Step 6: Repeat!; Streaming through caches: data layout, alignment, prefetching, and so on