Benchmarking Parallel Code
Scalability
Scalability refers to a system's ability to maintain or improve performance as more computing resources are added. It is essential for evaluating the long-term utility of parallel computing architectures.
Communication Overhead
Communication overhead refers to the time spent on data exchange between processors rather than on actual computation. This often becomes the bottleneck in improving parallel code performance.
MPI (Message Passing Interface)
MPI is a standardized and portable message-passing system designed to function on a wide variety of parallel computing architectures. It is commonly used in benchmarking parallel code for communications between nodes.
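A minimal sketch of how such message passing is often timed, using the mpi4py Python bindings (an assumed environment; the same pattern exists in C). Two ranks bounce a buffer back and forth, which also makes the communication overhead described above directly measurable. Run with, for example, mpirun -n 2 python pingpong.py.

```python
# Ping-pong timing sketch with mpi4py (assumes exactly two ranks).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

msg = np.zeros(1_000_000, dtype=np.float64)   # ~8 MB payload
reps = 10

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(reps):
    if rank == 0:
        comm.Send(msg, dest=1, tag=0)
        comm.Recv(msg, source=1, tag=1)
    elif rank == 1:
        comm.Recv(msg, source=0, tag=0)
        comm.Send(msg, dest=0, tag=1)
elapsed = MPI.Wtime() - t0

if rank == 0:
    print(f"average round trip: {elapsed / reps * 1e3:.3f} ms")
```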
Benchmark Suites
Benchmark suites like SPEC MPI and the NAS Parallel Benchmarks are collections of programs used to evaluate the performance and scalability of parallel computers. They provide standardized tests for various types of computations.
Amdahl's Law
Amdahl's Law predicts the potential speedup of a program's execution as more processors are added, considering the proportion of code that is serial versus parallelizable. As a formula, S(N) = 1 / ((1 - P) + P/N), with P being the parallelizable portion and N the number of processors.
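A small worked example of plugging numbers into this formula, written as a Python sketch; the function name and sample values are illustrative only.

```python
def amdahl_speedup(p, n):
    # Predicted speedup with parallel fraction p on n processors:
    # S(N) = 1 / ((1 - P) + P / N)
    return 1.0 / ((1.0 - p) + p / n)

# Example: 90% parallelizable code on 8 processors
print(amdahl_speedup(0.9, 8))        # ~4.7
# Even with an enormous processor count the speedup is capped at 1 / (1 - P) = 10
print(amdahl_speedup(0.9, 10**9))
```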
HPL (High Performance Linpack)
The HPL benchmark measures the floating-point computing power of a system by solving a dense system of linear equations. It is commonly used to rank supercomputers in the TOP500 list.
FLOPS (Floating Point Operations Per Second)
FLOPS quantifies the rate at which a system performs floating-point operations for numerical calculations. It is an essential performance metric in high-performance computing and helps compare different computer architectures.
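A rough, HPL-flavored illustration of the idea in Python: time a dense linear solve with NumPy and divide the nominal operation count (about 2/3 * n^3 for the factorization) by the elapsed time. This is only a sketch, not the official benchmark.

```python
import time
import numpy as np

n = 2000
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

t0 = time.perf_counter()
x = np.linalg.solve(A, b)             # dense solve, like HPL's core workload
dt = time.perf_counter() - t0

flops = (2.0 / 3.0) * n**3            # nominal operation count for the LU factorization
print(f"~{flops / dt / 1e9:.1f} GFLOPS in {dt:.3f} s")
```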
Race Condition
A race condition occurs when multiple threads or processes access shared data and try to change it at the same time. It can lead to unpredictable results and is crucial to consider during parallel code benchmarking.
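A short Python sketch of the hazard: several threads perform an unsynchronized read-modify-write on a shared counter, so updates can be lost. How often the loss shows up varies by interpreter and timing.

```python
import threading

counter = 0

def worker(iterations):
    global counter
    for _ in range(iterations):
        tmp = counter      # read
        tmp += 1           # modify
        counter = tmp      # write: another thread may have updated counter in between

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()

# Expected 400000; lost updates often make the result smaller, though runs vary.
print(counter)
```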
Lock Contention
Lock contention happens when multiple threads or processes wait for a lock on a shared resource, which serializes parts of the parallel code. Minimizing lock contention is essential for improving parallel code performance.
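A small Python sketch of the effect: eight threads do the same simulated work with and without one shared lock, and the locked version takes roughly eight times longer because the lock serializes them. The sleep is a stand-in for work done while holding the lock; the counts are illustrative.

```python
import threading, time

lock = threading.Lock()

def with_lock():
    with lock:
        time.sleep(0.05)   # simulated work done while holding the shared lock

def without_lock():
    time.sleep(0.05)       # the same work with no shared lock

def run(fn):
    threads = [threading.Thread(target=fn) for _ in range(8)]
    t0 = time.perf_counter()
    for t in threads: t.start()
    for t in threads: t.join()
    return time.perf_counter() - t0

print(f"contended lock: {run(with_lock):.2f}s")    # ~0.40s: the lock serializes the threads
print(f"no shared lock: {run(without_lock):.2f}s") # ~0.05s: the threads overlap freely
```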
Parallel Efficiency
Parallel efficiency is the speedup divided by the number of processors used, showing how well a parallelization scales as resources increase. It essentially indicates how much additional benefit is gained from adding more processors.
Thread Safety
Thread safety refers to the property of an algorithm or data structure where it functions correctly during concurrent execution by multiple threads. This is vital for accurate benchmarking of parallel applications.
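A sketch of making the racy counter from the race-condition example thread-safe by guarding the read-modify-write with a lock; the names are illustrative.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def safe_worker(iterations):
    global counter
    for _ in range(iterations):
        with counter_lock:     # the read-modify-write is now atomic with respect to other threads
            counter += 1

threads = [threading.Thread(target=safe_worker, args=(100_000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)                 # always 400000
```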
Load Balancing
Load balancing is the practice of distributing work evenly across processors to avoid any idling and ensure maximum utilization. It is a critical aspect of optimizing parallel code performance.
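A Python sketch contrasting static and dynamic distribution with multiprocessing.Pool: when a few expensive tasks are clustered together, large fixed chunks can strand them all on one worker, while small chunks let idle workers keep pulling work. The task costs and pool size are illustrative.

```python
import multiprocessing as mp
import time

def task(seconds):
    time.sleep(seconds)          # stand-in for work of uneven cost
    return seconds

def run(costs, chunk):
    with mp.Pool(4) as pool:
        t0 = time.perf_counter()
        pool.map(task, costs, chunksize=chunk)
        return time.perf_counter() - t0

if __name__ == "__main__":
    # A few expensive tasks clustered at the front plus many cheap ones.
    costs = [0.2] * 4 + [0.01] * 116

    # Large static chunks hand all the expensive tasks to one worker...
    print(f"static  (chunksize=30): {run(costs, 30):.2f}s")
    # ...while small chunks let idle workers keep pulling remaining tasks.
    print(f"dynamic (chunksize=1):  {run(costs, 1):.2f}s")
```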
Gustafson's Law
Gustafson's Law provides a more optimistic view compared to Amdahl's Law, suggesting that scaled speedup is linear if the workload increases with the number of processors. The formula is S(N) = N - (1 - P)(N - 1), where P is the parallelizable fraction and N is the number of processors.
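A small worked example of the scaled-speedup formula as a Python sketch; the values are illustrative.

```python
def gustafson_speedup(p, n):
    # Scaled speedup S(N) = N - (1 - P)(N - 1),
    # where p is the parallel fraction of the scaled workload
    return n - (1.0 - p) * (n - 1)

# Example: a 90% parallel workload scaled onto 8 processors
print(gustafson_speedup(0.9, 8))   # 7.3
```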
Non-uniform Memory Access (NUMA)
NUMA is a computer memory design where the time taken to access memory depends on the memory's location relative to a processor. In parallel computing, optimally managing NUMA is key for maintaining high performance.
Efficiency
Efficiency evaluates how effectively parallel computing resources are used, giving insight into the overhead caused by parallelization. It is the speedup divided by the number of processors, expressed as E = S(N) / N.
Granularity
Granularity indicates the size of tasks into which a parallel computation is divided. Fine granularity means smaller tasks with more frequent communication, while coarse granularity means larger tasks with less frequent communication.
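A Python sketch of the trade-off using multiprocessing.Pool's chunksize: very fine-grained tasks (chunksize=1) spend most of their time on inter-process communication, while coarser chunks amortize that overhead. The sizes are illustrative.

```python
import multiprocessing as mp
import time

def tiny(x):
    return x * x                 # a very cheap, fine-grained task

if __name__ == "__main__":
    data = list(range(100_000))
    with mp.Pool(4) as pool:
        for chunk in (1, 5_000):
            t0 = time.perf_counter()
            pool.map(tiny, data, chunksize=chunk)   # chunk size sets the effective granularity
            dt = time.perf_counter() - t0
            print(f"chunksize={chunk}: {dt:.2f}s")
```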
Weak Scaling
Weak scaling measures how the solution time changes with the number of processors while increasing the problem size proportionally. The ideal weak scaling keeps the time constant as processors and problem size grow.
Strong Scaling
Strong scaling measures the reduction in solution time with the addition of processors without changing the total problem size. Ideal strong scaling would linearly reduce the time with each additional processor.
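A Python sketch that measures both regimes with a CPU-bound kernel: for strong scaling a fixed total problem is split across more workers, while for weak scaling each worker keeps the same share as the total problem grows. The kernel and problem sizes are illustrative.

```python
import multiprocessing as mp
import time

def work(n):
    s = 0
    for i in range(n):           # CPU-bound kernel: sum of squares
        s += i * i
    return s

def timed_run(nworkers, per_worker):
    with mp.Pool(nworkers) as pool:
        t0 = time.perf_counter()
        pool.map(work, [per_worker] * nworkers)
        return time.perf_counter() - t0

if __name__ == "__main__":
    base = 4_000_000
    for p in (1, 2, 4):
        strong = timed_run(p, base // p)   # fixed total problem, split across p workers
        weak = timed_run(p, base)          # problem grows with p; per-worker share stays fixed
        print(f"p={p}: strong {strong:.2f}s   weak {weak:.2f}s")
```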
Speedup
Speedup measures the performance improvement of a parallel algorithm compared to a serial algorithm. It's calculated as S(N) = T(1) / T(N), where T(1) is the run time of the serial algorithm and T(N) is the run time using N processors.
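A tiny Python sketch that turns measured run times into speedup and parallel efficiency using the formulas above; the timings are made up for illustration.

```python
def speedup(t_serial, t_parallel):
    # S(N) = T(1) / T(N)
    return t_serial / t_parallel

def parallel_efficiency(t_serial, t_parallel, n_procs):
    # E = S(N) / N; 1.0 would mean perfect scaling
    return speedup(t_serial, t_parallel) / n_procs

# Example: a run that took 100 s serially and 30 s on 4 processors
print(speedup(100.0, 30.0))                 # ~3.33
print(parallel_efficiency(100.0, 30.0, 4))  # ~0.83
```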
Parallel Overhead
Parallel overhead is the additional work introduced by parallelization, such as synchronization, communication, and non-uniform memory access (NUMA) effects, which reduces the net speedup achieved.