Introduction to High-Performance Computing (HPC)
Cluster Computing
Cluster computing refers to the use of multiple computers, typically commodity hardware, working together closely as a single system. These clusters are commonly used for HPC tasks.
Message Passing Interface (MPI)
MPI is a standardized and portable message-passing system designed to function on a wide variety of parallel computing architectures. Widely used in HPC, it lets processes with separate memory spaces exchange data, enabling complex communication patterns across distributed-memory systems.
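To make this concrete, here is a minimal MPI program in C in which each process reports its rank; it is a sketch that assumes an MPI implementation such as Open MPI or MPICH is installed, with the usual mpicc/mpirun toolchain.

```c
/* Minimal MPI sketch: every process prints its rank.
   Assumes an installed MPI implementation (e.g., Open MPI or MPICH). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);               /* start the MPI runtime */

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's ID */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* total process count */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                       /* shut down the MPI runtime */
    return 0;
}
```

Compiled with mpicc hello.c -o hello and launched with mpirun -np 4 ./hello, four independent processes run the same program and identify themselves by rank.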
Parallel Computing
Parallel computing is a type of computation where many calculations or processes are carried out simultaneously. It's an essential aspect of HPC as it leverages multiple computing resources to solve computational problems efficiently.
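As a small sketch of the idea using shared-memory threads (the chunk sizes and names are illustrative; compile with -pthread), the following C program splits an array sum across four threads that compute simultaneously:

```c
/* Parallel sum with POSIX threads: each thread sums one chunk. */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define NTHREADS 4

static double data[N];
static double partial[NTHREADS];

static void *sum_chunk(void *arg) {
    long t = (long)arg;                       /* thread index 0..3 */
    long lo = t * (N / NTHREADS), hi = lo + N / NTHREADS;
    double s = 0.0;
    for (long i = lo; i < hi; i++) s += data[i];
    partial[t] = s;                           /* publish partial result */
    return NULL;
}

int main(void) {
    for (long i = 0; i < N; i++) data[i] = 1.0;

    pthread_t threads[NTHREADS];
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&threads[t], NULL, sum_chunk, (void *)t);

    double total = 0.0;
    for (long t = 0; t < NTHREADS; t++) {
        pthread_join(threads[t], NULL);       /* wait, then combine */
        total += partial[t];
    }
    printf("sum = %f\n", total);              /* expect 1000000.0 */
    return 0;
}
```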
Distributed Computing
Distributed computing involves a collection of separate, possibly heterogeneous systems that work together to solve a computational problem. It's important for HPC as it enables the collaborative use of geographically dispersed resources.
Supercomputers
Supercomputers are high-performance computing systems at the forefront of processing capability, particularly speed of calculation. They are essential for intensive tasks such as climate modeling, nuclear simulations, and other large-scale computations.
Performance Tuning in HPC
Performance tuning in HPC is the process of optimizing the configuration of HPC systems and applications to achieve the best possible performance. It requires an understanding of both the hardware and software aspects of the system.
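One classic software-side illustration is loop ordering for cache locality; in this sketch (the array size is arbitrary) both loop nests compute the same sum, but the first walks memory contiguously in C's row-major layout and typically runs several times faster:

```c
/* Loop-order tuning sketch: same arithmetic, different memory access. */
#include <stdio.h>

#define N 2048
static double a[N][N];

int main(void) {
    double s = 0.0;

    /* Cache-friendly: the inner loop touches contiguous memory. */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];

    /* Cache-hostile: each access strides a full row apart. */
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];

    printf("%f\n", s);
    return 0;
}
```

Timing the two nests separately (e.g., with clock_gettime or a profiler) makes the cache effect visible; real tuning also covers compiler flags, vectorization, communication, and I/O.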
Scalability
Scalability in HPC refers to a system's ability to handle a growing amount of work by adding resources to the system. It is crucial for HPC systems to efficiently utilize more resources as problem sizes and demands increase.
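A standard way to quantify this is Amdahl's law: if a fraction p of a program can be parallelized across N processors, the best possible speedup is

    S(N) = 1 / ((1 - p) + p / N)

For example, with p = 0.95 and N = 100, S(100) = 1 / (0.05 + 0.0095) ≈ 16.8, so even a 5% serial fraction caps the speedup far below 100.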
What is HPC?
High-Performance Computing (HPC) refers to the practice of aggregating computing power in a way that delivers much greater performance than one could get out of a typical desktop computer or workstation, in order to solve large problems in science, engineering, or business.
CPU vs. GPU in HPC
In HPC, CPUs (Central Processing Units) are used for tasks requiring complex decision-making and flexibility, while GPUs (Graphics Processing Units) are specialized for highly parallel processing tasks. GPUs often provide significant speedups in HPC applications due to their many cores.
Cloud Computing and HPC
Cloud computing can provide HPC resources as a scalable, virtualized infrastructure that users access on demand. This is particularly significant for organizations that need occasional access to high-performance computing without the capital investment in hardware.
OpenMP
OpenMP is an open, directive-based standard for parallel programming in C, C++, and Fortran, enabling the development of parallel applications that are portable across multiple platforms. It is widely used in the HPC community, especially on shared-memory architectures.
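A minimal sketch of the programming model (compile with an OpenMP-capable compiler, e.g. gcc -fopenmp; the array and its contents are illustrative):

```c
/* OpenMP sketch: one pragma parallelizes the loop across cores. */
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double x[N];
    double sum = 0.0;

    /* Iterations are split among threads; the reduction clause merges
       each thread's private sum safely at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        x[i] = 1.0;
        sum += x[i];
    }

    printf("max threads: %d, sum = %f\n", omp_get_max_threads(), sum);
    return 0;
}
```

Setting the environment variable OMP_NUM_THREADS before running controls how many threads the loop uses.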
HPC Interconnects
Interconnects in HPC are the high-speed communication networks, such as InfiniBand, that link together the components of an HPC system. They allow rapid data transfer between nodes, which is essential for performance in parallel processing.
Job Scheduling in HPC
Job scheduling in HPC involves managing and dispatching compute jobs across the available resources. It's a critical component for ensuring optimal usage of HPC resources and maintaining system efficiency.
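As a concrete illustration, many HPC sites schedule jobs with Slurm; a minimal batch script might look like the following (the partition name and resource counts are placeholder assumptions, since every site configures these differently):

```bash
#!/bin/bash
#SBATCH --job-name=demo          # name shown in the queue
#SBATCH --nodes=2                # compute nodes requested
#SBATCH --ntasks-per-node=16     # MPI ranks per node
#SBATCH --time=00:30:00          # wall-clock limit (HH:MM:SS)
#SBATCH --partition=compute      # hypothetical partition name

srun ./hello                     # launch the program on all ranks
```

Submitted with sbatch, the script waits in the queue until the scheduler can allocate the requested resources.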
Energy Efficiency in HPC
Energy efficiency in HPC involves designing and operating HPC systems to consume the least amount of power for a given level of performance. It's important due to the high power requirements and environmental impacts of HPC systems.
Fault Tolerance in HPC
Fault tolerance in HPC refers to the ability of a system to continue operating properly in the event of the failure of some of its components. This is key for HPC systems as they often run critical applications where downtime is very costly.
Data Intensive Computing
Data-intensive computing focuses on the ability to handle, process, and manage large data sets efficiently, which is critical for data-driven applications in high-performance computing.
Checkpoint/Restart in HPC
Checkpoint/Restart is a technique used in HPC to save the current state of a running job so it can be resumed later. This is critical for long-running computations in case of system failures.
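A minimal application-level sketch of the technique in C (the file name and state layout are hypothetical; production HPC codes usually checkpoint through libraries or parallel I/O):

```c
/* Checkpoint/restart sketch: persist loop progress so a long run can
   resume after a failure instead of starting over. */
#include <stdio.h>

#define CKPT_FILE "state.ckpt"   /* hypothetical checkpoint file */

static void checkpoint(long next_step, double result) {
    FILE *f = fopen(CKPT_FILE, "wb");
    if (!f) return;
    fwrite(&next_step, sizeof next_step, 1, f);
    fwrite(&result, sizeof result, 1, f);
    fclose(f);
}

static int restore(long *next_step, double *result) {
    FILE *f = fopen(CKPT_FILE, "rb");
    if (!f) return 0;            /* no checkpoint: start fresh */
    int ok = fread(next_step, sizeof *next_step, 1, f) == 1 &&
             fread(result, sizeof *result, 1, f) == 1;
    fclose(f);
    return ok;
}

int main(void) {
    long step = 0;
    double result = 0.0;
    restore(&step, &result);                 /* resume if possible */

    for (; step < 1000000; step++) {
        result += 1e-6;                      /* stand-in for real work */
        if (step % 100000 == 0)
            checkpoint(step + 1, result);    /* save the next step to run */
    }
    printf("result = %f\n", result);
    return 0;
}
```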
HPC Storage Systems
Storage systems in HPC are designed to handle massive amounts of data and provide rapid access to it. They often include parallel file systems, such as Lustre or GPFS, that serve many compute nodes at high speed simultaneously.
Grid Computing
Grid Computing is a distributed system that combines computer resources from multiple administrative domains to reach a common goal. It's important in HPC for tackling large-scale tasks that cannot be handled by a single institution or domain.
Exascale Computing
Exascale computing refers to computing systems capable of at least one exaFLOP, or a billion billion (10^18) calculations per second. Reaching exascale is a major goal for the HPC community, as it allows the solving of complex and previously intractable problems.