GPU Programming Basics
25 Flashcards
Stream Processor
A stream processor is a type of computing unit found inside a GPU that is optimized for processing multiple data elements simultaneously (data parallelism).
Thread Block
In GPU programming, a thread block is a group of threads that execute together on the same multiprocessor and can share data through a fast shared memory space; different blocks are scheduled independently of one another.
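A minimal CUDA sketch of block-level cooperation (kernel and variable names are illustrative, and the grid is assumed to cover the array exactly): each block reverses its own tile of an array through shared memory, with no communication between blocks.

    #include <cuda_runtime.h>

    #define BLOCK 256

    // Each thread block reverses its own 256-element tile in place.
    // Threads within a block cooperate through __shared__ memory;
    // different blocks never need to communicate.
    __global__ void reverse_tiles(float *data)
    {
        __shared__ float tile[BLOCK];      // visible to this block only
        int i = blockIdx.x * blockDim.x + threadIdx.x;

        tile[threadIdx.x] = data[i];       // stage the tile in shared memory
        __syncthreads();                   // wait until the whole tile is loaded

        data[i] = tile[BLOCK - 1 - threadIdx.x];  // write back reversed
    }

    // Launch example: one block per 256-element tile.
    // reverse_tiles<<<n / BLOCK, BLOCK>>>(d_data);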
Global Memory
Global memory refers to the main memory on a GPU, accessible by all threads; it has the largest capacity but higher latency than the other kinds of GPU memory.
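A host-side sketch, assuming the standard CUDA runtime API: global memory is allocated with cudaMalloc and moved to and from the CPU with cudaMemcpy.

    #include <cuda_runtime.h>
    #include <stdlib.h>

    int main(void)
    {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);

        float *h_buf = (float *)malloc(bytes);  // ordinary CPU memory
        float *d_buf;
        cudaMalloc(&d_buf, bytes);              // global (device) memory

        cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);  // CPU -> GPU
        /* ... launch kernels that read and write d_buf here ... */
        cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);  // GPU -> CPU

        cudaFree(d_buf);
        free(h_buf);
        return 0;
    }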
Texture Memory
Texture memory is a read-only cached region of GPU memory optimized for texture-mapping operations; its cache is designed for 2D spatial locality, which can also benefit certain access patterns in general-purpose computing.
Data Parallelism
Data parallelism is a form of parallelization where each processor performs the same task on different pieces of distributed data in parallel.
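A sketch of data parallelism in CUDA, using the classic SAXPY operation (names illustrative): every thread runs the same statement on a different element.

    #include <cuda_runtime.h>

    // Same task, different data: thread i handles element i.
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)                  // guard: the grid may be larger than n
            y[i] = a * x[i] + y[i];
    }

    // Launch with enough threads to cover all n elements:
    // saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);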
Synchronization
Synchronization in GPU programming coordinates the execution of threads, ensuring that operations happen in the correct order and that shared resources are not accessed concurrently in ways that cause conflicts.
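A sketch of two common CUDA synchronization tools (the histogram itself is illustrative, and blockDim.x is assumed to be 256): __syncthreads() is a barrier that orders threads within a block, and atomicAdd() serializes conflicting updates to the same location.

    #include <cuda_runtime.h>

    __global__ void histogram(const unsigned char *in, int n, unsigned int *bins)
    {
        __shared__ unsigned int local[256];
        local[threadIdx.x] = 0;            // assumes blockDim.x == 256
        __syncthreads();                   // barrier: all bins zeroed before counting

        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x)
            atomicAdd(&local[in[i]], 1u);  // atomic: no lost updates within the block

        __syncthreads();                   // barrier: counting done before merging
        atomicAdd(&bins[threadIdx.x], local[threadIdx.x]);  // merge into global bins
    }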
Shader
A shader is a type of computer program originally used for shading (producing appropriate levels of light, darkness, and color in an image) but now used for a variety of specialized functions across computer graphics and special effects.
Compute Kernel
A compute kernel is a function written to run on the parallel architecture of a GPU and typically executed across many parallel threads.
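A minimal compute kernel in CUDA (names illustrative): __global__ marks the function that runs on the GPU, and the <<<blocks, threads>>> launch syntax spawns the parallel threads.

    #include <cuda_runtime.h>

    __global__ void scale(float *data, float factor)   // runs on the GPU
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x; // unique thread index
        data[i] *= factor;
    }

    int main(void)
    {
        float *d;
        cudaMalloc(&d, 1024 * sizeof(float));
        scale<<<4, 256>>>(d, 0.5f);   // 4 blocks x 256 threads = 1024 parallel threads
        cudaDeviceSynchronize();      // wait for the kernel to finish
        cudaFree(d);
        return 0;
    }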
Memory Hierarchy
In GPU architecture, the memory hierarchy consists of registers, shared memory, constant memory, local memory, and global memory, arranged by speed and size; registers are the fastest and smallest, global memory the largest and slowest.
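A sketch of where variables land in that hierarchy, assuming CUDA (names illustrative):

    #include <cuda_runtime.h>

    __constant__ float coeff[4];   // constant memory: cached, read-only in kernels

    __global__ void hierarchy_demo(const float *in, float *out)  // in/out: global memory
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // register: fastest, per-thread
        __shared__ float tile[256];                     // shared memory: fast, per-block
        tile[threadIdx.x] = in[i];                      // global memory: largest, slowest
        __syncthreads();
        out[i] = tile[threadIdx.x] * coeff[i % 4];      // constant cache: broadcast reads
        // (per-thread arrays too large for registers spill to off-chip local memory)
    }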
GPGPU
General-Purpose computing on Graphics Processing Units (GPGPU) is the use of a GPU to handle computation typically handled by the Central Processing Unit (CPU).
OpenCL
Open Computing Language (OpenCL) is an open standard for cross-platform, parallel programming of diverse processors found in personal computers, servers, mobile devices, and embedded platforms.
Multiprocessor
In the context of GPU architecture, a multiprocessor is a processing unit capable of executing multiple threads in parallel, and multiple such units make up a GPU.
GPU
A Graphics Processing Unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display.
CUDA
Compute Unified Device Architecture (CUDA) is a parallel computing platform and application programming interface (API) model created by Nvidia that allows software developers to use a CUDA-enabled graphics processing unit (GPU) for general-purpose processing.
Occupancy
Occupancy refers to the ratio of the number of active warps per multiprocessor to the maximum number of possible active warps, and is a measure of how well the GPU resources are being utilized.
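The CUDA runtime can estimate occupancy at run time; a sketch using cudaOccupancyMaxActiveBlocksPerMultiprocessor (the kernel is a placeholder):

    #include <cuda_runtime.h>
    #include <stdio.h>

    __global__ void my_kernel(float *data) { data[threadIdx.x] *= 2.0f; }

    int main(void)
    {
        int blocks_per_sm = 0, block_size = 256;
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocks_per_sm, my_kernel,
                                                      block_size, 0 /* dyn. shared mem */);

        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        int active_warps = blocks_per_sm * block_size / prop.warpSize;
        int max_warps = prop.maxThreadsPerMultiProcessor / prop.warpSize;
        printf("occupancy: %.0f%%\n", 100.0 * active_warps / max_warps);
        return 0;
    }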
Shared Memory
Shared memory in GPU architecture is a small amount of memory accessible by threads within the same thread block that allows for fast, cooperative data sharing.
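A sketch of a block-wide sum in CUDA: the shared buffer is sized at launch time through the third <<<>>> parameter (names illustrative; block size assumed to be a power of two).

    #include <cuda_runtime.h>

    // Shared memory sized at launch time via the third <<<>>> parameter.
    __global__ void sum_block(const float *in, float *out)
    {
        extern __shared__ float buf[];        // one dynamic shared array per block
        int t = threadIdx.x;
        buf[t] = in[blockIdx.x * blockDim.x + t];
        __syncthreads();

        for (int s = blockDim.x / 2; s > 0; s >>= 1) {  // tree reduction in shared mem
            if (t < s) buf[t] += buf[t + s];
            __syncthreads();
        }
        if (t == 0) out[blockIdx.x] = buf[0]; // one partial sum per block
    }

    // Launch: sum_block<<<blocks, 256, 256 * sizeof(float)>>>(d_in, d_out);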
Task Parallelism
Task parallelism is a form of parallelization where different processors perform different tasks on the same or different sets of data.
Graphics Pipeline
The graphics pipeline is a conceptual model in computer graphics that describes the sequence of steps used to render a 3D scene onto a 2D screen.
API
An Application Programming Interface (API) is a set of protocols, routines, and tools for building software and applications which specify how software components should interact.
Warp
A warp is the basic unit of parallel execution on Nvidia GPUs: a group of 32 threads that execute the same instruction in lockstep.
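Warps also have their own primitives; a sketch of a warp-level sum using __shfl_down_sync, which lets the 32 threads of a warp exchange register values directly (helper name illustrative):

    #include <cuda_runtime.h>

    // Threads in one warp exchange register values directly; no shared memory needed.
    __device__ float warp_sum(float v)
    {
        for (int offset = 16; offset > 0; offset >>= 1)     // 32 values -> 1 value
            v += __shfl_down_sync(0xffffffffu, v, offset);  // read from lane + offset
        return v;                                           // lane 0 holds the total
    }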
SIMD
Single Instruction, Multiple Data (SIMD) is a parallel computing architecture where multiple processing elements perform the same operation on multiple data points simultaneously.
Thread Divergence
Thread divergence occurs when threads of the same warp follow different execution paths due to conditional branching, which can lead to serialization of the divergent branches and reduced parallel efficiency.
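A sketch of divergence in CUDA (names illustrative): adjacent threads branch differently, so the warp executes both paths one after the other.

    #include <cuda_runtime.h>

    __global__ void divergent(float *data)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        // Threads in the same warp take different paths, so the hardware runs
        // the two branches one after the other (serialization):
        if (i % 2 == 0)
            data[i] *= 2.0f;   // even lanes run while odd lanes idle
        else
            data[i] += 1.0f;   // odd lanes run while even lanes idle

        // Branching on a per-warp or per-block value avoids divergence, e.g.:
        // if (blockIdx.x % 2 == 0) { ... }  // whole warp takes the same path
    }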
Kernel Grid
A kernel grid is the multidimensional arrangement of thread blocks launched for a kernel; together with the block dimensions, it determines how many blocks run and how many threads each block contains.
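A sketch of a 2D grid configuration in CUDA for an image-sized problem (dimensions and names illustrative):

    #include <cuda_runtime.h>

    __global__ void fill(float *img, int w, int h)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;  // column within the grid
        int y = blockIdx.y * blockDim.y + threadIdx.y;  // row within the grid
        if (x < w && y < h)
            img[y * w + x] = 1.0f;
    }

    // Hypothetical launch for a 1920x1080 image:
    // dim3 block(16, 16);                              // threads per block
    // dim3 grid((1920 + 15) / 16, (1080 + 15) / 16);   // blocks per grid (ceiling division)
    // fill<<<grid, block>>>(d_img, 1920, 1080);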
Constant Memory
Constant memory on a GPU is a small region of device memory that is read-only from kernels and served through a fast on-chip cache, making it ideal for values that do not change over the course of a kernel execution.
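A sketch, assuming CUDA: __constant__ declares the symbol, kernels read it through the constant cache, and the host fills it with cudaMemcpyToSymbol (names illustrative; bounds checks omitted).

    #include <cuda_runtime.h>

    __constant__ float filter[9];   // lives in constant memory, read-only in kernels

    __global__ void apply(const float *in, float *out)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        float acc = 0.0f;
        for (int k = 0; k < 9; ++k)         // all threads read the same filter[k]:
            acc += filter[k] * in[i + k];   // served by the constant cache (broadcast)
        out[i] = acc;
    }

    // Host side: copy coefficients into the constant symbol before launching.
    // float h_filter[9] = { ... };
    // cudaMemcpyToSymbol(filter, h_filter, sizeof(h_filter));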
Register
In GPU architecture, a register is a small storage location that is very fast and located on the processor itself, used to hold variables for individual threads.