Fault Tolerance in Parallel Systems

Majority Voting

Explanation: Majority voting involves using multiple redundant components and choosing the most common output as the correct result. Example: Triple Modular Redundancy (TMR) where three identical components perform the same operation and the majority output is taken.
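
A minimal sketch of the voter in a TMR-style setup, assuming each replica's output arrives as a plain Python value. The function name `majority_vote` and the sample outputs are illustrative, not taken from any particular library.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value reported by a strict majority of replicas.

    Raises ValueError when no value is reported by more than half of them.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among replica outputs")
    return value

# Three replicas run the same operation; one of them is faulty.
replica_outputs = [42, 42, 17]
result = majority_vote(replica_outputs)
```

Requiring a strict majority, rather than just the most common value, lets the voter report disagreement instead of silently picking an arbitrary output.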

Redundancy

Explanation: Redundancy involves duplicating critical components or functions of a system with the intention of increasing reliability. Example: In a server cluster, having multiple servers running the same applications so that if one fails, others can take over.

Error Correction Codes

Explanation: Error correction codes (ECC) add redundant bits to data so that errors can be detected and corrected. Example: ECC memory, which typically corrects single-bit errors and detects double-bit errors.
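
To make single-bit correction concrete, here is a sketch of the classic Hamming(7,4) code, which protects 4 data bits with 3 parity bits. It is chosen only as a small illustration; real ECC memory uses wider codes, and the function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7: p1 p2 d1 p3 d2 d3 d4)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 0 means no error, else the 1-based error position
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
codeword = hamming74_encode(data)
corrupted = list(codeword)
corrupted[4] ^= 1  # flip one bit to simulate a memory error
recovered = hamming74_decode(corrupted)
```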

Checkpointing

Explanation: Checkpointing is the process of saving the state of a system periodically so that it can restart from the last saved state in case of failure. Example: In distributed computing, storing the state of a computation every N minutes.
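
A minimal checkpoint/restart sketch for a long-running loop. It saves state every `every` iterations rather than every N minutes so that the behavior is deterministic. The file layout, the function names, and the `fail_at` parameter (used to simulate a crash) are assumptions made for illustration.

```python
import os
import pickle
import tempfile

def save_checkpoint(path, state):
    """Write state to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)

def load_checkpoint(path, default):
    """Return the last saved state, or default when no checkpoint exists."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)

def run_sum(path, n, every=100, fail_at=None):
    """Sum 0..n-1, resuming from the last checkpoint if one exists."""
    state = load_checkpoint(path, {"i": 0, "total": 0})
    for i in range(state["i"], n):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated crash")
        state["total"] += i
        state["i"] = i + 1
        if state["i"] % every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

Calling `run_sum` again with the same path after a crash repeats only the iterations since the last checkpoint. Writing to a temporary file first means a crash during the save cannot leave a half-written checkpoint behind.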

Replication

Explanation: Replication involves creating copies of data or services so that the failure of a single component does not result in data loss or service interruption. Example: Database replication to multiple nodes to prevent total data loss during a node failure.

N-Modular Redundancy (NMR)

Explanation: N-Modular Redundancy involves N copies of a component running in parallel, with a voting mechanism to determine the correct output. Example: Quintuple Modular Redundancy (QMR) with five components where the majority vote decides the result.
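
As a quick worked calculation: with strict majority voting, N modules can mask at most floor((N - 1) / 2) faulty modules, because the correct modules must still outnumber the faulty ones. The helper name below is hypothetical.

```python
def max_masked_faults(n):
    """Largest number of faulty modules a strict-majority voter can outvote."""
    if n < 1:
        raise ValueError("need at least one module")
    return (n - 1) // 2

tmr_faults = max_masked_faults(3)        # 1 faulty module tolerated
five_way_faults = max_masked_faults(5)   # 2 faulty modules tolerated
```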

Hot Swapping

Explanation: Hot swapping allows replacement or addition of components to a system without shutting it down. Example: Replacing a failed hard drive in a RAID configuration without turning off the server.

Heartbeat Mechanism

Explanation: A heartbeat mechanism is a periodic signal sent between components to verify operation and connectivity. Example: Two servers sending 'I'm alive' messages to each other to confirm they are still operational.
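
A sketch of the receiving side of a heartbeat protocol: it records when each node was last heard from and reports nodes that have been silent for longer than a timeout. The class name, the node names, and the injectable `clock` parameter are assumptions for illustration; a real system would call `beat` from its network receive path.

```python
import time

class HeartbeatMonitor:
    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout  # seconds of silence before a node is suspected
        self.clock = clock      # injectable so the example can use a fake clock
        self.last_seen = {}

    def beat(self, node):
        """Record an 'I'm alive' message from node."""
        self.last_seen[node] = self.clock()

    def failed_nodes(self):
        """Return nodes whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)

# Drive the monitor with a fake clock instead of sleeping.
fake_time = [0.0]
monitor = HeartbeatMonitor(timeout=3.0, clock=lambda: fake_time[0])
monitor.beat("server-a")
monitor.beat("server-b")
fake_time[0] = 2.0
monitor.beat("server-a")  # server-b stays silent from here on
fake_time[0] = 4.0
suspected = monitor.failed_nodes()
```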

Graceful Degradation

Explanation: Graceful degradation allows a system to continue operating at a reduced level of functionality when parts of the system fail. Example: A web service that disables certain non-critical features when it is under heavy load or has partially failed.
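
One way to sketch this idea: the page's critical content must load, while a non-critical feature is simply dropped when its backend fails. The function and field names (`render_page`, `recommendations`, `degraded`) are hypothetical.

```python
def render_page(fetch_article, fetch_recommendations):
    """Build a page; recommendations are optional, the article is not."""
    page = {"article": fetch_article(), "degraded": False}
    try:
        page["recommendations"] = fetch_recommendations()
    except Exception:
        # The non-critical feature failed: serve the page without it.
        page["recommendations"] = []
        page["degraded"] = True
    return page

def broken_recommendations():
    raise TimeoutError("recommendation service unavailable")

page = render_page(lambda: "Fault tolerance basics", broken_recommendations)
```

A failure in `fetch_article` is deliberately not caught, since without the critical content there is nothing useful to serve.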

Rollback Recovery

Explanation: Rollback recovery involves reverting a system to a previously known good state following an error. Example: Using transaction logs in databases to restore to the state before a transaction that caused a crash.
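
A small sketch of rollback using Python's built-in `sqlite3` module, whose connection object commits when used as a context manager and rolls back if the block raises. The table, the account names, and the `fail_midway` flag (which simulates a crash) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst; a failure undoes the partial update."""
    with conn:  # commits on success, rolls back if the block raises
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        if fail_midway:
            raise RuntimeError("simulated crash between the two updates")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

try:
    transfer(conn, "alice", "bob", 30, fail_midway=True)
except RuntimeError:
    pass
after_failure = balances(conn)  # state as it was before the failed transfer
```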

Failover

Explanation: Failover is the process of transferring services and operations to a standby system when the primary system fails. Example: Automatic switching to a backup server when the main server crashes.
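
A minimal client-side failover sketch, assuming each server is represented by a callable that raises `ConnectionError` when it is down. The server functions here are stand-ins, not a real network API.

```python
def call_with_failover(servers, request):
    """Try each server in priority order and return the first successful response."""
    last_error = None
    for server in servers:
        try:
            return server(request)
        except ConnectionError as exc:
            last_error = exc  # remember the failure and move on to the next server
    raise RuntimeError("all servers failed") from last_error

def primary(request):
    raise ConnectionError("primary is down")

def backup(request):
    return f"backup handled {request}"

response = call_with_failover([primary, backup], "GET /status")
```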

Task Rescheduling

Explanation: Task rescheduling involves dynamically reassigning tasks to available resources when some fail. Example: In a grid computing environment, reassigning tasks from an unresponsive node to a functional one.
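
A sketch of a scheduler that hands tasks to workers round-robin and, when a worker fails, removes it from the pool and puts the task back in the queue. Workers are modeled as plain callables that raise `ConnectionError` when unresponsive; all names are hypothetical.

```python
from collections import deque

def run_with_rescheduling(tasks, workers):
    """Run every task, reassigning tasks away from workers that fail.

    workers maps a worker name to a callable taking one task.
    """
    pending = deque(tasks)
    alive = deque(workers)  # names of workers still considered healthy
    results = {}
    while pending:
        if not alive:
            raise RuntimeError("no healthy workers left")
        task = pending.popleft()
        name = alive[0]
        alive.rotate(-1)  # round-robin: the next task goes to the next worker
        try:
            results[task] = workers[name](task)
        except ConnectionError:
            alive.remove(name)    # stop scheduling onto the failed worker
            pending.append(task)  # reschedule the task for a healthy one
    return results

def healthy_node(task):
    return task * task

def unresponsive_node(task):
    raise ConnectionError("node did not respond")

results = run_with_rescheduling(
    [1, 2, 3, 4], {"node-1": healthy_node, "node-2": unresponsive_node}
)
```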

Self-healing Systems

Explanation: Self-healing systems are capable of detecting and fixing problems automatically. Example: A distributed system that automatically redistributes tasks if a node fails.

Software Redundancy

Explanation: Software redundancy includes implementing additional software services that can take over functionality if the primary service fails. Example: Multiple DNS servers that provide the same naming service to ensure uninterrupted hostname resolution.

Rejuvenation

Explanation: Rejuvenation entails periodically restarting components to clear any faults that may have accumulated over time. Example: Rebooting servers during low-traffic periods to reclaim leaked memory before it degrades performance or causes a crash.
