
Fault Tolerance in Parallel Systems

15 Flashcards


Majority Voting


Explanation: Majority voting involves using multiple redundant components and choosing the most common output as the correct result. Example: Triple Modular Redundancy (TMR) where three identical components perform the same operation and the majority output is taken.
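A minimal sketch of the voter described on this card, assuming the replica outputs can be compared for equality. The function name `majority_vote` is hypothetical, and returning `None` when no strict majority exists is one possible design choice, not the only one.

```python
from collections import Counter

def majority_vote(outputs):
    """Return the value produced by a strict majority of replicas, or None."""
    value, count = Counter(outputs).most_common(1)[0]
    # Require more than half of the replicas to agree before trusting the value.
    return value if count > len(outputs) / 2 else None

# TMR-style usage: three replicas computed the same operation, one is faulty.
result = majority_vote([42, 42, 41])
```

With three replicas, a single faulty output is outvoted by the two that agree.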


Redundancy


Explanation: Redundancy involves duplicating critical components or functions of a system with the intention of increasing reliability. Example: In a server cluster, having multiple servers running the same applications so that if one fails, others can take over.


Error Correction Codes


Explanation: Error correction codes (ECC) add redundant bits to data so that errors can be detected and corrected. Example: ECC memory, which typically corrects single-bit errors and detects (but does not correct) double-bit errors.
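To make the idea concrete, here is a small sketch of a Hamming(7,4) code, a classic error correction code that protects 4 data bits with 3 parity bits and can correct any single flipped bit. It is chosen only for illustration; real ECC memory generally uses wider codes, and the function names are hypothetical.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit, 0 if none
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

# Usage: flip one bit "in transit" and recover the original data.
sent = hamming74_encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1
recovered = hamming74_decode(received)
```

If two bits are flipped, this plain Hamming code miscorrects; detecting that case needs an extra overall parity bit.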


Checkpointing


Explanation: Checkpointing is the process of saving the state of a system periodically so that it can restart from the last saved state in case of failure. Example: In distributed computing, storing the state of a computation every N minutes.
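A rough sketch of checkpoint-and-restart for a single process. It checkpoints every `every` loop iterations rather than every N minutes, and the function name, the `crash_at` parameter (used only to simulate a failure), and the file layout are all assumptions made for illustration.

```python
import os
import pickle
import tempfile

def run_with_checkpoints(total_steps, path, every=10, crash_at=None):
    """Sum 0..total_steps-1, saving progress to `path` every `every` steps."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            state = pickle.load(f)  # resume from the last saved state
    else:
        state = {"step": 0, "total": 0}
    while state["step"] < total_steps:
        if crash_at is not None and state["step"] == crash_at:
            raise RuntimeError("simulated crash")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            # Write to a temporary file, then rename, so a crash during the
            # write does not leave a half-written checkpoint behind.
            tmp = path + ".tmp"
            with open(tmp, "wb") as f:
                pickle.dump(state, f)
            os.replace(tmp, path)
    return state["total"]

# Usage: the first run crashes at step 57; the second resumes from step 50.
with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "state.ckpt")
    try:
        run_with_checkpoints(100, ckpt, crash_at=57)
    except RuntimeError:
        pass
    total = run_with_checkpoints(100, ckpt)
```

Only the work done since the last checkpoint (steps 50 to 56 here) is repeated after the restart.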


Replication


Explanation: Replication involves creating copies of data or services so that the failure of a single component does not result in data loss or loss of service. Example: Database replication to multiple nodes to prevent total data loss during a node failure.


N-Modular Redundancy (NMR)


Explanation: N-Modular Redundancy involves N copies of a component running in parallel, with a voting mechanism to determine the correct output. Example: Quintuple Modular Redundancy (QMR) with five components where the majority vote decides the result.
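As a worked formula for the voting scheme on this card: with majority voting, N modules can mask at most the number of faulty modules given below. The reliability expression for the three-module case assumes independent failures, an identical module reliability R, and a perfect voter; these assumptions are not stated on the card.

```latex
f_{\max} = \left\lfloor \frac{N-1}{2} \right\rfloor,
\qquad N = 3 \Rightarrow f_{\max} = 1,
\qquad N = 5 \Rightarrow f_{\max} = 2

R_{\mathrm{TMR}} = R^{3} + 3R^{2}(1-R) = 3R^{2} - 2R^{3}
```

Under those assumptions, the three-module system is more reliable than a single module only when R > 0.5.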


Hot Swapping


Explanation: Hot swapping allows replacement or addition of components to a system without shutting it down. Example: Replacing a failed hard drive in a RAID configuration without turning off the server.


Heartbeat Mechanism


Explanation: A heartbeat mechanism is a periodic signal sent between components to verify operation and connectivity. Example: Two servers sending 'I'm alive' messages to each other to confirm they are still operational.
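A small sketch of the receiving side of a heartbeat mechanism: it records when each node was last heard from and suspects any node that has been silent for longer than a timeout. The class name `HeartbeatMonitor` and the injectable `clock` parameter are hypothetical; the clock is injected only so the example can run without real waiting.

```python
import time

class HeartbeatMonitor:
    """Track the last heartbeat per node and report nodes that went silent."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {}

    def beat(self, node):
        # Called whenever an "I'm alive" message arrives from `node`.
        self.last_seen[node] = self.clock()

    def suspected(self):
        # Nodes whose most recent heartbeat is older than the timeout.
        now = self.clock()
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)

# Usage with a fake clock: node "b" stops sending after t = 0.
fake_time = [0.0]
monitor = HeartbeatMonitor(timeout=3.0, clock=lambda: fake_time[0])
monitor.beat("a")
monitor.beat("b")
fake_time[0] = 2.0
monitor.beat("a")
fake_time[0] = 4.0
silent = monitor.suspected()
```

A missed heartbeat only makes a node suspected: a slow network can look the same as a crashed node.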


Graceful Degradation


Explanation: Graceful degradation allows a system to continue operating at a reduced level of functionality when parts of the system fail. Example: A web service that disables certain non-critical features when it is under heavy load or has partially failed.
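A brief sketch of the web-service example, assuming a page whose core content is critical and whose recommendations are not. The names `render_product_page` and `broken_recommender` are hypothetical.

```python
def render_product_page(product, fetch_recommendations):
    """Build a page; drop the non-critical recommendations if they fail."""
    page = {"product": product, "recommendations": [], "degraded": False}
    try:
        page["recommendations"] = fetch_recommendations(product)
    except Exception:
        # The optional feature failed: serve the core page without it.
        page["degraded"] = True
    return page

def broken_recommender(product):
    raise TimeoutError("recommendation service unavailable")

# Usage: the page still renders, just without recommendations.
page = render_product_page("kettle", broken_recommender)
```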


Rollback Recovery


Explanation: Rollback recovery involves reverting a system to a previously known good state following an error. Example: Using transaction logs in databases to restore to the state before a transaction that caused a crash.
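One way to see rollback in action is Python's built-in `sqlite3` module, where using the connection as a context manager commits on success and rolls back if an exception is raised. The table layout, the `transfer` helper, and the `fail_midway` flag (which only simulates a crash) are assumptions for illustration; SQLite's own journal does the actual restoring.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` between accounts as one all-or-nothing transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        if fail_midway:
            raise RuntimeError("simulated crash between the two updates")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

# Usage: the failed transfer leaves no trace; the database is back in
# the state it had before the transaction started.
try:
    transfer(conn, "alice", "bob", 30, fail_midway=True)
except RuntimeError:
    pass
after_failure = balances(conn)
```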


Failover


Explanation: Failover is the process of transferring services and operations to a standby system when the primary system fails. Example: Automatic switching to a backup server when the main server crashes.
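A minimal client-side sketch of failover, assuming servers are modeled as plain callables that raise `ConnectionError` when they are down. Real failover is often handled by a load balancer or cluster manager instead; the names below are hypothetical.

```python
def call_with_failover(servers, request):
    """Try each server in priority order and return the first successful reply."""
    last_error = None
    for server in servers:
        try:
            return server(request)
        except ConnectionError as exc:
            last_error = exc  # this server is down; fall through to the next
    raise ConnectionError("all servers failed") from last_error

def primary(request):
    raise ConnectionError("primary is down")

def standby(request):
    return f"standby handled {request}"

# Usage: the primary has crashed, so the standby takes over the request.
reply = call_with_failover([primary, standby], "GET /status")
```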


Task Rescheduling


Explanation: Task rescheduling involves dynamically reassigning tasks to available resources when some fail. Example: In a grid computing environment, reassigning tasks from an unresponsive node to a functional one.
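A short sketch of rescheduling, assuming the scheduler keeps a mapping from node name to its list of tasks. The `reschedule` function and the least-loaded placement policy are illustrative choices, not a description of any particular grid scheduler.

```python
def reschedule(assignments, failed_node, healthy_nodes):
    """Move every task of `failed_node` onto the least-loaded healthy node."""
    new = {node: list(assignments.get(node, [])) for node in healthy_nodes}
    for task in assignments.get(failed_node, []):
        # Pick the healthy node that currently has the fewest tasks.
        target = min(healthy_nodes, key=lambda node: len(new[node]))
        new[target].append(task)
    return new

# Usage: node n3 stops responding, so its tasks are spread over n1 and n2.
assignments = {"n1": ["a", "b"], "n2": ["c"], "n3": ["d", "e", "f"]}
plan = reschedule(assignments, failed_node="n3", healthy_nodes=["n1", "n2"])
```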


Self-healing Systems


Explanation: Self-healing systems are capable of detecting and fixing problems automatically. Example: A distributed system that automatically redistributes tasks if a node fails.


Software Redundancy


Explanation: Software redundancy includes implementing additional software services that can take over functionality if the primary service fails. Example: Multiple DNS servers that provide the same naming service to ensure uninterrupted hostname resolution.


Rejuvenation


Explanation: Rejuvenation entails periodically restarting components to clear error conditions that accumulate over time (software aging). Example: Rebooting servers during low-traffic periods so that slow memory leaks do not build up into failures.


© Hypatia.Tech. 2024 All rights reserved.