Logo
Pattern

Discover published sets by community

Explore tens of thousands of sets crafted by our community.

Big Data Technologies in Databases

10

Flashcards

0/10

Still learning
StarStarStarStar

Hadoop

StarStarStarStar

Purpose: Distributed processing of large data sets across clusters of computers. Key Characteristics: Scalable, fault-tolerant and open-source framework.

StarStarStarStar

HBase

StarStarStarStar

Purpose: A distributed, versioned, column-oriented store modeled after Google's Bigtable, providing big table-like capabilities for Hadoop. Key Characteristics: Linear and modular scalability.

StarStarStarStar

MongoDB

StarStarStarStar

Purpose: A NoSQL database designed for ease of development and scalability. Key Characteristics: Document-oriented, high performance, and high availability.

StarStarStarStar

Apache Kafka

StarStarStarStar

Purpose: A distributed streaming platform that is used to build real-time data pipelines and streaming apps. Key Characteristics: Durability, high-throughput, scalability.

StarStarStarStar

Apache Flink

StarStarStarStar

Purpose: A stream processing framework that can also handle batch processing. Key Characteristics: True streaming model, fault tolerance, high throughput.

StarStarStarStar

Apache Hive

StarStarStarStar

Purpose: Data warehouse software to facilitate easy data summarization, querying, and analysis of large datasets stored in Hadoop files. Key Characteristics: SQL-like language called HiveQL, plug-in to MapReduce framework.

StarStarStarStar

Apache Spark

StarStarStarStar

Purpose: General-purpose distributed data processing engine that extends the MapReduce model to efficiently process big data. Key Characteristics: Speed, ease of use, and sophisticated analytics.

StarStarStarStar

Presto

StarStarStarStar

Purpose: A high performance, distributed SQL query engine for big data. Key Characteristics: Federated queries, in-memory processing, interactive analysis.

StarStarStarStar

Apache Pig

StarStarStarStar

Purpose: A platform for analyzing large datasets that consists of a high-level language for expressing data analysis programs. Key Characteristics: Abstraction over MapReduce, extensibility through user-defined functions, optimization opportunities.

StarStarStarStar

Cassandra

StarStarStarStar

Purpose: A distributed NoSQL database designed to handle large amounts of data across many servers. Key Characteristics: Scalability, high availability, and fault tolerance.

Know
0
Still learning
Click to flip
Know
0
Logo

© Hypatia.Tech. 2024 All rights reserved.