Big Data Technologies in Databases
10 Flashcards
Hadoop
Purpose: Distributed storage and processing of large data sets across clusters of computers. Key Characteristics: Scalable, fault-tolerant, open-source framework.
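Example: Hadoop's classic processing model is MapReduce. The sketch below runs the three phases (map, shuffle, reduce) of a word count in a single Python process, only to illustrate the data flow. The function names are hypothetical and this is not the Hadoop API; a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, as a framework would between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Calling `word_count(["big data", "big clusters"])` counts "big" twice and the other two words once each.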
HBase
Purpose: A distributed, versioned, column-oriented store modeled after Google's Bigtable, providing Bigtable-like capabilities on top of Hadoop. Key Characteristics: Linear and modular scalability.
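Example: "versioned" and "column-oriented" can be pictured as a nested map from row key to column to timestamped values. The toy class below is an in-memory sketch of that data model only. `VersionedTable`, its methods, and the `max_versions` default are assumptions made for illustration, not the HBase client API.

```python
import time
from collections import defaultdict


class VersionedTable:
    """Toy in-memory table: row key -> column -> list of (timestamp, value)."""

    def __init__(self, max_versions: int = 3):
        self.max_versions = max_versions
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp=None):
        """Store a new version of a cell; older versions are kept up to the limit."""
        ts = time.time_ns() if timestamp is None else timestamp
        cells = self._rows[row][column]
        cells.append((ts, value))
        # Keep newest first and drop versions beyond the limit.
        cells.sort(key=lambda cell: cell[0], reverse=True)
        del cells[self.max_versions:]

    def get(self, row, column, versions=1):
        """Return up to `versions` values for a cell, newest first."""
        cells = self._rows.get(row, {}).get(column, [])
        return [value for _, value in cells[:versions]]
```

Writing the same cell three times with `max_versions=2` leaves only the two newest values readable, which is the sense in which each cell carries a version history.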
MongoDB
Purpose: A NoSQL database designed for ease of development and scalability. Key Characteristics: Document-oriented, high performance, and high availability.
Apache Kafka
Purpose: A distributed streaming platform for building real-time data pipelines and streaming applications. Key Characteristics: Durability, high throughput, scalability.
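Example: Kafka stores each topic partition as an append-only log, and consumers track their own read position (offset). The sketch below models one partition in memory to show why several consumers can read the same data independently. `TopicLog` and `Consumer` are hypothetical names, not the Kafka client API, and nothing here is persisted or distributed.

```python
class TopicLog:
    """Toy single-partition log: records are appended and never modified."""

    def __init__(self):
        self._records = []

    def append(self, value) -> int:
        """Store a record and return its offset (its position in the log)."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Return up to max_records records starting at offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own offset, so several consumers can read independently."""

    def __init__(self, log: TopicLog):
        self.log = log
        self.offset = 0

    def poll(self, max_records: int = 10) -> list:
        """Fetch the next batch and advance this consumer's offset."""
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch
```

Because reading does not remove records, a consumer that starts late still sees every record from offset 0.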
Apache Flink
Purpose: A stream processing framework that can also handle batch processing. Key Characteristics: True streaming model, fault tolerance, high throughput.
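Example: in a streaming model, results are produced as events arrive instead of after the whole data set has been read. The generator below is a plain-Python sketch of a tumbling-window count, not the Flink API. The function name is hypothetical, and it assumes events arrive in timestamp order, so it ignores the late and out-of-order data that a real stream processor has to handle.

```python
from typing import Iterable, Iterator


def tumbling_counts(
    events: Iterable[tuple[int, str]], window_size: int
) -> Iterator[tuple[int, dict[str, int]]]:
    """Yield (window_start, counts) each time a window closes.

    Events are (timestamp, key) pairs, assumed to arrive in timestamp order.
    """
    current_start = None
    counts: dict[str, int] = {}
    for timestamp, key in events:
        start = timestamp - timestamp % window_size
        if current_start is not None and start != current_start:
            yield current_start, counts  # window closed: emit its result
            counts = {}
        current_start = start
        counts[key] = counts.get(key, 0) + 1
    if current_start is not None:
        yield current_start, counts  # flush the last window at end of input
```

The same function works on a finite list or on an unbounded generator, which mirrors the idea that batch processing can be treated as a stream that happens to end.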
Apache Hive
Purpose: Data warehouse software that facilitates summarization, querying, and analysis of large datasets stored in Hadoop-compatible file systems. Key Characteristics: SQL-like query language called HiveQL; queries are compiled into jobs for an execution engine such as MapReduce.
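Example: a typical summarization query groups rows and aggregates them. The sketch below runs such a query with Python's built-in `sqlite3` module standing in for Hive, since this simple `GROUP BY` form is ordinary SQL. The `page_views` table, its columns, and the sample rows are made up for illustration, and nothing here touches Hadoop.

```python
import sqlite3

# In-memory database standing in for a warehouse table (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("/home", 1), ("/home", 2), ("/docs", 1), ("/home", 3)],
)

# Summarize: number of views per page, most viewed first.
query = """
    SELECT page, COUNT(*) AS views
    FROM page_views
    GROUP BY page
    ORDER BY views DESC
"""
rows = conn.execute(query).fetchall()
```

On Hive, a query of this shape would be turned into distributed jobs over files rather than executed against a local database.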
Apache Spark
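Purpose: General-purpose distributed data processing engine that extends the MapReduce model to efficiently process big data. Key Characteristics: Speed from in-memory computation, ease of use through high-level APIs, and built-in libraries for SQL, streaming, machine learning, and graph processing.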
Presto
Purpose: A high-performance, distributed SQL query engine for big data. Key Characteristics: Federated queries, in-memory processing, interactive analysis.
Apache Pig
Purpose: A platform for analyzing large datasets that consists of a high-level language, Pig Latin, for expressing data analysis programs. Key Characteristics: Abstraction over MapReduce, extensibility through user-defined functions, optimization opportunities.
Cassandra
Purpose: A distributed NoSQL database designed to handle large amounts of data across many servers. Key Characteristics: Scalability, high availability, and fault tolerance.
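Example: one way to spread data across many servers without a central coordinator is to hash each key onto a ring of nodes and store copies on the next few nodes along the ring. The sketch below illustrates that idea only. `HashRing`, the use of MD5 for tokens, and the replication factor of 2 are assumptions for illustration, not Cassandra's actual partitioner or configuration.

```python
import bisect
import hashlib


class HashRing:
    """Toy hash ring: each key maps to the next nodes clockwise from its token."""

    def __init__(self, nodes, replication_factor=2):
        self.replication_factor = replication_factor
        self._ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(value: str) -> int:
        """Map a string to a position on the ring (hash choice is illustrative)."""
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def replicas(self, key: str) -> list:
        """Return the nodes that would hold copies of this key."""
        start = bisect.bisect_right(self._tokens, self._token(key))
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]
```

With `HashRing(["n1", "n2", "n3"])`, every key is placed on two distinct nodes, so losing one node still leaves a copy available.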
© Hypatia.Tech. 2024 All rights reserved.