Common Machine Learning Datasets

15 flashcards

UCI Machine Learning Repository

A collection of databases, domain theories, and data generators widely used by the machine learning community for empirical analysis of machine learning algorithms.

Yelp Review Dataset

A dataset consisting of user reviews for businesses across 11 metropolitan areas on Yelp, useful for sentiment analysis and recommendation systems.

CIFAR-10

A dataset consisting of 60,000 32x32 color images in 10 different classes, used for computer vision tasks such as object recognition.
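
The numbers on this card pin down the size of the data. A minimal sketch of that arithmetic, assuming three color channels (RGB), one byte per pixel value, and an equal number of images per class; the variable names are illustrative only:

```python
# Shape arithmetic for CIFAR-10 using the figures on the card.
NUM_IMAGES = 60_000
NUM_CLASSES = 10
HEIGHT, WIDTH = 32, 32
CHANNELS = 3  # assumption: "color" means three RGB channels

# Assumption: the classes are balanced, so images split evenly.
images_per_class = NUM_IMAGES // NUM_CLASSES

# Number of raw values needed to store one image.
values_per_image = HEIGHT * WIDTH * CHANNELS

# Assumption: one byte (uint8) per value, no compression.
total_bytes = NUM_IMAGES * values_per_image

print(images_per_class, values_per_image, total_bytes)
```

Under these assumptions the whole dataset fits in under 200 MB of raw pixel data, which is one reason it is popular for quick experiments.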

LFW (Labeled Faces in the Wild)

A database designed for studying the problem of unconstrained face recognition with more than 13,000 images of faces collected from the web.

LibriSpeech

A dataset of 1,000 hours of English speech derived from audiobooks, allowing for training and evaluating speech recognition systems.

MS COCO

Microsoft Common Objects in Context, a large-scale dataset for computer vision tasks such as object detection, segmentation, and captioning. "MS COCO" and "COCO" are two names for the same dataset.

MNIST

A dataset of 70,000 28x28 grayscale images of handwritten digits (60,000 for training and 10,000 for testing), widely used as a first benchmark for image classification models.

20 Newsgroups

A collection of approximately 20,000 newsgroup documents, partitioned across 20 different newsgroups, suitable for text classification and clustering.

Stanford Dogs Dataset

A dataset with over 20,000 images of 120 breeds of dogs from around the world, which is used for fine-grained image classification.

IMDb

A dataset containing 50,000 movie reviews for natural language processing or sentiment analysis, divided evenly into positive and negative reviews.
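
Because the labels are split evenly, a classifier that always predicts one class scores 50% accuracy, which is the baseline any sentiment model should beat. A minimal sketch, assuming a hypothetical label list built to match the card rather than the real review files:

```python
from collections import Counter

# Hypothetical labels mirroring the card: 50,000 reviews, half positive.
labels = ["pos"] * 25_000 + ["neg"] * 25_000

# Count each label and take the most frequent one.
counts = Counter(labels)
majority_label, majority_count = counts.most_common(1)[0]

# Accuracy of always predicting the majority label.
baseline_accuracy = majority_count / len(labels)

print(baseline_accuracy)
```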

Boston Housing

A dataset describing 506 census tracts in the Boston area with 13 features each, used for regression analysis to predict median home values.

COCO (Common Objects in Context)

A large-scale dataset for object detection, segmentation, and captioning, containing over 200,000 labeled images across 80 categories.

Sentiment140

A dataset containing 1.6 million tweets annotated with sentiment labels, created for sentiment analysis of social media text.

ImageNet

A large visual dataset designed for use in visual object recognition software research, with more than 14 million images and thousands of categories.

Google Open Images

A dataset with millions of annotated images spanning a broad range of categories, useful for training large-scale visual recognition models.

© Hypatia.Tech. 2024 All rights reserved.