Common Machine Learning Datasets
UCI Machine Learning Repository
A collection of databases, domain theories, and data generators that the machine learning community widely uses for the empirical analysis of learning algorithms.
Yelp Review Dataset
A dataset of Yelp user reviews of businesses across 11 metropolitan areas, useful for sentiment analysis and recommendation systems.
CIFAR-10
A dataset of 60,000 32x32 color images in 10 classes (6,000 per class, split into 50,000 training and 10,000 test images), used for computer vision tasks such as object recognition.
LFW (Labeled Faces in the Wild)
A database of more than 13,000 face images of 5,749 people collected from the web, designed for studying unconstrained face recognition.
LibriSpeech
A corpus of approximately 1,000 hours of read English speech, sampled at 16 kHz and derived from LibriVox audiobooks, used for training and evaluating speech recognition systems.
MS COCO
Microsoft COCO, the same dataset as COCO (Common Objects in Context) rather than a separate one. It is a large-scale dataset for computer vision tasks such as object detection, segmentation, and captioning.
MNIST
A dataset of 70,000 28x28 grayscale images of handwritten digits (60,000 training, 10,000 test), widely used as a benchmark for training and testing image classification models.
20 Newsgroups
A collection of approximately 20,000 newsgroup documents, partitioned across 20 different newsgroups, suitable for text classification and clustering.
Stanford Dogs Dataset
A dataset of over 20,000 images covering 120 dog breeds from around the world, used for fine-grained image classification.
IMDb
A dataset of 50,000 movie reviews for binary sentiment classification, divided evenly into positive and negative reviews and into 25,000 training and 25,000 test reviews.
Boston Housing
A dataset of 506 census tracts in the Boston area, each described by 13 features, used for regression analysis to predict the median value of owner-occupied homes.
COCO (Common Objects in Context)
A large-scale dataset for object detection, segmentation, and captioning, containing over 200,000 labeled images across 80 categories.
Sentiment140
A dataset of 1.6 million tweets labeled as positive or negative, with the labels assigned automatically from the emoticons the tweets contained, created for sentiment analysis on social media.
ImageNet
A large visual dataset designed for visual object recognition research, with more than 14 million hand-annotated images organized into more than 20,000 categories. Its widely used ILSVRC subset has 1,000 classes.
Google Open Images
A dataset of roughly 9 million images annotated with image-level labels, object bounding boxes, and segmentation masks across thousands of categories, used for training large-scale visual recognition models.
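Image counts and resolutions translate directly into memory requirements. The following is a minimal sketch, assuming each pixel channel is stored as one unsigned byte and ignoring labels and file-format overhead. The MNIST figures it uses (70,000 grayscale images of 28x28 pixels) are that dataset's standard dimensions, and the helper name raw_pixel_bytes is hypothetical, chosen for illustration.

```python
# Raw pixel storage for two of the image datasets listed above.
# Assumption: 1 byte (uint8) per channel value; labels and file-format
# overhead are ignored, so real files on disk will differ.


def raw_pixel_bytes(num_images: int, height: int, width: int, channels: int) -> int:
    """Hypothetical helper: bytes needed for num_images uint8 images of the given shape."""
    return num_images * height * width * channels


cifar10 = raw_pixel_bytes(60_000, 32, 32, 3)  # 60,000 RGB images, 32x32
mnist = raw_pixel_bytes(70_000, 28, 28, 1)    # 70,000 grayscale digits, 28x28

print(f"CIFAR-10: {cifar10:,} bytes (~{cifar10 / 1e6:.0f} MB)")
print(f"MNIST:    {mnist:,} bytes (~{mnist / 1e6:.0f} MB)")
```

Running it prints 184,320,000 bytes (about 184 MB) for CIFAR-10 and 54,880,000 bytes (about 55 MB) for MNIST, small enough to hold entirely in memory on a typical machine.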
© Hypatia.Tech. 2024 All rights reserved.