NLP Libraries and Frameworks
FastText
FastText is a library for learning word embeddings and classifying text, created by Facebook's AI Research (FAIR) lab. It is designed to be simple to use and to handle very large datasets efficiently.
Flair
Flair is a simple NLP library developed by Zalando Research that provides state-of-the-art models for named entity recognition (NER), part-of-speech (POS) tagging, and word sense disambiguation.
PyTorch
PyTorch is an open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, focusing on flexibility and speed in the design of deep neural networks.
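As a rough illustration of how PyTorch is used for text, the sketch below defines a tiny bag-of-embeddings classifier and runs one forward pass. The class name, vocabulary size, and token IDs are made up for this example and are not part of PyTorch itself.

```python
import torch
from torch import nn


class BagOfEmbeddingsClassifier(nn.Module):
    """Hypothetical toy model: average the token embeddings, then classify."""

    def __init__(self, vocab_size: int, embed_dim: int, num_classes: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.linear = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.embedding(token_ids)  # (batch, seq_len, embed_dim)
        pooled = embedded.mean(dim=1)         # (batch, embed_dim)
        return self.linear(pooled)            # (batch, num_classes)


model = BagOfEmbeddingsClassifier(vocab_size=100, embed_dim=16, num_classes=2)

# Two made-up "sentences", each already converted to four token IDs.
token_ids = torch.tensor([[1, 5, 9, 2], [7, 3, 0, 4]])
logits = model(token_ids)
print(logits.shape)
```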
Transformers
Transformers is a Hugging Face library for natural language processing that provides a large collection of pre-trained models, such as BERT and GPT, along with implementations of their model architectures.
Stanford NLP
The Stanford NLP Group's official Python NLP library (originally released as StanfordNLP and now named Stanza) provides tools for robustly processing and analyzing human language data and can perform tasks such as part-of-speech tagging and named entity recognition.
Prophet
Prophet is a forecasting tool for time series data with multiple seasonalities and linear or non-linear growth. It is not specific to NLP, but it can forecast trends in time series derived from text, such as daily counts of a keyword.
Scikit-learn
Scikit-learn is a Python library for machine learning that offers various tools for data mining and data analysis. It is not specifically tailored for NLP, but it's widely used for feature extraction, classification, and clustering text data.
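A minimal sketch of the feature extraction and classification mentioned on this card: TF-IDF features feed a logistic regression classifier. The four training sentences and their labels are invented for illustration, and the choice of classifier is an assumption, not something the card prescribes.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus, for illustration only.
texts = [
    "great movie, loved it",
    "wonderful acting and a great story",
    "terrible plot, hated it",
    "boring and terrible film",
]
labels = ["pos", "pos", "neg", "neg"]

# TfidfVectorizer turns raw text into weighted term features;
# LogisticRegression learns a classifier on top of them.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

prediction = model.predict(["great story, wonderful acting"])[0]
print(prediction)
```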
AllenNLP
AllenNLP is an open-source NLP research library, built on PyTorch, designed for building state-of-the-art deep learning models on a wide variety of linguistic tasks, such as semantic role labeling and machine translation.
Tesseract OCR
Tesseract is an optical character recognition (OCR) engine, primarily used for converting images of text into machine-encoded text. While not exclusively an NLP tool, it serves as a bridge for processing physical or scanned documents in NLP pipelines.
TensorFlow
TensorFlow is an end-to-end open-source platform for machine learning. It provides comprehensive tools, libraries, and community resources that let researchers and developers build and deploy ML-powered applications, and it is commonly used for creating large-scale neural networks.
OpenNLP
Apache OpenNLP is a machine learning-based toolkit for natural language processing tasks such as tokenization, sentence splitting, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.
H2O.ai
H2O.ai provides an open-source machine learning platform that supports both supervised and unsupervised learning and includes AutoML. Although general-purpose, it can be used in NLP to build models and analyze data.
Keras
Keras is an open-source library that provides a high-level Python interface for artificial neural networks. It is most commonly used as the interface to TensorFlow (Keras 3 also supports JAX and PyTorch backends) and is designed to enable fast experimentation with deep neural networks.
NLTK
The Natural Language Toolkit (NLTK) is a library used for building Python programs that work with human language data. Typical use cases include prototypes for NLP tasks such as tokenization, parsing, and semantic reasoning.
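A small sketch of the kind of prototype this card describes, using two NLTK components that need no extra corpus downloads: the Treebank tokenizer and the Porter stemmer. The sample sentence is made up, and stemming is shown here only as one common preprocessing step.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import TreebankWordTokenizer

tokenizer = TreebankWordTokenizer()
stemmer = PorterStemmer()

sentence = "The cats were running."

# Split the sentence into word and punctuation tokens.
tokens = tokenizer.tokenize(sentence)

# Reduce each token to its stem (PorterStemmer lowercases by default).
stems = [stemmer.stem(token) for token in tokens]

print(tokens)
print(stems)
```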
spaCy
spaCy is an open-source software library for advanced Natural Language Processing in Python, designed for production use with pre-built statistical models and word vectors, focusing on tasks like tagging, parsing, and named entity recognition.
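To give a feel for the API, the sketch below tokenizes a sentence with a blank English pipeline, which requires no model download. Tagging, parsing, and NER need a trained pipeline, which is assumed to be installed separately and is not shown here. The sample sentence is made up.

```python
import spacy

# A blank pipeline contains only the rule-based English tokenizer.
nlp = spacy.blank("en")

doc = nlp("spaCy splits text into tokens.")
tokens = [token.text for token in doc]
print(tokens)
```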
Pattern
Pattern is a web mining module for Python, with tools for NLP and machine learning tasks, data retrieval (web scraping), and network analysis. It is used for tasks like part-of-speech tagging, sentiment analysis, and language identification.
Elasticsearch
Elasticsearch is a distributed search and analytics engine that is broadly used for log analytics, full-text search, and operational intelligence use cases. It has capabilities for real-time indexing and analyzing textual data.
Gensim
Gensim is a robust open-source vector space modeling and topic modeling toolkit implemented in Python, used mainly for unsupervised semantic modeling from plain text, with scalability in mind for handling large text collections.
TextBlob
TextBlob is a Python (2 and 3) library for processing textual data, providing a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, and sentiment analysis.
BERT
BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based language model and pre-training technique for natural language processing, developed by Google. Unlike the other entries in this set, it is a model rather than a library. Google has used it to better understand the context of words in search queries.
© Hypatia.Tech. 2024 All rights reserved.