NLP Libraries and Frameworks

NLTK

The Natural Language Toolkit (NLTK) is a Python library for building programs that work with human language data. It is commonly used for teaching, research, and prototyping NLP tasks such as tokenization, stemming, tagging, parsing, and semantic reasoning.
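Tokenization splits raw text into word and punctuation tokens. The sketch below approximates it with Python's `re` module, so it has no dependencies. It is not NLTK's API. NLTK's own `nltk.word_tokenize` handles far more cases, such as quotes and abbreviations. `simple_tokenize` is a hypothetical helper.

```python
import re

# Hypothetical pattern: a word is a run of word characters, optionally with one
# internal apostrophe part ("isn't"); any other non-space character is its own token.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def simple_tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens (a rough sketch)."""
    return TOKEN_PATTERN.findall(text)


print(simple_tokenize("NLTK isn't hard: try it, then parse!"))
# ['NLTK', "isn't", 'hard', ':', 'try', 'it', ',', 'then', 'parse', '!']
```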

spaCy

spaCy is an open-source library for advanced natural language processing in Python. It is designed for production use, offers pretrained statistical models and word vectors, and focuses on tasks such as tagging, parsing, and named entity recognition.

TensorFlow

TensorFlow is an end-to-end open-source platform for machine learning. Its comprehensive set of tools, libraries, and community resources lets researchers and developers build and deploy ML-powered applications, and it is commonly used to create large-scale neural networks.

PyTorch

PyTorch is an open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, focusing on flexibility and speed in the design of deep neural networks.

Gensim

Gensim is an open-source Python toolkit for vector space modeling and topic modeling. It is used mainly for unsupervised semantic modeling of plain text. It is built to scale to large text collections by streaming documents instead of loading the whole corpus into memory.
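Vector space modeling represents each document as a vector of term weights and compares documents geometrically. The sketch below shows the simplest version, with raw term counts and cosine similarity, and has no dependencies. It is not Gensim's API. Gensim performs the bag-of-words step with a `Dictionary` and its `doc2bow` method. The helper names here are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(tokens: list[str]) -> Counter:
    """Map each token to its count: a sparse document vector."""
    return Counter(tokens)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    # Counter returns 0 for missing keys, so absent terms contribute nothing.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "human computer interaction".split(),
    "computer system response time".split(),
    "graph of trees".split(),
]
vectors = [bag_of_words(d) for d in docs]
print(round(cosine_similarity(vectors[0], vectors[1]), 3))  # 0.289
```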

Transformers

Transformers is a library from Hugging Face that provides state-of-the-art model architectures for natural language processing. It also includes a large collection of pretrained models, such as BERT and GPT.

Scikit-learn

Scikit-learn is a Python library for machine learning that offers a range of tools for data mining and data analysis. It is not tailored specifically to NLP, but it is widely used for feature extraction from text and for classifying and clustering text data.
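A common text feature-extraction step is TF-IDF weighting, which scikit-learn provides through `TfidfVectorizer`. The sketch below reproduces the default weighting documented for that vectorizer: raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, and L2-normalized rows. It assumes the documents are already tokenized and skips the vectorizer's own lowercasing and token pattern. `tfidf` is a hypothetical helper, not scikit-learn code.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF with smoothed IDF and L2 row normalization."""
    n = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}

    rows = []
    for doc in docs:
        # Raw term count times IDF for each term in the document.
        weights = {t: c * idf[t] for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # Scale each row to unit length (empty documents stay empty).
        rows.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return rows


docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]
for row in tfidf(docs):
    print({t: round(w, 3) for t, w in sorted(row.items())})
```

Rarer terms such as "bad" receive higher weights than terms shared across documents, such as "movie".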

AllenNLP

AllenNLP is an open-source NLP research library, built on PyTorch, designed for building state-of-the-art deep learning models on a wide variety of linguistic tasks, such as semantic role labeling and machine translation.

Stanford NLP

The Stanford NLP Group's official Python NLP library, first released as StanfordNLP and since renamed Stanza, provides tools for robustly processing and analyzing human language data. It can perform tasks such as part-of-speech tagging and named entity recognition.

TextBlob

TextBlob is a Python library for processing textual data. It provides a simple API for common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, and sentiment analysis.
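TextBlob's default sentiment analyzer is lexicon-based. It looks up words in a sentiment lexicon and combines their scores into a polarity between -1.0 and 1.0, which is exposed as `TextBlob(text).sentiment.polarity`. The sketch below shows only the basic idea, using a tiny hypothetical lexicon and simple averaging. TextBlob's real lexicon is much larger, and its handling of modifiers and negation is more involved.

```python
# Hypothetical mini-lexicon: word -> polarity in [-1.0, 1.0].
LEXICON = {"great": 0.8, "good": 0.7, "bad": -0.7, "boring": -1.0}


def polarity(text: str) -> float:
    """Average polarity of the lexicon words in text; 0.0 if none match."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


print(round(polarity("A great cast, but a boring plot."), 2))  # -0.1
```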

Pattern

Pattern is a web mining module for Python, with tools for NLP and machine learning tasks, data retrieval (web scraping), and network analysis. It is used for tasks like part-of-speech tagging, sentiment analysis, and language identification.

fastText

fastText is a library for learning word embeddings and training text classifiers, created by Facebook's AI Research (FAIR) lab. It is designed to be simple to use and to train efficiently on very large datasets.
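fastText's word embeddings use subword information. Each word is represented as a bag of character n-grams, with `<` and `>` marking the word boundaries. For the unsupervised embedding models, n runs from 3 to 6 by default (the `minn` and `maxn` settings). The word's vector is built from the vectors of these n-grams together with a vector for the whole word, which lets fastText produce vectors for words it never saw during training. The sketch below only enumerates the n-grams. `char_ngrams` is a hypothetical helper, not fastText's API.

```python
def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> list[str]:
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        # Slide a window of width n across the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
```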

OpenNLP

Apache OpenNLP is a machine-learning-based Java toolkit for natural language processing tasks such as tokenization, sentence splitting, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.

BERT

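BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based language representation model developed by Google. It is pretrained on unlabeled text to represent each word using context from both its left and its right, and the pretrained model is then fine-tuned for downstream NLP tasks. Google has also used it to better understand the context of words in search queries.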
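Part of BERT's pretraining is a masked language modeling objective. About 15% of the input tokens are selected for prediction. Of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. The sketch below applies that rule to a plain word list. It is an illustration only and ignores WordPiece tokenization, special tokens such as `[CLS]` and `[SEP]`, and the next-sentence prediction objective. `mask_tokens` is a hypothetical helper, not part of any library.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, rate=0.15, seed=0):
    """Hypothetical helper: apply BERT-style masking to a list of tokens.

    Returns (inputs, labels): labels[i] is the original token at each
    position selected for prediction and None everywhere else.
    """
    rng = random.Random(seed)
    inputs = list(tokens)
    labels = [None] * len(tokens)
    # Select about `rate` of the positions (at least one, if any exist).
    k = min(len(tokens), max(1, round(len(tokens) * rate)))
    for i in rng.sample(range(len(tokens)), k):
        labels[i] = tokens[i]
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK  # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # Remaining 10%: keep the original token, but still predict it.
    return inputs, labels


tokens = "the cat sat on the mat because it was tired".split()
inputs, labels = mask_tokens(tokens, vocab=["dog", "ran", "blue"], seed=1)
print(inputs)
print(labels)
```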

Flair

Flair is a simple NLP framework, originally developed at Zalando Research, that provides state-of-the-art models for named entity recognition (NER), part-of-speech (POS) tagging, and word sense disambiguation.
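Sequence-labeling models such as Flair's NER taggers assign one tag per token, commonly in a BIO-style scheme. `B-` begins an entity, `I-` continues it, and `O` marks a token outside any entity. The tags are then grouped into entity spans. The decoder below is a hypothetical, dependency-free sketch of that grouping step, not Flair's API. It treats a stray `I-` tag as the start of a new entity.

```python
def bio_to_spans(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, label) spans."""
    spans, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            # A new entity starts: close the open one, if any.
            if current:
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)  # continue the open entity
        else:  # "O": outside any entity
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:
        spans.append((" ".join(current), label))
    return spans


tokens = ["George", "Washington", "went", "to", "Washington", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
# [('George Washington', 'PER'), ('Washington', 'LOC')]
```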

Tesseract OCR

Tesseract is an open-source optical character recognition (OCR) engine that converts images of text into machine-encoded text. It is not itself an NLP tool, but it often serves as the first stage of NLP pipelines that process scanned or photographed documents.

Elasticsearch

Elasticsearch is a distributed search and analytics engine widely used for log analytics, full-text search, and operational intelligence. It indexes and analyzes textual data in near real time.
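Full-text search in Elasticsearch rests on an inverted index, provided by Apache Lucene, that maps each analyzed term to the documents containing it. The sketch below builds a toy in-memory version. The `analyze` function is a crude stand-in for Elasticsearch's analyzers. The `search` function also differs from Elasticsearch's default `match` query: it returns only documents containing every query term, whereas `match` combines terms with OR and ranks results by relevance. All names are hypothetical.

```python
import re
from collections import defaultdict


def analyze(text: str) -> list[str]:
    """Lowercase and split into word tokens (a stand-in for an analyzer)."""
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in analyze(query)]
    return set.intersection(*postings) if postings else set()


docs = {1: "Error: disk full", 2: "Disk check passed", 3: "Network error on disk mount"}
index = build_index(docs)
print(sorted(search(index, "disk error")))  # [1, 3]
```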

Keras

Keras is an open-source software library that provides a Python interface for artificial neural networks and is designed to enable fast experimentation with deep neural networks. It has long served as the high-level API of the TensorFlow library, and Keras 3 can also run on JAX and PyTorch backends.

Prophet

Prophet is a forecasting tool, open-sourced by Facebook, for producing high-quality forecasts of time series that have multiple seasonal patterns (such as weekly and yearly cycles) and linear or non-linear growth. It is not specific to NLP, but it can forecast trends in time series derived from text, such as daily counts of mentions of a topic.
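Prophet's default model is additive. As described in its documentation and the accompanying paper, the series is decomposed into a trend $g(t)$ (piecewise linear or saturating logistic growth), seasonality $s(t)$, holiday effects $h(t)$, and an error term $\varepsilon_t$. Each seasonal component is a truncated Fourier series with period $P$, for example $P = 365.25$ for yearly seasonality when $t$ is measured in days. The number of terms $N$ is a tuning choice.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t,
\qquad
s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)
```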

H2O.ai

H2O.ai provides H2O, an open-source machine learning platform that supports both supervised and unsupervised learning and includes AutoML capabilities. Although it is general-purpose, it can be used to build models for NLP tasks and to analyze text-derived data.
