spaCy Natural Language Processing
github.com
About this website
spaCy is a free and open-source software library for advanced natural language processing (NLP), written in Python and Cython. It was created by Matthew Honnibal (PhD in Computer Science from the University of Sydney, who previously co-founded Lateral, an AI document retrieval startup) and Ines Montani (a software developer and computational linguist) at Explosion, a Berlin-based AI software company that Honnibal and Montani founded in 2016. spaCy is designed for production use rather than academic research. The library was first released in 2015 and v1.0 followed in October 2016; the current major version is 3.x (first released in 2021). Explosion raised venture funding from investors including SignalFire.
Key technical features:
Performance: implemented in Cython, with the core parser achieving throughput of over 10,000 tokens per second on a single CPU core.
Tokenization: fast, non-destructive tokenization that splits text into tokens while preserving each token's character offset in the original text.
Named Entity Recognition (NER): classifies entities into categories such as PERSON, ORG, GPE, DATE, MONEY, PRODUCT, and EVENT. The English models are trained on the OntoNotes 5.0 corpus.
Dependency parsing: analyzes syntactic structure using a transition-based neural network parser achieving 90%+ UAS (Unlabeled Attachment Score).
Languages and models: support for 60+ languages, with trained pipelines for a subset of them in several sizes (sm, md, lg, trf).
Transformer integration: spaCy v3 uses the Thinc neural network library for deep learning, with transformer support via spacy-transformers (wrapping Hugging Face models such as RoBERTa-base and BERT-base).
Rule-based matching: Matcher and PhraseMatcher for finding token and phrase patterns.
Training: custom models are trained on annotated data using config-driven training.
Prodigy: Explosion's commercial active learning annotation tool.
Language: Python/Cython. License: MIT.
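The tokenization and rule-based matching features described above can be sketched as follows. This is a minimal illustration, assuming spaCy v3 is installed; it uses a blank English pipeline (tokenizer only), so no trained model has to be downloaded. The sample sentence and the pattern names BASED_IN and COMPANY are made up for this example and do not come from the spaCy project.

```python
import spacy
from spacy.matcher import Matcher, PhraseMatcher

# Blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")

# Hypothetical sample sentence, chosen only for illustration.
text = "Explosion is a software company based in Berlin."
doc = nlp(text)

# Non-destructive tokenization: token.idx is the character offset of the
# token in the original string, and text_with_ws keeps trailing whitespace,
# so the input can be rebuilt exactly from the tokens.
for token in doc:
    print(token.i, token.idx, repr(token.text))
rebuilt = "".join(token.text_with_ws for token in doc)

# Matcher: a pattern over token attributes.
# Here: "based", then "in", then any title-cased token.
matcher = Matcher(nlp.vocab)
matcher.add("BASED_IN", [[{"LOWER": "based"}, {"LOWER": "in"}, {"IS_TITLE": True}]])
token_matches = [doc[start:end].text for _, start, end in matcher(doc)]

# PhraseMatcher: matches whole phrases, compared here on lowercase form.
phrase_matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
phrase_matcher.add("COMPANY", [nlp.make_doc("software company")])
phrase_matches = [doc[start:end].text for _, start, end in phrase_matcher(doc)]

print(token_matches)
print(phrase_matches)
```

Named entity recognition and dependency parsing are not shown because they need a trained pipeline, which is downloaded separately from the library itself.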