spaCy
spacy.io
4
Leaving SiteNav
External Link Disclaimer
You are about to visit spacy.io. This website is not operated by us. We are not responsible for its content or privacy practices.
About this website
spaCy is an industrial-strength natural language processing library for Python. It was first released in 2015 by Matthew Honnibal and is developed by Explosion, the company he co-founded with Ines Montani. Written in Python and Cython for performance, it can process thousands of documents per second, depending on the pipeline and hardware, while maintaining a small memory footprint.

The library provides a modular pipeline architecture with components including Tokenizer, Tagger, Morphologizer, Lemmatizer, DependencyParser, EntityRecognizer (NER), EntityLinker, SpanCategorizer, TextCategorizer, and SentenceRecognizer. Trained English pipelines are available as separate downloads: en_core_web_sm (about 12MB), en_core_web_md (about 40MB), en_core_web_lg (about 560MB), and en_core_web_trf (about 438MB, a transformer pipeline based on RoBERTa); exact sizes vary by release. spaCy supports over 70 languages, with trained pipelines for a subset of them, including German (de_core_news), French (fr_core_news), Chinese (zh_core_web), Japanese (ja_core_news), and Dutch (nl_core_news). In the English pipelines, NER labels follow the OntoNotes 5 scheme with types like PERSON, ORG, GPE, DATE, MONEY, PRODUCT, EVENT, and LAW.

The Matcher API supports token-based rule matching with operators and quantifiers, PhraseMatcher matches large terminology lists efficiently, and EntityRuler adds rule-based entity recognition that can be combined with the statistical NER component. The v3.x config system (config.cfg) defines all training parameters in a structured format built on Thinc's configuration system. spaCy integrates with Hugging Face transformers via spacy-transformers, supports custom pipeline components via Language.add_pipe(), and includes displaCy for dependency and entity visualization.

The project has over 30,000 GitHub stars and is used by companies like Airbnb, Quora, and Mashable for production NLP workloads.
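As a rough illustration of the rule matching and custom components mentioned above, here is a minimal sketch. It assumes spaCy v3 is installed and uses a blank English pipeline, so no trained model is needed. The pattern name VERY_GOOD, the component name token_counter, and the sample sentence are made up for the example.

```python
import spacy
from spacy.language import Language
from spacy.matcher import Matcher

# A blank English pipeline contains only a tokenizer, so no model download is required.
nlp = spacy.blank("en")

# Token-based rule: one or more "very" tokens followed by "good" (case-insensitive).
# "OP": "+" is the quantifier meaning "match one or more times".
matcher = Matcher(nlp.vocab)
matcher.add("VERY_GOOD", [[{"LOWER": "very", "OP": "+"}, {"LOWER": "good"}]])


@Language.component("token_counter")  # hypothetical component name
def token_counter(doc):
    # Record the number of tokens in the Doc's free-form user_data dict.
    doc.user_data["n_tokens"] = len(doc)
    return doc


# Register the custom component at the end of the pipeline.
nlp.add_pipe("token_counter")

doc = nlp("The food was very very good.")

# Each match is a (match_id, start, end) tuple of token offsets into the Doc.
matched_texts = [doc[start:end].text for _, start, end in matcher(doc)]
print(nlp.pipe_names)
print(matched_texts)
print(doc.user_data["n_tokens"])
```

The Matcher reports every span that satisfies the pattern, so overlapping matches can appear; spacy.util.filter_spans is one way to keep only the longest non-overlapping spans if that is what the application needs.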