Upstash

wikipedia-semantic-search.vercel.app

About this website

Wikipedia Semantic Search is an experimental application built to demonstrate the scalability and performance of the Upstash Vector database on very large datasets. It indexes over 23 million Wikipedia articles across 11 languages (English, German, Spanish, French, Italian, Japanese, Portuguese, Russian, Turkish, Chinese, and Arabic) and stores approximately 144 million vector embeddings in a single Upstash Vector index.

Users type natural language queries into a search interface, such as “Longest river in the world,” “Books by Stephen King,” or “Who invented the airplane?” Instead of relying on keyword matching, the search engine converts each query into a dense vector using a semantic embedding model, then runs an approximate nearest neighbor search against the precomputed Wikipedia article vectors. Because matching is based on meaning rather than exact wording, results can be contextually relevant even when the query's words do not appear in the article. For example, a query about the “inventor of the airplane” retrieves the article on the Wright brothers.

The interface also offers a “Chat” mode for conversational back‑and‑forth. The chat function uses the semantic search results as context to generate natural language responses, so users can ask follow‑up questions or request clarifications.

The underlying architecture is designed for massive scale. All 144 million vectors are stored in a single index, yet query latency remains low thanks to Upstash Vector’s serverless, globally distributed design. This project serves as a proof‑of‑concept for organizations that need to perform semantic sea
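The query flow described above (embed the query, then rank stored article vectors by similarity) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three-dimensional vectors are hand-made stand-ins for real embeddings, the titles are placeholder entries, and the search is an exact brute-force cosine ranking rather than the approximate nearest neighbor search a production vector database would use. It does not call the Upstash Vector API.

```python
import math

# Toy "index": hand-made 3-dimensional vectors standing in for real embeddings.
# Titles and numbers are illustrative only, not data from the project.
INDEX = {
    "Wright brothers": [0.9, 0.1, 0.0],
    "Nile": [0.0, 0.9, 0.2],
    "Stephen King bibliography": [0.1, 0.0, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, index, top_k=2):
    """Exact (brute-force) nearest neighbor search by cosine similarity.

    Returns up to top_k (title, score) pairs, most similar first.
    """
    scored = [
        (cosine_similarity(query_vector, vector), title)
        for title, vector in index.items()
    ]
    scored.sort(reverse=True)
    return [(title, score) for score, title in scored[:top_k]]


# Hypothetical embedding of the query "Who invented the airplane?".
# A real system would obtain this from an embedding model.
query = [0.8, 0.2, 0.1]
results = search(query, INDEX)

for title, score in results:
    print(f"{title}: {score:.3f}")
```

The query shares no words with the title "Wright brothers", yet that entry ranks first because its vector points in nearly the same direction as the query vector. Brute-force ranking scans every stored vector, which is why indexes at the scale described here rely on approximate search instead.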
