Voyage AI Models by MongoDB: A Complete 101 Guide
Capabilities, Target Use Cases & Design Intent for Each Model
Introduction
Voyage AI, acquired by MongoDB in February 2025, provides state-of-the-art embedding and reranking models purpose-built for AI search and retrieval. The models are available through MongoDB Atlas via the Embedding and Reranking API, integrated with MongoDB Community through Automated Embedding, and offered as a standalone platform. This guide provides a 101-level overview of every model's capabilities, target use cases, and design intent.
The models fall into five categories:
- General-Purpose Text Embeddings (Voyage 4 series)
- Domain-Specific Text Embeddings
- Contextualized Chunk Embeddings
- Multimodal Embeddings
- Rerankers
Each category addresses a different dimension of the retrieval problem.
General-Purpose Text Embeddings — The Voyage 4 Series
The Voyage 4 series is the latest generation of general-purpose text embedding models. Their defining innovation is an industry-first shared embedding space: all four Voyage 4 models produce compatible embeddings that can be used interchangeably. This means you can embed documents with voyage-4-large for maximum accuracy, run queries with voyage-4-lite for speed, and develop locally with voyage-4-nano—without re-indexing. All models are inherently multilingual, support Matryoshka dimensionality (256, 512, 1024, 2048), and offer multiple quantization options (float32, int8, uint8, binary, ubinary) to reduce vector database costs.
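The Matryoshka property can be illustrated with plain vector math: to shorten an embedding, you keep its first N dimensions and re-normalize. The sketch below is a minimal stdlib-only illustration with made-up 8-dimensional vectors standing in for real 2048-dimensional model output; no Voyage API is involved.

```python
import math

def truncate_matryoshka(vec, dim):
    """Keep the first `dim` dimensions of a Matryoshka embedding
    and re-normalize to unit length for cosine comparison."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity of two unit-length vectors (dot product)."""
    return sum(x * y for x, y in zip(a, b))

# Toy 8-dim "embeddings" standing in for real 2048-dim model output.
doc_vec = [0.5, 0.1, -0.3, 0.8, 0.2, -0.1, 0.4, 0.05]
query_vec = [0.4, 0.2, -0.2, 0.7, 0.1, 0.0, 0.5, 0.1]

# Truncate both to the first 4 dimensions (analogous to 2048 -> 512),
# trading a little accuracy for 4x less vector storage.
doc_short = truncate_matryoshka(doc_vec, 4)
query_short = truncate_matryoshka(query_vec, 4)

print(round(cosine(doc_short, query_short), 3))
```

Because Matryoshka-trained models pack the most important information into the leading dimensions, this truncation loses far less accuracy than truncating an ordinary embedding would.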
voyage-4-large
Flagship — Highest Retrieval Accuracy
| Spec | Value |
|---|---|
| Context Length | 32,000 tokens |
| Dimensions | 1024 (default), 256, 512, 2048 |
| Pricing | $0.12 per 1M tokens |
| Free Tier | 200 million tokens |
Overview: voyage-4-large is the flagship model in the Voyage 4 series and uses a Mixture-of-Experts (MoE) architecture to achieve state-of-the-art retrieval accuracy while keeping serving costs approximately 40% lower than comparable dense models. It ranks first across evaluated domains spanning 100+ datasets including law, finance, code, medical, web, technical documentation, long documents, and conversations. It outperforms competing models such as Gemini and Cohere on the public RTEB leaderboard.
Target Use Cases:
Production systems where retrieval accuracy is the top priority: enterprise RAG pipelines, knowledge management systems, legal and financial document retrieval, agentic AI systems that depend on precise context retrieval. High-fidelity document indexing in asymmetric retrieval setups where queries are served by a lighter model.
Key Capabilities:
Mixture-of-Experts architecture for accuracy at reduced cost. Shared embedding space with all Voyage 4 models. Flexible dimensions via Matryoshka learning and quantization-aware training. Supports query/document input type optimization.
Best For: Mission-critical retrieval where accuracy cannot be compromised, and for document-side indexing in asymmetric retrieval pipelines.
voyage-4
Balanced — Quality Meets Efficiency
| Spec | Value |
|---|---|
| Context Length | 32,000 tokens |
| Dimensions | 1024 (default), 256, 512, 2048 |
| Pricing | $0.06 per 1M tokens |
| Free Tier | 200 million tokens |
Overview: voyage-4 is the general-purpose workhorse of the Voyage 4 series, striking a balance between retrieval accuracy, cost, and latency. It outperforms all major competitors in retrieval quality while being offered at a lower price point. Like all Voyage 4 models, it shares the same embedding space, enabling asymmetric retrieval strategies.
Target Use Cases:
The default choice for most AI search and retrieval applications: semantic search, RAG pipelines, chatbots, recommendation systems, and content similarity detection. Ideal when you need strong retrieval quality without the premium cost of the large model.
Key Capabilities:
Balanced accuracy-cost-latency profile. Shared embedding space with the full Voyage 4 series. Matryoshka and quantization support for storage optimization. Inherently multilingual across 26+ languages.
Best For: Teams looking for a strong general-purpose embedding model that balances performance and cost for production workloads.
voyage-4-lite
Speed-Optimized — Lowest Latency and Cost
| Spec | Value |
|---|---|
| Context Length | 32,000 tokens |
| Dimensions | 1024 (default), 256, 512, 2048 |
| Pricing | $0.02 per 1M tokens |
| Free Tier | 200 million tokens |
Overview: voyage-4-lite is optimized for latency and cost while still delivering competitive retrieval quality. It approaches the retrieval accuracy of previous-generation models like voyage-3.5 while requiring significantly fewer parameters and computational resources. It is designed for high-throughput, latency-sensitive query workloads.
Target Use Cases:
High-volume query embedding in production pipelines where latency is critical. Real-time search applications, autocomplete systems, and interactive AI assistants. Query-side embedding in asymmetric retrieval setups (paired with voyage-4-large for document indexing).
Key Capabilities:
Significantly fewer parameters for fast inference. 6x cheaper than voyage-4-large. Shared embedding space enables mixing with heavier models. Supports the same Matryoshka dimensions and quantization options.
Best For: Latency-sensitive applications and high-throughput query embedding, especially as the query model in asymmetric retrieval with voyage-4-large as the document model.
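The asymmetric pattern looks like this in outline. The `embed_*` functions below are hypothetical stubs returning fake vectors; in a real pipeline they would be API calls along the lines of `client.embed(docs, model="voyage-4-large", input_type="document")` and `client.embed([query], model="voyage-4-lite", input_type="query")`. Only the comparison logic is the point.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Hypothetical stand-ins for real API calls. The fake vectors pretend
# both models write into one shared embedding space.
def embed_documents_with_large(docs):
    return [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1]]

def embed_query_with_lite(query):
    return [0.8, 0.2, 0.1]

docs = ["refund policy", "shipping times"]
index = dict(zip(docs, embed_documents_with_large(docs)))
qvec = embed_query_with_lite("how do I get my money back?")

# Shared embedding space: a lite-model query vector is compared
# directly against large-model document vectors -- no re-indexing.
best = max(index, key=lambda d: cosine(qvec, index[d]))
print(best)
```

The design win is that the expensive model runs once at indexing time, while the cheap model handles every user query.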
voyage-4-nano
Open-Weight — Local Development and On-Device
| Spec | Value |
|---|---|
| Context Length | 32,000 tokens |
| Dimensions | 512 (default), 128, 256 |
| Pricing | Free (open-weight, Apache 2.0 license) |
| Free Tier | N/A — self-hosted |
Overview: voyage-4-nano is Voyage AI's first open-weight model, freely available on Hugging Face under the Apache 2.0 license. Despite its compact size, it outperforms much larger embedding models, including voyage-3.5-lite. It shares the same embedding space as the rest of the Voyage 4 series, giving developers an easy path from local prototyping to production with larger models.
Target Use Cases:
Local development and prototyping without API calls. On-device or edge deployment for mobile and IoT applications. Offline environments where API access is unavailable. Cost-free experimentation during early development phases.
Key Capabilities:
Open-weight under Apache 2.0 with full Hugging Face support. Shared embedding space for seamless migration to production models. Matryoshka and quantization-aware training. Can be run on commodity hardware, vLLM, or SentenceTransformers.
Best For: Developers who need a local, free, self-hosted model for prototyping with a clear upgrade path to production Voyage 4 models.
Domain-Specific Text Embeddings
Embedding models are constrained in parameter count due to latency requirements (typically under 10 billion parameters). Domain-specific models dedicate that limited capacity to understanding the vocabulary, semantics, and retrieval patterns unique to their target domain. This specialization consistently yields meaningful accuracy gains over general-purpose models for domain-specific tasks.
voyage-code-3
Optimized for Code Retrieval and Documentation
| Spec | Value |
|---|---|
| Context Length | 32,000 tokens |
| Dimensions | 1024 (default), 256, 512, 2048 |
| Pricing | $0.18 per 1M tokens |
| Free Tier | 200 million tokens |
Overview: voyage-code-3 is the next-generation embedding model optimized specifically for code retrieval. It outperforms OpenAI-v3-large and CodeSage-large by an average of 13.80% and 16.81% across a comprehensive suite of 32 code retrieval datasets. Its 32K context length far exceeds alternatives (OpenAI at 8K, CodeSage at 1K), enabling it to handle large code files and lengthy documentation. It was evaluated across five categories of code retrieval tasks: text-to-code, code-to-code, docstring-to-code, real-world repository scenarios, and complex reasoning tasks.
Target Use Cases:
Code assistants and AI-powered IDEs. Code search across large repositories and monorepos. Documentation retrieval for developer tools. Code-to-code similarity detection (duplicate detection, refactoring suggestions). Function retrieval, SQL mapping, and code execution context for agentic coding systems.
Key Capabilities:
State-of-the-art code retrieval across 9+ programming languages. Matryoshka learning for flexible dimensions with minimal quality loss. Quantization options (int8, binary) for dramatically reduced storage costs. Binary rescoring for up to 4.25% additional retrieval quality improvement. 32K context for full-file and multi-file code understanding.
Best For: Engineering teams building code search, coding assistants, or agentic coding tools that need accurate code and documentation retrieval.
voyage-finance-2
Optimized for Finance Retrieval and RAG
| Spec | Value |
|---|---|
| Context Length | 32,000 tokens |
| Dimensions | 1024 |
| Pricing | $0.12 per 1M tokens |
| Free Tier | 50 million tokens |
Overview: voyage-finance-2 is a domain-specific embedding model trained for finance retrieval. It demonstrates superior retrieval quality on 11 financial retrieval datasets spanning financial news, public filings, financial advice, and financial reports—including TAT-QA, a dataset requiring numerical reasoning over hybrid tabular and textual data. It outperforms OpenAI (the next best model) by an average of 7% and Cohere by 12% on financial datasets.
Target Use Cases:
Financial document search and retrieval (SEC filings, earnings reports, prospectuses). Financial RAG applications for analyst tools and compliance systems. Financial news retrieval and sentiment analysis pipelines. Investment research tools requiring numerical reasoning across tables and text.
Key Capabilities:
Trained on massive financial corpora with domain-specific positive pairs. 32K context length handles long financial documents (far exceeding OpenAI's 8K and Cohere's 512). Specialized for hybrid tabular/textual financial data. Part of Voyage's domain-specific portfolio alongside law and code models.
Best For: Financial services teams building AI-powered research, compliance, risk analysis, or advisory tools that search over financial documents.
voyage-law-2
Optimized for Legal Retrieval and RAG
| Spec | Value |
|---|---|
| Context Length | 16,000 tokens |
| Dimensions | 1024 |
| Pricing | $0.12 per 1M tokens |
| Free Tier | 50 million tokens |
Overview: voyage-law-2 is a legal domain-specific embedding model that tops the MTEB leaderboard for legal retrieval by a significant margin. It outperforms OpenAI v3 large by 6% on average across eight legal retrieval datasets and by more than 10% on three of them. Trained on over 1 trillion high-quality legal tokens with specifically designed positive pairs and a novel contrastive learning algorithm, it excels at long-context legal document retrieval. Notably, because law intersects with many other domains (finance law, intellectual property, technology law), training data from other domains was mixed in, giving it strong cross-domain legal coverage.
Target Use Cases:
Legal document search across contracts, court cases, statutes, and regulations. Legal RAG applications for case research and compliance. Contract analysis tools and clause retrieval. Intellectual property and cross-domain legal retrieval. Long-document legal analysis where context length matters.
Key Capabilities:
Trained on 1T+ legal tokens with cross-domain coverage. Novel contrastive learning algorithm for legal semantics. 16K context handles lengthy legal documents. Top MTEB leaderboard performance for legal retrieval. Strong performance on general-purpose corpora as well, not just legal.
Best For: Legal teams, legal tech companies, and compliance organizations building AI-powered legal research, contract analysis, or regulatory document retrieval.
Contextualized Chunk Embeddings
Traditional embedding models process each chunk of a document in isolation, which means chunks lose the broader context of the full document they came from. This "context loss" problem is a fundamental limitation of standard RAG pipelines. Common workarounds—chunk overlaps, LLM-generated context summaries, and metadata augmentation—add complexity, cost, and maintenance burden. Contextualized chunk embeddings solve this natively at the model level.
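A toy example makes the context-loss problem concrete: a naive chunker splits a document at sentence boundaries, and the resulting second chunk refers to "the company" without ever naming it. (The chunker below is a deliberately simple illustration, not Voyage's chunking.)

```python
def chunk_by_sentences(text, sentences_per_chunk=2):
    """Naive chunker: split on '. ' and group consecutive sentences."""
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
    return [". ".join(sentences[i:i + sentences_per_chunk])
            for i in range(0, len(sentences), sentences_per_chunk)]

doc = ("Acme Corp reported record revenue in Q3. "
       "Growth was driven by its cloud division. "
       "The company also announced a stock buyback. "
       "Analysts expect the trend to continue.")

chunks = chunk_by_sentences(doc)
# The second chunk says "The company" but has lost the name "Acme":
# embedded in isolation, it cannot match a query like "Acme buyback".
print(chunks[1])
```

A contextualized chunk embedding model sees all chunks of the document together, so the vector for the second chunk can still carry the fact that "the company" is Acme.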
voyage-context-3
Chunk-Level Detail with Global Document Context
| Spec | Value |
|---|---|
| Context Length | 32,000 tokens |
| Dimensions | 1024 (default), 256, 512, 2048 |
| Pricing | $0.18 per 1M tokens |
| Free Tier | 200 million tokens |
Overview: voyage-context-3 is a contextualized chunk embedding model that produces vectors encoding both the chunk's own content and the contextual information from the full document—without any manual metadata augmentation. The model sees all chunks at once and intelligently injects relevant global information into individual chunk embeddings. It outperforms OpenAI-v3-large by 14.24% on chunk-level retrieval and 7.89% on document-level retrieval, Cohere-v4 by 12.56% and 5.64% respectively, and Anthropic's contextual retrieval approach by 6.76% and 2.40% respectively. It serves as a seamless drop-in replacement for standard embeddings without any downstream workflow changes.
Target Use Cases:
RAG pipelines over long, unstructured documents where chunks lose critical context (legal contracts, financial filings, research papers, technical documentation). Document retrieval where queries depend on information spanning multiple sections. Any chunking-based retrieval system where you want better accuracy without adding LLM context-augmentation steps.
Key Capabilities:
Natively captures both chunk detail and document-level context. Drop-in replacement for standard embeddings—no workflow changes needed. Reduces sensitivity to chunking strategy choices. Simpler, faster, and cheaper than LLM-based context augmentation (e.g., contextual retrieval). Matryoshka dimensions and quantization-aware training for storage optimization. Binary 512-dim matches OpenAI (float, 3072-dim) in accuracy at 192x lower vector DB cost.
Best For: Teams building RAG systems over long documents who want better retrieval accuracy without the complexity of manual context augmentation pipelines.
Multimodal Embeddings
Multimodal embedding models transform unstructured data from multiple modalities—text, images, and video—into a shared vector space. Unlike CLIP-style models that process each modality separately (which causes cross-modal bias, where text vectors align with irrelevant text rather than relevant images), Voyage multimodal models process interleaved inputs through a single backbone, capturing the combined semantic meaning of mixed-modality content.
voyage-multimodal-3.5
Text + Image + Video in a Unified Embedding Space
| Spec | Value |
|---|---|
| Context Length | 32,000 tokens |
| Dimensions | 1024 (default), 256, 512, 2048 |
| Pricing | $0.12 per 1M tokens + $0.60 per 1B pixels |
| Free Tier | 200M text tokens + 150B pixels |
Overview: voyage-multimodal-3.5 is the latest multimodal embedding model that can vectorize interleaved text, images, and video through a single backbone. It is the first production-grade embedding model to support native video retrieval and Matryoshka embeddings for flexible dimensionality. It effectively processes content-rich visual data such as screenshots of PDFs, slides, tables, figures, and video clips—eliminating the need for complex text extraction or ETL pipelines. By processing all modalities together rather than separately, it reduces cross-modal bias and produces more semantically accurate representations.
Target Use Cases:
Searching across mixed-modality document collections (PDFs with tables and figures, slide decks, technical manuals with diagrams). Video content retrieval using natural language queries. Visual document understanding without OCR or text extraction. Multi-modal RAG pipelines that need to reason over both text and visual content. E-commerce product search combining product descriptions with images.
Key Capabilities:
First production-grade model with native video embedding support. Single-backbone architecture eliminates cross-modal bias. Processes interleaved text + images + video (not just separate modalities). Matryoshka learning for flexible dimensions (industry first for multimodal). Images processed between 50K–2M pixels with proportional pricing. Video frames treated as images for embedding and billing.
Best For: Teams building search or RAG over visually rich content (PDFs, slides, documentation with figures, video libraries) who want to avoid complex document parsing pipelines.
Rerankers
Rerankers are a fundamentally different type of model from embedding models. While embedding models encode queries and documents independently into vectors, rerankers are cross-encoders that jointly process a query-document pair to produce a relevance score. This joint processing enables more accurate relevance prediction at the cost of higher computational requirements. Rerankers are used as a second stage in retrieval pipelines: first-stage retrieval (using embeddings or BM25) retrieves candidate documents, then the reranker re-scores and re-orders them. Purpose-built rerankers like Voyage's have been shown to be up to 60x cheaper, 48x faster, and up to 15% more accurate than using LLMs as rerankers.
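The two-stage pipeline can be sketched as follows. Both scoring functions here are hypothetical stand-ins—keyword overlap for first-stage retrieval and a fake phrase-aware score for the reranker—chosen only so the example runs without a model; a real pipeline would use vector search or BM25 for stage one and a rerank API call for stage two.

```python
def first_stage_score(query, doc):
    """Cheap first-stage relevance: keyword overlap
    (stand-in for vector search or BM25)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d)

def rerank(query, docs):
    """Stand-in for a cross-encoder reranker that jointly scores each
    (query, document) pair; faked here by rewarding exact phrase matches."""
    def score(doc):
        base = first_stage_score(query, doc)
        return base + (2 if query.lower() in doc.lower() else 0)
    return sorted(docs, key=score, reverse=True)

corpus = [
    "reset your password from the account settings page",
    "password reset emails can take a few minutes to arrive",
    "billing questions are handled by the finance team",
]
query = "password reset"

# Stage 1: retrieve a small candidate set cheaply.
candidates = sorted(corpus, key=lambda d: first_stage_score(query, d),
                    reverse=True)[:2]
# Stage 2: the reranker re-scores and re-orders only the shortlist.
final = rerank(query, candidates)
print(final[0])
```

The key design point is that the expensive joint scoring runs only over the small candidate set, not the whole corpus.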
rerank-2.5
Highest Accuracy Reranker with Instruction-Following
| Spec | Value |
|---|---|
| Context Length | 32,000 tokens |
| Dimensions | N/A (relevance scoring, not embeddings) |
| Pricing | $0.05 per 1M tokens |
| Free Tier | 200 million tokens |
Overview: rerank-2.5 is the flagship reranker in the Voyage model suite, offering the highest reranking accuracy and an industry-first instruction-following capability. On 93 retrieval datasets spanning multiple domains, it improves retrieval accuracy by 7.94% over Cohere Rerank v3.5. Its instruction-following feature allows users to dynamically steer the reranking process with natural language instructions—such as specifying which document types to prioritize, which query components to emphasize, or what relevance criteria to apply. On the MAIR (Massive Instructed Retrieval) benchmark, it outperforms Cohere Rerank v3.5 by 12.70%. Its 32K context length is 8x that of Cohere's reranker, enabling reranking of much longer documents without truncation.
Target Use Cases:
Second-stage reranking in production retrieval and RAG pipelines. Instruction-steered retrieval where relevance criteria vary by context (e.g., prioritizing regulatory documents over court cases for legal queries, or prioritizing titles over abstracts for academic search). Complex multi-domain search where different queries benefit from different reranking strategies. Any retrieval system where accuracy on the final ranked results is paramount.
Key Capabilities:
Industry-first instruction-following reranker. 32K context length (8x Cohere, 2x rerank-2). Up to 60x cheaper and 48x faster than LLM-based reranking. 7.94% improvement over Cohere Rerank v3.5 across 93 datasets. Natively multilingual across 31+ languages. Advanced distillation from larger instruction-following models.
Best For: Production RAG systems and search applications where maximizing the accuracy of final results justifies a two-stage retrieval approach, especially when relevance criteria need to be dynamically controlled.
rerank-2.5-lite
Fast and Cost-Effective Reranking
| Spec | Value |
|---|---|
| Context Length | 32,000 tokens |
| Dimensions | N/A (relevance scoring, not embeddings) |
| Pricing | $0.02 per 1M tokens |
| Free Tier | 200 million tokens |
Overview: rerank-2.5-lite is optimized for latency-sensitive applications while maintaining strong reranking quality. It improves retrieval accuracy by 7.16% over Cohere Rerank v3.5—close to the full rerank-2.5—while being significantly faster and cheaper. It also supports instruction-following capabilities and the same 32K context length. On the MAIR benchmark, it outperforms Cohere Rerank v3.5 by 10.36%.
Target Use Cases:
Latency-sensitive reranking in real-time search applications. High-volume reranking pipelines where cost matters. Interactive applications (chatbots, autocomplete, real-time assistants) that need fast second-stage refinement. Production environments where total token volume per request should stay under 200K for optimal latency.
Key Capabilities:
32K context length with instruction-following. 7.16% improvement over Cohere Rerank v3.5. Significantly lower latency than rerank-2.5. 2.5x cheaper than the full model. Same multilingual support across 31+ languages.
Best For: Latency-sensitive production systems that need reranking quality without the latency budget of the full rerank-2.5 model.
Legacy Models (Still Accessible)
The following older models remain accessible via the API for backward compatibility but are not recommended for new projects. The current-generation models outperform them in all aspects: quality, context length, latency, and throughput. No free tokens are offered for legacy models.
| Model | Description | Specs |
|---|---|---|
| voyage-3-large | Previous-generation general-purpose text embedding model. Superseded by the Voyage 4 series which offers shared embedding spaces and better accuracy. | 32K tokens, 1024 dim (default) |
| voyage-3.5 | Previous-generation general-purpose text embedding optimized for quality. Replaced by voyage-4. | 32K tokens, 1024 dim (default) |
| voyage-3.5-lite | Previous-generation text embedding optimized for latency and cost. Replaced by voyage-4-lite. | 32K tokens, 1024 dim (default) |
| voyage-code-2 | Previous-generation code retrieval model. Replaced by voyage-code-3 which offers 13.80% better accuracy. | 16K tokens, 1536 dim |
| voyage-multimodal-3 | Previous-generation multimodal model supporting text and images (no video). Replaced by voyage-multimodal-3.5. | 32K tokens, 1024 dim |
| rerank-2 | Previous-generation reranker. Replaced by rerank-2.5 which adds instruction-following and doubles context length. | 16K context |
| rerank-2-lite | Previous-generation lite reranker. Replaced by rerank-2.5-lite with 4x more context length. | 8K context |
Quick Model Selection Guide
For text embeddings, the Voyage AI documentation recommends: voyage-4-large for the best quality, voyage-4-lite for the lowest latency and cost, voyage-4 for a balance between quality and performance, or a domain-specific model if your application is in code, finance, or legal. For multimodal needs, use voyage-multimodal-3.5. For chunk-level retrieval with document context, use voyage-context-3. For reranking, use rerank-2.5 for most applications or rerank-2.5-lite for latency-sensitive workloads.
Sources
- MongoDB Voyage AI Documentation
- Voyage 4 Blog Post
- voyage-code-3 Blog Post
- voyage-context-3 Blog Post
- rerank-2.5 Blog Post
- voyage-finance-2 Blog Post
- voyage-law-2 Blog Post
- voyage-multimodal-3.5 Announcement
Terms
API (Application Programming Interface)
A set of rules and protocols that lets software applications communicate with each other. In this context, you send your text to Voyage AI's API and it sends back vector embeddings.
Asymmetric Retrieval
A retrieval strategy where you use one model to embed your documents (often a larger, more accurate model) and a different model to embed your queries (often a faster, cheaper model). The Voyage 4 shared embedding space makes this possible because the vectors from different Voyage 4 models are compatible with each other.
Automated Embedding
A MongoDB feature that automatically generates and stores vector embeddings using Voyage AI whenever data is inserted, updated, or queried — eliminating the need to build and manage a separate embedding pipeline.
BM25
A traditional keyword-based search algorithm that ranks documents by how frequently and prominently your search terms appear in them. Unlike semantic search, BM25 only matches exact words and doesn't understand meaning. Often used as a first-stage retrieval method before reranking.
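A toy BM25 scorer (the standard k1/b formulation with a simplified IDF) makes the keyword-only nature of the ranking visible—note that a document containing "cats" gets no credit for the query term "cat", because BM25 matches exact words, not meaning.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Score one document for a query with the classic BM25 formula."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        idf = math.log((len(corpus) - df + 0.5) / (df + 0.5) + 1)
        f = tf[term]  # term frequency in this document
        score += idf * f * (k1 + 1) / (
            f + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score

corpus = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park".split(),   # "cats" != "cat" for BM25
    "stock prices rose sharply today".split(),
]
query = "cat mat".split()

ranked = sorted(corpus, key=lambda d: bm25_score(query, d, corpus),
                reverse=True)
print(" ".join(ranked[0]))
```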
Chunking
The process of breaking a large document into smaller pieces (called "chunks") so that each piece can be individually embedded and searched. For example, a 50-page contract might be split into paragraph-sized chunks. The challenge is that individual chunks can lose important context from the rest of the document.
Context Length
The maximum amount of text (measured in tokens) that a model can process in a single request. A model with a 32,000-token context length can handle roughly 24,000 words at once. Longer context lengths allow the model to process larger documents without cutting them off.
Context Loss
The problem that occurs when a document is broken into chunks and each chunk is embedded independently. A chunk may reference "the company" or "the defendant" without containing the information about which company or defendant is being discussed, because that information was in a different chunk.
Contrastive Learning
A training technique where a model learns to pull similar items closer together in vector space and push dissimilar items farther apart. For example, a query and its correct answer should have similar vectors, while a query and an irrelevant document should have very different vectors.
Cosine Similarity
A mathematical measure of how similar two vectors are, based on the angle between them. A cosine similarity of 1.0 means two vectors point in exactly the same direction (very similar meaning), while 0 means they are completely unrelated. This is the standard way to compare embeddings.
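In code, cosine similarity reduces to a dot product divided by the product of the two vector lengths; a minimal stdlib version:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # same direction -> 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```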
Cross-Encoder
A type of model architecture (used by rerankers) that processes a query and a document together as a single input, allowing it to understand the relationship between them more deeply than models that process query and document separately. More accurate but slower than bi-encoders.
Cross-Modal Bias
A problem in multimodal search where the model tends to match items of the same type (text to text, image to image) rather than matching across types (text to a relevant image). Voyage's single-backbone architecture is designed to reduce this bias.
Dense Model
A neural network where every parameter is active for every input. Contrast with Mixture-of-Experts models where only a subset of parameters are active per input. Dense models are simpler but more computationally expensive at the same parameter count.
Dimensions
The number of individual numbers in a vector embedding. A 1024-dimensional embedding is an array of 1024 numbers. Higher dimensions can capture more nuance about meaning but require more storage space and memory. Think of it like describing a color: you could use 3 dimensions (red, green, blue) or thousands of dimensions for extremely fine distinctions.
Distillation
A training technique where a smaller, faster model (the "student") is trained to mimic the behavior of a larger, more accurate model (the "teacher"). This lets the smaller model inherit much of the larger model's quality while running much faster and cheaper.
Embedding / Vector Embedding
An array of numbers (a vector) that represents the meaning of a piece of text, image, or video. Similar content produces similar vectors, enabling computers to understand and compare meaning mathematically rather than just matching keywords.
Embedding Model
A machine learning model that converts input data (text, images, video) into vector embeddings. The model has learned from vast amounts of data to capture semantic meaning — so "happy" and "joyful" produce similar vectors even though they share no letters.
Embedding Space
The mathematical space where all vectors from a given model live. Vectors that are close together in this space represent content with similar meaning. A "shared embedding space" means vectors from different models can be compared directly — they all live in the same coordinate system.
ETL Pipeline (Extract, Transform, Load)
A data processing workflow that extracts data from a source, transforms it into a usable format, and loads it into a destination system. In the context of multimodal embeddings, Voyage AI can eliminate the need for ETL by directly processing mixed content (text + images) without pre-extraction.
First-Stage Retrieval
The initial step in a two-stage search system where a fast method (like vector search or BM25) retrieves a broad set of potentially relevant documents. These candidates are then refined by a reranker in the second stage.
Hallucination
When an AI model generates information that sounds plausible but is factually incorrect or made up. RAG (Retrieval-Augmented Generation) helps reduce hallucinations by grounding the model's responses in real, retrieved data.
Hugging Face
An open-source platform and community for sharing machine learning models, datasets, and tools. Voyage AI's open-weight model (voyage-4-nano) is available for download on Hugging Face.
Hybrid Search
A search approach that combines keyword-based search (like BM25) with semantic/vector search to get the benefits of both: exact term matching and meaning-based matching.
Inference
The process of running data through a trained model to get a result (e.g., generating an embedding or a relevance score). "Inference latency" is how long this takes.
Input Type (query vs. document)
A parameter you set when calling Voyage AI models to indicate whether the text you're embedding is a search query or a document to be searched. The model prepends a different internal prompt for each type, producing vectors better optimized for retrieval.
Interleaved Input
Content that mixes multiple types of data together in sequence — for example, a document that alternates between text paragraphs, images, and tables. Voyage multimodal models can process these mixed inputs directly, unlike CLIP-style models that require each modality to be processed separately.
Latency
The time delay between sending a request and receiving a response. In search applications, lower latency means faster results for the user. Measured in milliseconds.
LLM (Large Language Model)
A large AI model trained on vast amounts of text that can understand and generate human language. Examples include GPT, Claude, and Gemini. In the context of this guide, LLMs are the models that generate responses in RAG systems, while embedding models and rerankers handle the retrieval.
Matryoshka Learning / Matryoshka Representation Learning (MRL)
A training technique (named after Russian nesting dolls) that produces embeddings where the first N dimensions are also a valid, meaningful embedding at that smaller size. This means you can generate one 2048-dimension embedding and then use just the first 256, 512, or 1024 dimensions without re-running the model — trading a small amount of accuracy for significant storage savings.
Mixture-of-Experts (MoE)
A model architecture where instead of one large network processing every input, there are multiple specialized "expert" sub-networks, and a routing mechanism selects which experts to activate for each input. This achieves the accuracy of a very large model while only using a fraction of the compute per request. voyage-4-large uses this architecture.
Modality
A type or form of data. Text, images, audio, and video are all different modalities. A "multimodal" model can process multiple modalities.
MongoDB Atlas
MongoDB's fully managed cloud database platform. Atlas includes features like Vector Search, the Embedding and Reranking API, and other services for building AI applications.
MTEB (Massive Text Embedding Benchmark)
A widely used benchmark for evaluating the quality of text embedding models across many different tasks and datasets. Being highly ranked on MTEB indicates strong general-purpose embedding quality.
Multilingual
The ability to work across multiple human languages. Voyage AI models are inherently multilingual, meaning they capture semantic similarity regardless of what language the text is written in — a query in English can retrieve a relevant document in Japanese.
NDCG@10 (Normalized Discounted Cumulative Gain at 10)
A standard metric for measuring retrieval quality. It evaluates how well a system ranks the most relevant documents in the top 10 results, giving more credit for placing highly relevant documents at the very top. Scores range from 0 to 1, with higher being better.
OCR (Optical Character Recognition)
Technology that extracts machine-readable text from images of documents, scanned pages, or photographs. Multimodal embedding models like voyage-multimodal-3.5 can often eliminate the need for OCR by directly understanding the visual content.
Open-Weight Model
A model whose trained parameters (weights) are publicly released, allowing anyone to download and run it on their own hardware. voyage-4-nano is open-weight under the Apache 2.0 license.
Parameters
The internal numbers in a neural network that are learned during training and determine how the model processes inputs. More parameters generally means a model can capture more complexity, but also means higher compute costs and latency.
Quantization
A technique that reduces the precision of the numbers in a vector embedding to save storage and memory. For example, converting from 32-bit floating-point numbers (4 bytes per dimension) to 8-bit integers (1 byte per dimension) reduces storage by 4x, while converting to binary (1 bit per dimension) reduces it by 32x — with only a small loss in accuracy.
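The storage arithmetic is easy to verify directly. A rough NumPy sketch (production quantization uses calibrated scaling; this version is purely illustrative):

```python
import numpy as np

emb = np.random.uniform(-1, 1, 1024).astype(np.float32)

# int8: map values from roughly [-1, 1] onto the [-127, 127] integer range.
int8_emb = np.round(emb * 127).astype(np.int8)

# binary: keep only the sign of each dimension, packed 8 dimensions per byte.
binary_emb = np.packbits(emb > 0)

print(emb.nbytes, int8_emb.nbytes, binary_emb.nbytes)  # 4096 1024 128
```

For a 1024-dimension embedding: 4096 bytes as float32, 1024 bytes as int8 (4x smaller), 128 bytes as binary (32x smaller).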
RAG (Retrieval-Augmented Generation)
An architecture pattern where, before an LLM generates a response, a retrieval system first searches for relevant documents and provides them as context. This grounds the LLM's response in real data, reducing hallucinations and enabling the model to answer questions about your specific data.
Re-indexing
The process of regenerating all vector embeddings for your document collection, typically required when you switch to a different embedding model. The Voyage 4 shared embedding space eliminates this need when switching between Voyage 4 models.
Reranker / Reranking Model
A model that takes a query and a set of already-retrieved candidate documents and re-scores them for relevance, producing a more accurate ranking. Rerankers are more computationally expensive per document than embedding-based search, so they're applied only to the top candidates (e.g., the top 100) from a first-stage retrieval.
Retrieval
The process of finding and returning relevant documents or information from a collection in response to a query. This is the core problem that embedding models and rerankers are designed to solve.
Retrieval Accuracy
A measure of how well a search system returns genuinely relevant results. Higher retrieval accuracy means the system more consistently surfaces the right documents for a given query.
RTEB (Retrieval Text Embedding Benchmark)
A public leaderboard for evaluating and comparing the retrieval quality of different embedding models across standardized datasets.
Semantic Search
A search method that finds results based on meaning rather than exact keyword matches. "Best pizza nearby" would find results about "top-rated pizzerias in your area" even though the words are completely different. Powered by vector embeddings and vector search.
Shared Embedding Space
A property of the Voyage 4 model series where all four models (large, standard, lite, nano) produce vectors that live in the same mathematical space and can be directly compared. This means you can embed documents with one model and search them with a different one.
Text Embedding
A vector embedding generated specifically from text input. A text embedding model reads a sentence, paragraph, or document and converts it into an array of numbers that captures the semantic meaning of that text. Two pieces of text with similar meaning will produce similar text embeddings, even if they use completely different words. This is the most common type of embedding and is what the majority of Voyage AI models produce.
Throughput
The volume of work a system can process in a given time period. For embedding models, this is often measured in tokens per minute (TPM). Higher throughput means more documents can be embedded faster.
Tokens
The fundamental units that models use to process text. A token is roughly ¾ of a word in English — so 32,000 tokens is approximately 24,000 words. Tokens are also the unit of billing for API usage.
Two-Stage Retrieval
A search architecture with two steps: (1) a fast first stage that broadly retrieves candidate documents using embeddings or keyword search, and (2) a slower but more accurate second stage that uses a reranker to re-order those candidates by relevance. This balances speed with accuracy.
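The pattern can be sketched in a few lines. Here the first stage is a plain dot-product vector search, and `reranker_score` is a toy word-overlap function standing in for a real reranking model call:

```python
import numpy as np

def reranker_score(query, doc):
    # Toy stand-in for a reranking model: fraction of query words found in the doc.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def two_stage_search(query, query_vec, docs, doc_vecs, k=100, top_n=10):
    # Stage 1 (fast): vector search narrows the whole corpus to k candidates.
    candidates = np.argsort(-(doc_vecs @ query_vec))[:k]
    # Stage 2 (accurate): the reranker re-scores only those k candidates.
    ranked = sorted(candidates, key=lambda i: reranker_score(query, docs[i]), reverse=True)
    return ranked[:top_n]

docs = ["fast pizza delivery", "best pizza in town", "car repair shop"]
doc_vecs = np.eye(3)                   # toy embeddings, one per document
query_vec = np.array([0.9, 0.8, 0.1])  # stage 1 slightly prefers doc 0
top = two_stage_search("best pizza", query_vec, docs, doc_vecs, k=2, top_n=1)
# The reranker promotes doc 1 ("best pizza in town") to the top.
```

Note how the expensive scoring only touches the `k` survivors of stage 1, which is what makes the architecture affordable at scale.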
Vector
An ordered list of numbers. In the context of embeddings, a vector like [0.12, -0.45, 0.78, ...] represents the meaning of a piece of content. Each number captures a different learned aspect of the content's meaning.
Vector Database
A database optimized for storing and searching vector embeddings. MongoDB Atlas includes vector search capabilities that let you store embeddings alongside your regular data and perform similarity searches.
Vector Search
A search method that finds the vectors (and their associated documents) most similar to a query vector. Instead of matching keywords, it finds documents whose meaning is closest to the query's meaning by measuring distance in vector space.
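"Closeness" in vector space is commonly measured with cosine similarity; a minimal example with made-up three-dimensional vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1 = same direction, -1 = opposite.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

query = np.array([0.12, -0.45, 0.78])
doc_a = np.array([0.10, -0.40, 0.80])   # similar meaning: similarity near 1
doc_b = np.array([-0.70, 0.50, -0.20])  # unrelated meaning: much lower score
assert cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b)
```

A vector search index simply returns the documents whose embeddings score highest against the query embedding under a metric like this.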
Vectorize
The process of converting data (text, images, video) into vector embeddings using an embedding model.