Best vector databases for RAG in 2026: stop benchmarking, match the deployment shape
Search "best vector database" and you get a dozen lists arguing about recall and queries per second, usually written by the vendor whose database wins its own benchmark. That argument is two years stale. In 2026, hybrid search and metadata filtering are table stakes across every serious option, and the engines are fast enough that raw benchmark differences rarely decide a real retrieval-augmented generation (RAG) project. The decision that actually matters is deployment shape: does the database run inside your process, on a server you operate, on a server the vendor operates, or on no servers at all? Match that to where your data and your ops capacity already sit, and the shortlist picks itself. This guide places 11 databases into four shapes, compares them on the five axes that decide fit and cost, and gives an honest answer most vendor lists will not: for a lot of teams, the right vector database is the one your data already lives in.
- The choice is deployment shape, not benchmark. Embedded, self-hosted, managed, or serverless. Hybrid search and filtering are now standard, so pick the shape that fits your data gravity and ops capacity, then compare within it.
- Already on Postgres? Start with pgvector. For most apps under roughly 10 million vectors it is the highest-ROI choice: no new system, transactional consistency with your real data, filtering in plain SQL. Reach for a dedicated database when you outgrow it, not before.
- Best open-source, self-hosted performance: Qdrant. Rust-built, fast filtered search, Apache-2.0, and the cheapest to self-host. Best native hybrid search: Weaviate (BSD-3, BM25 built in). Best at billion-scale: Milvus.
- Best zero-ops managed: Pinecone (serverless, proprietary) for teams that want no infrastructure at all; turbopuffer if your workload is storage-heavy and you want object-storage economics.
- Best embedded / local-first: Chroma for prototyping and LanceDB for multimodal and edge.
- License watch: Weaviate is BSD-3, Qdrant/Milvus/Chroma/LanceDB/Vespa/Marqo are Apache-2.0, and Redis is now a three-license choice (RSALv2/SSPLv1/AGPLv3), not plain open source.
The four deployment shapes, and the data-gravity rule
The fastest way to a shortlist is to stop comparing engines and start comparing shapes. A vector database can run in one of four operational forms, and each asks a different amount of operational work from you. Pick the shape first, then the field narrows to two or three real candidates.
Embedded
The database runs inside your application process, like SQLite for vectors. No server, no network hop, no ops. Best for prototypes, local-first apps, notebooks, and edge. The ceiling is single-node scale.
Chroma · LanceDB
Self-Hosted
You run the server on your own infrastructure. Maximum control, lowest unit cost at scale, full data residency, and the ops burden is yours. The open-source workhorses live here.
Qdrant · Weaviate · Milvus · Vespa · Marqo
Managed
The vendor runs the same open-source engine for you, and you keep most of the control without the on-call pager. Usually the official cloud for a self-hosted database (Zilliz for Milvus, Weaviate Cloud, Qdrant Cloud, Redis Cloud).
Zilliz · Weaviate Cloud · Qdrant Cloud · Redis Cloud
Serverless
You hold no servers and no idle capacity; you pay per use, and the vendor abstracts the cluster away entirely. Lowest ops, least control and portability, and the storage-versus-memory split now drives the bill.
Pinecone · turbopuffer
Why hybrid search and filtering stopped being the differentiator
The single biggest change since the 2024 vector-database hype is that the features vendors used to fight over are now standard. Hybrid search, combining dense vector similarity with sparse keyword relevance (usually BM25), materially improves RAG retrieval when exact terms matter, and it has moved from a selling point to an expectation. Weaviate has BM25 hybrid built in, Qdrant and Milvus do sparse-plus-dense, Pinecone has supported sparse-dense for years, and turbopuffer and Vespa treat full-text and vector as one query. If a database cannot do hybrid in 2026, that is the news, not the reverse.
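To make "combining dense and sparse" concrete, here is a minimal sketch of one common way to merge two ranked result lists: reciprocal rank fusion (RRF). This is an illustration only. It is not claimed to be the fusion method any database in this guide uses internally, and the document IDs, the two input rankings, and the constant `k=60` are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical inputs: one ranking from vector similarity, one from BM25.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that both retrievers agree on (`doc_a`, `doc_c`) rise above documents that only one of them found, which is the behavior that helps RAG when exact terms matter.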
Metadata filtering followed the same path. Filtering vector results by attributes (tenant, date, document type, permissions) is now expected everywhere; the differences are in how well filters combine with the vector index under load. This is the one place a benchmark still earns its keep: Qdrant's Rust filtering is a genuine strength for heavily filtered workloads, and pgvector inherits Postgres's mature query planner for complex predicates. But "can it filter" is no longer a yes/no axis, so the matrix below treats hybrid and filtering as near-universal and spends its discrimination on the axes that still separate the field: deployment shape, open-source license, and pricing model.
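As a rough illustration of what filtered vector search means, the sketch below pre-filters records on metadata and then ranks the survivors by cosine similarity. It is a brute-force toy under stated assumptions: the record layout, field names, and tenant values are hypothetical, and production engines use approximate indexes and combine the filter with the index in engine-specific ways, which is exactly where the differences described above show up.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_search(records, query, where, top_k=2):
    """Keep records whose metadata matches every key in `where`,
    then return the IDs of the top_k most similar survivors."""
    candidates = [
        r for r in records
        if all(r["meta"].get(key) == value for key, value in where.items())
    ]
    candidates.sort(
        key=lambda r: cosine_similarity(r["vector"], query), reverse=True
    )
    return [r["id"] for r in candidates[:top_k]]


# Hypothetical records: two-dimensional vectors plus tenant and type metadata.
records = [
    {"id": "a", "vector": [1.0, 0.0], "meta": {"tenant": "acme", "type": "faq"}},
    {"id": "b", "vector": [0.9, 0.1], "meta": {"tenant": "globex", "type": "faq"}},
    {"id": "c", "vector": [0.0, 1.0], "meta": {"tenant": "acme", "type": "faq"}},
    {"id": "d", "vector": [0.7, 0.7], "meta": {"tenant": "acme", "type": "policy"}},
]

result = filtered_search(records, [1.0, 0.0], {"tenant": "acme"})
print(result)  # ['a', 'd']
```

Record `b` is the second-closest vector overall but never appears, because the tenant filter removes it before ranking. That is the property a multi-tenant or permissioned RAG system depends on.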
The comparison matrix: shape, license, hybrid and pricing model
Eleven databases on the five axes that decide fit and cost. Cells marked yes, partial, or no are read from each vendor's own repository, docs, or pricing page. "Partial" hybrid means available through an integration or full-text add-on rather than a single native call. Pricing model is the column to read at scale, not a headline number. This comparison is published as an open dataset under CC-BY (see methodology).
| Database | Deployment shape | Open source | Hybrid search | Filtering | Pricing model |
|---|---|---|---|---|---|
| pgvector | Extension (in your Postgres) | PostgreSQL License | partial (Postgres FTS) | yes (SQL) | Free (pay your Postgres) |
| Qdrant | Self-host + managed | Apache-2.0 | yes (sparse+dense) | yes (strong, Rust) | Usage / free tier |
| Weaviate | Self-host + managed | BSD-3-Clause | yes (native BM25) | yes | Resource-based |
| Milvus / Zilliz | Self-host + managed | Apache-2.0 | yes (2.4+) | yes | Free OSS / usage (Zilliz) |
| Pinecone | Serverless (managed) | proprietary | yes (sparse-dense) | yes | Usage (serverless) |
| Chroma | Embedded + cloud | Apache-2.0 | partial (integration) | yes | Free OSS / cloud usage |
| LanceDB | Embedded + cloud | Apache-2.0 | partial (FTS) | yes | Free OSS / cloud |
| turbopuffer | Serverless (managed) | proprietary | yes (BM25+vector) | yes | Usage (object-storage) |
| Vespa | Self-host + managed | Apache-2.0 | yes (native ranking) | yes | Free OSS / resource (Cloud) |
| Marqo | Self-host + cloud | Apache-2.0 | yes | yes | Free OSS / cloud |
| Redis (Query Engine) | Self-host + managed | RSALv2 / SSPLv1 / AGPLv3 | yes (vector + text) | yes | Free OSS / Redis Cloud |
Highlighted rows are the picks most teams should start from. Licenses, hybrid support, and pricing models reflect each vendor's public repository and documentation as of June 2026 and change often; verify on the vendor's own page before a decision. Redis lists three licenses under "open source" because version 8 ships a tri-license (RSALv2/SSPLv1/AGPLv3) rather than a single permissive license.
This comparison is published as an open dataset (CC-BY) with a permanent DOI: 10.5281/zenodo.20738950.
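Because the matrix is small, it is easy to query as data. The sketch below transcribes three of its columns by hand and filters for a permissive license plus native hybrid search. It does not use the published dataset's schema, which is not shown here, and the choice of which licenses count as permissive is an assumption of this example.

```python
# Rows transcribed from the comparison matrix above: (name, license, hybrid).
MATRIX = [
    ("pgvector", "PostgreSQL License", "partial"),
    ("Qdrant", "Apache-2.0", "yes"),
    ("Weaviate", "BSD-3-Clause", "yes"),
    ("Milvus / Zilliz", "Apache-2.0", "yes"),
    ("Pinecone", "proprietary", "yes"),
    ("Chroma", "Apache-2.0", "partial"),
    ("LanceDB", "Apache-2.0", "partial"),
    ("turbopuffer", "proprietary", "yes"),
    ("Vespa", "Apache-2.0", "yes"),
    ("Marqo", "Apache-2.0", "yes"),
    ("Redis (Query Engine)", "RSALv2 / SSPLv1 / AGPLv3", "yes"),
]

# Assumption for this sketch: these three licenses are treated as permissive.
PERMISSIVE = {"Apache-2.0", "BSD-3-Clause", "PostgreSQL License"}


def shortlist(rows, require_native_hybrid=True):
    """Return names with a permissive license and, optionally, native hybrid."""
    names = []
    for name, license_name, hybrid in rows:
        if license_name not in PERMISSIVE:
            continue
        if require_native_hybrid and hybrid != "yes":
            continue
        names.append(name)
    return names


native = shortlist(MATRIX)
print(native)  # ['Qdrant', 'Weaviate', 'Milvus / Zilliz', 'Vespa', 'Marqo']
```

Relaxing `require_native_hybrid` to `False` brings pgvector, Chroma, and LanceDB back in, which matches how the matrix marks their hybrid support as partial.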
The self-hosted workhorses: Qdrant, Weaviate, Milvus, Vespa, Marqo
This is where most serious RAG teams land, because self-hosting gives the lowest unit cost, full data residency, and no per-vector vendor markup, at the price of running the server yourself. Qdrant is the performance-and-value pick: written in Rust, Apache-2.0, with the fastest filtered-search numbers in the category and the cheapest self-host footprint (it runs millions of vectors on a small VPS). It is the safe default when you want an open-source dedicated database you can also buy as a managed cloud later. Weaviate is the choice when hybrid search and a batteries-included feature set matter most: BSD-3-licensed, native BM25 hybrid, a module ecosystem for embeddings and rerankers, and Weaviate Cloud's entry tier is among the cheapest managed options. Milvus (with Zilliz as its managed cloud) is the billion-scale answer: a distributed, Apache-2.0 architecture built for hundreds of millions to billions of vectors that no single-node option matches, with the operational weight that implies.
Vespa is the heavyweight for teams whose problem is really search-and-ranking, not just vector lookup: an Apache-2.0 engine (out of Yahoo) that unifies vector, text, and structured data with rich, ML-driven ranking, powerful and correspondingly complex to operate. Marqo is the end-to-end option that folds embedding generation and storage into one Apache-2.0 system, which removes a moving part for teams that do not want to run a separate embedding pipeline. The honest split: pick Qdrant for fast filtered search at low cost, Weaviate for hybrid and modules, Milvus for raw scale, Vespa when ranking is the hard part, and Marqo when you want the embedding step handled for you.
The managed and serverless end: Pinecone and turbopuffer
If you would rather pay to make the database someone else's problem, this is the end of the spectrum, and the trade is control and portability for zero operations. Pinecone is the category's default fully-managed serverless option: proprietary, no servers to run, sparse-dense hybrid, and a serverless model that separates storage from compute so idle indexes cost little. It is the right pick for a team that wants to ship RAG without hiring anyone to run infrastructure, accepting closed-source lock-in and per-usage pricing as the cost. turbopuffer is the newer, cost-shaped challenger: an object-storage-first serverless engine that, by its own account, stores vectors on S3-class storage rather than RAM to cut cost dramatically for storage-heavy and many-namespace workloads. It is proprietary with no free tier and a monthly minimum, and it shines specifically when you have a lot of vectors that are queried unevenly. Choose Pinecone for the mature zero-ops default; choose turbopuffer when storage economics dominate your bill and you can live on a managed, closed platform.
The embedded and already-have-it options: Chroma, LanceDB, pgvector, Redis
For prototypes, local-first apps, and teams that should not add a database at all, the best answer often runs where you already are. Chroma is the prototyping default: Apache-2.0, embedded, dead-simple to start in a notebook, with a managed Chroma Cloud when you outgrow local. LanceDB is the embedded option for multimodal and edge: Apache-2.0, built on the Lance columnar format, runs in-process or in your cloud, and handles vectors alongside the raw data on object storage, which suits image and video RAG and offline apps.
The two "already-have-it" options are the ones most lists undersell. pgvector turns Postgres into a vector database with an extension under the PostgreSQL License: if your app data is already in Postgres, you get vector search with transactional consistency, mature SQL filtering, and zero new infrastructure, which is why it is the right first move for the majority of apps under roughly ten million vectors. Redis (via its Query Engine) adds vector search to a store many teams already run for caching, with very low latency, though note the 2025 license change: Redis 8 ships a tri-license (RSALv2, SSPLv1, or AGPLv3 at your option), so it is no longer plain-permissive open source and that may matter for some commercial deployments. The pattern in both cases is the data-gravity rule made concrete: adding vectors to a system you already operate beats standing up a dedicated database you do not yet need.
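To show what "filtering in plain SQL" looks like with pgvector, here is a minimal sketch that builds a tenant-filtered nearest-neighbor query. The table name `chunks`, its columns, the three-dimensional vector, and the `%s` placeholder style (as used by Python Postgres drivers such as psycopg) are assumptions for illustration. The block only constructs the SQL and parameters, so it runs without a database.

```python
# Hypothetical schema: one table holding text chunks and their embeddings.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    tenant_id text NOT NULL,
    body      text NOT NULL,
    embedding vector(3)  -- dimension must match your embedding model
);
"""

# <=> is pgvector's cosine-distance operator; smaller means more similar.
SEARCH_SQL = """
SELECT id, body
FROM chunks
WHERE tenant_id = %s
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(embedding):
    """Format a list of numbers as pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def build_search(tenant_id, embedding, k=5):
    """Return (sql, params) ready to hand to a driver's execute() call."""
    return SEARCH_SQL, (tenant_id, to_vector_literal(embedding), k)


sql, params = build_search("acme", [0.1, 0.2, 0.3], k=5)
print(params)  # ('acme', '[0.1,0.2,0.3]', 5)
```

The point of the example is that the metadata filter is an ordinary `WHERE` clause on the same table as the rest of your data, so it runs in the same transaction and under the same query planner.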
Which database for your situation
Match the move to where you actually are, not to the leaderboard. The shortlist falls out of the shape and your data gravity:
Already on Postgres
An app with its data in Postgres, under ~10M vectors.
pgvector in your existing database. No new system; SQL filtering; revisit only when you outgrow it.
Prototyping / solo
Notebook or local-first app, want to ship fast.
Chroma embedded (or LanceDB for multimodal). Move to a managed tier when you scale.
Self-hosting at scale
Real traffic, data-residency needs, an ops team.
Qdrant for fast filtered search at low cost; Weaviate if hybrid + modules matter; Milvus past ~100M vectors.
Zero-ops, no infra hires
Want RAG in production without running servers.
Pinecone serverless. turbopuffer if your bill is dominated by storage-heavy, uneven query load.
Search is the hard part
Complex ranking over text, vectors, and structured data.
Vespa for ML-driven ranking; Marqo if you also want embedding generation handled.
Already on Redis
Redis is in your stack for cache or queues.
Redis Query Engine for low-latency vectors, after checking the tri-license fits your deployment.
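The guide above can be sketched as a small function. This is a simplification under stated assumptions: the `Situation` fields and the order in which the rules are checked are choices made for this example, the thresholds are the approximate ones quoted above (~10M and ~100M vectors), and secondary picks such as Weaviate, turbopuffer, and Marqo are left out to keep it short.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    """Hypothetical description of a team's starting point."""
    vectors: int = 0
    on_postgres: bool = False
    on_redis: bool = False
    prototyping: bool = False
    multimodal: bool = False
    has_ops_team: bool = False
    ranking_is_hard_part: bool = False


def recommend(s: Situation) -> str:
    """Return the first-choice database for a situation, per the guide."""
    # Data gravity first: stay in Postgres while it comfortably fits.
    if s.on_postgres and s.vectors < 10_000_000:
        return "pgvector"
    if s.prototyping:
        return "LanceDB" if s.multimodal else "Chroma"
    if s.ranking_is_hard_part:
        return "Vespa"
    if s.on_redis:
        return "Redis Query Engine"
    if s.has_ops_team:
        # Milvus is the pick once scale passes roughly 100M vectors.
        return "Milvus" if s.vectors > 100_000_000 else "Qdrant"
    # No ops capacity and nothing to build on: go zero-ops.
    return "Pinecone"


print(recommend(Situation(on_postgres=True, vectors=2_000_000)))      # pgvector
print(recommend(Situation(has_ops_team=True, vectors=500_000_000)))   # Milvus
```

Treat the output as a starting shortlist, not a verdict: the Redis branch still needs the license check, and any pick should be confirmed against the matrix.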
The honest verdicts
Best first move for most teams
Already-have-it: pgvector. If your data is in Postgres and you are under ~10M vectors, this is the highest-ROI choice: vector search with transactional consistency and SQL filtering, zero new infrastructure. Graduate to a dedicated database when you actually outgrow it.
Best open-source dedicated database
Self-hosted: Qdrant. Rust-built, Apache-2.0, fastest filtered search and cheapest to self-host, with a managed cloud when you want it. Weaviate is the pick when native hybrid and modules matter more than raw filter speed.
Best at billion-scale
Self-hosted / managed: Milvus / Zilliz. A distributed architecture built for hundreds of millions to billions of vectors that single-node options cannot match. The trade is real operational weight; do not reach for it before you need the scale.
Best zero-ops managed
Serverless: Pinecone for the mature, fully managed serverless default; turbopuffer when object-storage economics dominate your bill. Both are proprietary, which is the cost of handing off the operations.
Best embedded / local-first
Embedded: Chroma for fast prototyping and LanceDB for multimodal and edge. Both are Apache-2.0 and run in-process, with managed tiers when you outgrow a single node.
Methodology and conflict disclosure
- Sample: 11 vector databases spanning four deployment shapes, selected for RAG relevance and category coverage, not for who pays.
- Criteria: Deployment shape, open-source license, native hybrid search, metadata filtering, and pricing model. Each cell is read from the vendor's own repository, documentation, or pricing page.
- Hybrid / filtering: Marked "yes" only where a single native query path is documented; "partial" denotes availability through an integration or full-text add-on. Filtering is near-universal, so the matrix spends its discrimination on shape, license, and pricing.
- Performance claims: Vendor benchmark and cost claims (for example object-storage savings) are attributed to the vendor, not asserted here as independent results. No first-party recall or latency benchmark is claimed.
- Conflicts: The shape model and the rankings were fixed before any monetization check. Nesyona has no paid placement, no sponsorship from any database listed, and no affiliate relationship that altered the order. An outbound link may be tagged where a public program exists; it does not change a placement.
- Last verified: June 2026. This category reprices and relicenses fast (see Redis); verify any license or pricing detail on the vendor's own page before deciding.
- Compiled by: Vincent Wesley Couey, against public repositories, documentation, and pricing pages.
If you build one of these databases and want to check that we have represented it fairly, or that a license or pricing detail has moved, we would genuinely welcome the correction. The goal of this page is to be the one map of the category that is not written by a database ranking itself first.
This is the retrieval layer that feeds the rest of your AI stack. Once your vector store is chosen, the next decisions live one layer up: the LLMOps stack (gateway, observability, evaluation, and guardrails around the model), the best AI agent frameworks that orchestrate retrieval, and the best AI coding assistants for the team building it.
Frequently asked questions
What is the best vector database for RAG in 2026?
Do I really need a dedicated vector database, or can I use pgvector?
What is hybrid search, and does every vector database support it?
Qdrant vs Weaviate vs Pinecone: how do I choose?
Which vector databases are open source, and what changed with Redis?
What is an embedded vector database and when should I use one?
The bottom line
A vector database is not a leaderboard you win on recall; it is a deployment decision you match to your data. Hybrid search and filtering are standard now, so the engines are closer than the benchmarks suggest, and the real question is which of the four shapes (embedded, self-hosted, managed, serverless) fits your data gravity and ops capacity. Start where your data already lives: pgvector if you are on Postgres, Redis Query Engine if you run Redis, an embedded Chroma or LanceDB for prototypes. Reach for a dedicated database when you outgrow that, and pick by shape: Qdrant for fast self-hosted value, Weaviate for hybrid, Milvus for billion-scale, Pinecone or turbopuffer for zero-ops. The cheapest correct answer is almost always the system you do not have to add.
Sources
- pgvector repository and license, github.com/pgvector/pgvector (PostgreSQL License, accessed June 2026).
- Qdrant repository and pricing, github.com/qdrant/qdrant and qdrant.tech/pricing (Apache-2.0, usage plus perpetual free tier).
- Weaviate repository and documentation, github.com/weaviate/weaviate and weaviate.io (BSD-3-Clause, native BM25 hybrid search).
- Milvus and Zilliz documentation, milvus.io and zilliz.com (Apache-2.0, distributed billion-scale, sparse-dense hybrid from 2.4).
- Pinecone documentation and pricing, pinecone.io (serverless, storage-compute separation, sparse-dense hybrid).
- Chroma and LanceDB repositories, github.com/chroma-core/chroma and github.com/lancedb/lancedb (Apache-2.0, embedded).
- turbopuffer documentation, turbopuffer.com (object-storage-first serverless; cost-savings figures per turbopuffer's own statements).
- Vespa repository and documentation, github.com/vespa-engine/vespa and vespa.ai (Apache-2.0, unified ranking).
- Marqo documentation, marqo.ai (Apache-2.0, end-to-end embedding and storage).
- Redis repository and licensing notice, github.com/redis/redis (Redis 8 tri-license RSALv2/SSPLv1/AGPLv3) and Redis Query Engine vector documentation.