Updated June 2026 · 16 min read · Compiled and verified by Vincent Wesley Couey against each database's own repository, documentation, and pricing page

Best vector databases for RAG in 2026: stop benchmarking, match the deployment shape

Search "best vector database" and you get a dozen lists arguing about recall and queries per second, usually written by the vendor whose database wins its own benchmark. That argument is two years stale. In 2026, hybrid search and metadata filtering are table stakes across every serious option, and the engines are fast enough that raw benchmark differences rarely decide a real retrieval-augmented generation (RAG) project. The decision that actually matters is deployment shape: does the database run inside your process, on a server you operate, on a server the vendor operates, or on no servers at all? Match that to where your data and your ops capacity already sit, and the shortlist picks itself. This guide places 11 databases into four shapes, compares them on the five axes that decide fit and cost, and gives an honest answer most vendor lists will not: for a lot of teams, the right vector database is the one your data already lives in.

Last reviewed: June 2026 · Next review: December 2026
Bottom line up front

4 deployment shapes: embedded / self-hosted / managed / serverless
11 databases placed and compared across the four shapes
5 comparison axes: shape, license, hybrid search, filtering, pricing model
~10M: vector count below which pgvector is usually enough
2026: year hybrid search and filtering became table stakes, not a differentiator
Figure: The four deployment shapes, ordered left to right by the ops you own. 1 · Embedded (runs in your process): Chroma, LanceDB. 2 · Self-hosted (you run the server): Qdrant, Weaviate, Milvus, Vespa, Marqo. 3 · Managed (vendor runs your server): Zilliz, Redis Cloud. 4 · Serverless (you hold no servers): Pinecone, turbopuffer. pgvector sits across 1-3: it is whatever your Postgres already is. Data-gravity rule: prefer the shape your data and ops already occupy. Hybrid search and filtering are standard across all four, so they no longer decide the pick.

The four deployment shapes, and the data-gravity rule

The fastest way to a shortlist is to stop comparing engines and start comparing shapes. A vector database can run in one of four operational forms, and each asks a different amount of operational work from you. Pick the shape first, then the field narrows to two or three real candidates.

1. Embedded

The database runs inside your application process, like SQLite for vectors. No server, no network hop, no ops. Best for prototypes, local-first apps, notebooks, and edge. The ceiling is single-node scale.

Chroma · LanceDB

2. Self-Hosted

You run the server on your own infrastructure. You get maximum control, the lowest unit cost at scale, and full data residency, but the ops burden is yours. The open-source workhorses live here.

Qdrant · Weaviate · Milvus · Vespa · Marqo

3. Managed

The vendor runs the same open-source engine for you, and you keep most of the control without the on-call pager. Usually the official cloud for a self-hosted database (Zilliz for Milvus, Weaviate Cloud, Qdrant Cloud, Redis Cloud).

Zilliz · Weaviate Cloud · Qdrant Cloud · Redis Cloud

4. Serverless

You hold no servers and no idle capacity; you pay per use, and the vendor abstracts the cluster away entirely. Lowest ops, least control and portability, and the storage-versus-memory split now drives the bill.

Pinecone · turbopuffer

The data-gravity rule: prefer the shape your data and operations already occupy. If your application data lives in Postgres, the cheapest correct answer is usually pgvector in that same Postgres, not a new database to sync, secure, and pay for. If you already run Redis, its query engine adds vectors without a new system. A dedicated vector database earns its place when you outgrow that gravity well: tens of millions of vectors, heavy hybrid workloads, or scale and latency targets your primary store cannot hold. Choosing by data gravity beats choosing by benchmark, because the benchmark gap is usually smaller than the operational cost of a system you did not need.

Why hybrid search and filtering stopped being the differentiator

The single biggest change since the 2024 vector-database hype is that the features vendors used to fight over are now standard. Hybrid search, combining dense vector similarity with sparse keyword relevance (usually BM25), materially improves RAG retrieval when exact terms matter, and it has moved from a selling point to an expectation. Weaviate has BM25 hybrid built in, Qdrant and Milvus do sparse-plus-dense, Pinecone has supported sparse-dense for years, and turbopuffer and Vespa treat full-text and vector as one query. If a database cannot do hybrid in 2026, that is the news, not the reverse.
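One common way to merge a dense result list with a keyword result list is reciprocal rank fusion (RRF). The sketch below is a minimal, library-free illustration and not any vendor's implementation: the function name, the document ids, and the `k=60` constant are assumptions chosen for the example, and individual engines may fuse scores differently.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query from two retrievers.
dense_hits = ["doc_a", "doc_b", "doc_c"]   # vector-similarity order
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # BM25 keyword order

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Documents that both retrievers rank well rise to the top, which is why hybrid helps when an exact term matters: a keyword match can rescue a chunk the embedding alone ranked low.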

Metadata filtering followed the same path. Filtering vector results by attributes (tenant, date, document type, permissions) is now expected everywhere; the differences are in how well filters combine with the vector index under load. This is the one place a benchmark still earns its keep: Qdrant's Rust filtering is a genuine strength for heavily filtered workloads, and pgvector inherits Postgres's mature query planner for complex predicates. But "can it filter" is no longer a yes/no axis, so the matrix below treats hybrid and filtering as near-universal and spends its discrimination on the axes that still separate the field: deployment shape, open-source license, and pricing model.
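To make "filtering vector results by attributes" concrete, here is a minimal sketch of the simplest possible form: filter on metadata first, then rank the survivors by exact cosine similarity. The record layout, the `filtered_search` name, and the tenant values are hypothetical. Real engines combine the filter with an approximate index rather than a brute-force scan, and that combination is where the differences under load appear.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_search(records, query, where, top_k=3):
    """Keep records whose metadata matches every key in `where`,
    then return the ids of the top_k most similar survivors."""
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in where.items())
    ]
    candidates.sort(key=lambda r: cosine_similarity(r["vector"], query), reverse=True)
    return [r["id"] for r in candidates[:top_k]]


# Hypothetical toy corpus with two-dimensional vectors.
records = [
    {"id": "a", "vector": [1.0, 0.0], "metadata": {"tenant": "acme", "type": "pdf"}},
    {"id": "b", "vector": [0.9, 0.1], "metadata": {"tenant": "globex", "type": "pdf"}},
    {"id": "c", "vector": [0.0, 1.0], "metadata": {"tenant": "acme", "type": "pdf"}},
    {"id": "d", "vector": [0.7, 0.7], "metadata": {"tenant": "acme", "type": "html"}},
]

hits = filtered_search(records, query=[1.0, 0.0], where={"tenant": "acme", "type": "pdf"})
print(hits)
```

Record `b` is close to the query but belongs to another tenant, so it never reaches the ranking step. That is the behavior a permissions or tenant filter has to guarantee.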

The comparison matrix: shape, license, hybrid and pricing model

Eleven databases on the five axes that decide fit and cost. The values yes, partial, and no are read from each vendor's own repository, docs, or pricing page. "Partial" hybrid means available through an integration or full-text add-on rather than a single native call. Pricing model is the column to read at scale, not a headline number. This comparison is published as an open dataset under CC-BY (see methodology).

| Database | Deployment shape | Open source | Hybrid search | Filtering | Pricing model |
| --- | --- | --- | --- | --- | --- |
| pgvector | Extension (in your Postgres) | PostgreSQL License | partial (Postgres FTS) | yes (SQL) | Free (pay your Postgres) |
| Qdrant | Self-host + managed | Apache-2.0 | yes (sparse+dense) | yes (strong, Rust) | Usage / free tier |
| Weaviate | Self-host + managed | BSD-3-Clause | yes (native BM25) | yes | Resource-based |
| Milvus / Zilliz | Self-host + managed | Apache-2.0 | yes (2.4+) | yes | Free OSS / usage (Zilliz) |
| Pinecone | Serverless (managed) | proprietary | yes (sparse-dense) | yes | Usage (serverless) |
| Chroma | Embedded + cloud | Apache-2.0 | partial (integration) | yes | Free OSS / cloud usage |
| LanceDB | Embedded + cloud | Apache-2.0 | partial (FTS) | yes | Free OSS / cloud |
| turbopuffer | Serverless (managed) | proprietary | yes (BM25+vector) | yes | Usage (object-storage) |
| Vespa | Self-host + managed | Apache-2.0 | yes (native ranking) | yes | Free OSS / resource (Cloud) |
| Marqo | Self-host + cloud | Apache-2.0 | yes | yes | Free OSS / cloud |
| Redis (Query Engine) | Self-host + managed | partial (RSALv2 / SSPLv1 / AGPLv3) | yes (vector + text) | yes | Free OSS / Redis Cloud |

Highlighted rows are the picks most teams should start from. Licenses, hybrid support, and pricing models reflect each vendor's public repository and documentation as of June 2026 and change often; verify on the vendor's own page before a decision. Redis is marked partial under "open source" because version 8 ships a tri-license (RSALv2/SSPLv1/AGPLv3) rather than a single permissive license.

This comparison is published as an open dataset (CC-BY) with a permanent DOI: 10.5281/zenodo.20738950. Browse the full dataset landing page or download the machine-readable JSON.

The self-hosted workhorses: Qdrant, Weaviate, Milvus, Vespa, Marqo

This is where most serious RAG teams land, because self-hosting gives the lowest unit cost, full data residency, and no per-vector vendor markup, at the price of running the server yourself. Qdrant is the performance-and-value pick: written in Rust, Apache-2.0, with the fastest filtered-search numbers in the category and the cheapest self-host footprint (it runs millions of vectors on a small VPS). It is the safe default when you want an open-source dedicated database you can also buy as a managed cloud later. Weaviate is the choice when hybrid search and a batteries-included feature set matter most: BSD-3-licensed, native BM25 hybrid, a module ecosystem for embeddings and rerankers, and Weaviate Cloud's entry tier is among the cheapest managed options. Milvus (with Zilliz as its managed cloud) is the billion-scale answer: a distributed, Apache-2.0 architecture built for hundreds of millions to billions of vectors that no single-node option matches, with the operational weight that implies.

Vespa is the heavyweight for teams whose problem is really search-and-ranking, not just vector lookup: an Apache-2.0 engine (out of Yahoo) that unifies vector, text, and structured data with rich, ML-driven ranking, powerful and correspondingly complex to operate. Marqo is the end-to-end option that folds embedding generation and storage into one Apache-2.0 system, which removes a moving part for teams that do not want to run a separate embedding pipeline. The honest split: pick Qdrant for fast filtered search at low cost, Weaviate for hybrid and modules, Milvus for raw scale, Vespa when ranking is the hard part, and Marqo when you want the embedding step handled for you.

The managed and serverless end: Pinecone and turbopuffer

If you would rather pay to make the database someone else's problem, this is the end of the spectrum, and the trade is control and portability for zero operations. Pinecone is the category's default fully-managed serverless option: proprietary, no servers to run, sparse-dense hybrid, and a serverless model that separates storage from compute so idle indexes cost little. It is the right pick for a team that wants to ship RAG without hiring anyone to run infrastructure, accepting closed-source lock-in and per-usage pricing as the cost. turbopuffer is the newer, cost-shaped challenger: an object-storage-first serverless engine that, by its own account, stores vectors on S3-class storage rather than RAM to cut cost dramatically for storage-heavy and many-namespace workloads. It is proprietary with no free tier and a monthly minimum, and it shines specifically when you have a lot of vectors that are queried unevenly. Choose Pinecone for the mature zero-ops default; choose turbopuffer when storage economics dominate your bill and you can live on a managed, closed platform.

The embedded and already-have-it options: Chroma, LanceDB, pgvector, Redis

For prototypes, local-first apps, and teams that should not add a database at all, the best answer often runs where you already are. Chroma is the prototyping default: Apache-2.0, embedded, dead-simple to start in a notebook, with a managed Chroma Cloud when you outgrow local. LanceDB is the embedded option for multimodal and edge: Apache-2.0, built on the Lance columnar format, runs in-process or in your cloud, and handles vectors alongside the raw data on object storage, which suits image and video RAG and offline apps.

The two "already-have-it" options are the ones most lists undersell. pgvector turns Postgres into a vector database with an extension under the PostgreSQL License: if your app data is already in Postgres, you get vector search with transactional consistency, mature SQL filtering, and zero new infrastructure, which is why it is the right first move for the majority of apps under roughly ten million vectors. Redis (via its Query Engine) adds vector search to a store many teams already run for caching, with very low latency, though note the 2025 license change: Redis 8 ships a tri-license (RSALv2, SSPLv1, or AGPLv3 at your option), so it is no longer plain-permissive open source and that may matter for some commercial deployments. The pattern in both cases is the data-gravity rule made concrete: adding vectors to a system you already operate beats standing up a dedicated database you do not yet need.
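As a rough illustration of what "vector search with SQL filtering" looks like in practice, the sketch below pairs a pgvector nearest-neighbor query with an ordinary `WHERE` clause. The `chunks` table, its columns, and the helper names are hypothetical, and the `%s` placeholders assume a psycopg-style DB-API driver. The `vector` type and the `<=>` cosine-distance operator are pgvector's own.

```python
# Hypothetical schema: one row per text chunk, scoped to a tenant.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id bigserial PRIMARY KEY,
    tenant_id text NOT NULL,
    body text NOT NULL,
    embedding vector(3)  -- toy dimension; use your embedding model's size
);
"""

# Filter with plain SQL, then order by cosine distance to the query vector.
SEARCH_SQL = """
SELECT id, body
FROM chunks
WHERE tenant_id = %s
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(values):
    """Format numbers as pgvector's text input, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def search_chunks(cursor, tenant_id, query_embedding, limit=5):
    """Run the filtered similarity query on any DB-API cursor that uses
    the %s parameter style (an assumption of this sketch)."""
    cursor.execute(SEARCH_SQL, (tenant_id, to_vector_literal(query_embedding), limit))
    return cursor.fetchall()
```

Because the filter and the similarity ranking live in one SQL statement, the tenant check runs inside the same transaction and permission model as the rest of the application's data, which is the practical meaning of "zero new infrastructure."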

Which database for your situation

Match the move to where you actually are, not to the leaderboard. The shortlist falls out of the shape and your data gravity:

Already on Postgres

An app with its data in Postgres, under ~10M vectors.

pgvector in your existing database. No new system; SQL filtering; revisit only when you outgrow it.

Prototyping / solo

Notebook or local-first app, want to ship fast.

Chroma embedded (or LanceDB for multimodal). Move to a managed tier when you scale.

Self-hosting at scale

Real traffic, data-residency needs, an ops team.

Qdrant for fast filtered search at low cost; Weaviate if hybrid + modules matter; Milvus past ~100M vectors.

Zero-ops, no infra hires

Want RAG in production without running servers.

Pinecone serverless. turbopuffer if your bill is dominated by storage-heavy, uneven query load.

Search is the hard part

Complex ranking over text, vectors, and structured data.

Vespa for ML-driven ranking; Marqo if you also want embedding generation handled.

Already on Redis

Redis is in your stack for cache or queues.

Redis Query Engine for low-latency vectors, after checking the tri-license fits your deployment.
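The six situations above can be read as a small decision procedure. The sketch below encodes them as one function. The function name, the flag names, and especially the order in which the checks run are assumptions made for illustration; the thresholds (~10M and ~100M vectors) are the ones quoted in this guide.

```python
def shortlist(vectors, on_postgres=False, on_redis=False, prototyping=False,
              multimodal=False, zero_ops=False, storage_heavy=False,
              ranking_is_hard=False):
    """Return candidate databases for a situation, following the guide above.

    The precedence of the checks is an assumption of this sketch.
    """
    if on_postgres and vectors < 10_000_000:
        return ["pgvector"]
    if on_redis:
        return ["Redis Query Engine"]  # only after checking the tri-license
    if prototyping:
        return ["LanceDB"] if multimodal else ["Chroma"]
    if zero_ops:
        return ["turbopuffer"] if storage_heavy else ["Pinecone"]
    if ranking_is_hard:
        return ["Vespa", "Marqo"]
    # Remaining case: self-hosting at scale.
    if vectors > 100_000_000:
        return ["Milvus"]
    return ["Qdrant", "Weaviate"]


print(shortlist(2_000_000, on_postgres=True))
print(shortlist(500_000_000))
```

The point of writing it down is the ordering: the data-gravity checks come first, and a dedicated database is only reached when none of them apply.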

The honest verdicts

Methodology and conflict disclosure

How this comparison was built

Sample: 11 vector databases spanning four deployment shapes, selected for RAG relevance and category coverage, not for who pays.
Criteria: Deployment shape, open-source license, native hybrid search, metadata filtering, and pricing model. Each cell is read from the vendor's own repository, documentation, or pricing page.
Hybrid / filtering: Marked "yes" only where a single native query path is documented; "partial" denotes availability through an integration or full-text add-on. Filtering is near-universal, so the matrix spends its discrimination on shape, license, and pricing.
Performance claims: Vendor benchmark and cost claims (for example object-storage savings) are attributed to the vendor, not asserted here as independent results. No first-party recall or latency benchmark is claimed.
Conflicts: The shape model and the rankings were fixed before any monetization check. Nesyona has no paid placement, no sponsorship from any database listed, and no affiliate relationship that altered the order. An outbound link may be tagged where a public program exists; it does not change a placement.
Last verified: June 2026. This category reprices and relicenses fast (see Redis); verify any license or pricing detail on the vendor's own page before deciding.
Compiled by: Vincent Wesley Couey, against public repositories, documentation, and pricing pages.

If you build one of these databases and want to check that we have represented it fairly, or that a license or pricing detail has moved, we would genuinely welcome the correction. The goal of this page is to be the one map of the category that is not written by a database ranking itself first.

This is the retrieval layer that feeds the rest of your AI stack. Once your vector store is chosen, the next decisions live one layer up: the LLMOps stack (gateway, observability, evaluation, and guardrails around the model), the best AI agent frameworks that orchestrate retrieval, and the best AI coding assistants for the team building it.

Frequently asked questions

What is the best vector database for RAG in 2026?
There is no single best one; the right pick is set by deployment shape and where your data already lives. If your application data is in Postgres and you are under roughly ten million vectors, pgvector is usually the highest-ROI choice because it adds vector search with no new infrastructure. For a dedicated open-source database you self-host, Qdrant is the best balance of speed, cost, and license (Rust, Apache-2.0); Weaviate is better when native hybrid search and modules matter; Milvus is the answer at billion scale. For zero-ops, Pinecone is the managed serverless default. Match the deployment shape to your data and ops capacity first, then compare within that shape.
Do I really need a dedicated vector database, or can I use pgvector?
For most applications under roughly ten million vectors, pgvector is enough and often the better choice. It turns your existing Postgres into a vector store with transactional consistency, mature SQL filtering, and zero new systems to run, secure, and pay for. You should graduate to a dedicated vector database when you genuinely outgrow that gravity well: tens of millions or billions of vectors, very heavy hybrid-search workloads, or latency and scale targets your primary database cannot meet. The mistake is reaching for a specialized database before the scale demands it, which adds an entire system you then have to keep in sync.
What is hybrid search, and does every vector database support it?
Hybrid search combines dense vector similarity with sparse keyword relevance (usually BM25), which improves retrieval quality when exact terms, names, or codes matter, as they often do in real RAG workloads. As of 2026 it is effectively standard: Weaviate has native BM25 hybrid, Qdrant and Milvus do sparse-plus-dense, Pinecone has long supported sparse-dense, and turbopuffer and Vespa treat text and vector as one query. Some embedded options (Chroma, LanceDB) and pgvector offer it through an integration or full-text add-on rather than a single native call, which the comparison marks as "partial." Because hybrid is now near-universal, it rarely decides the pick on its own.
Qdrant vs Weaviate vs Pinecone: how do I choose?
They sit in different deployment shapes, which is the cleanest way to choose. Qdrant is the self-hosted performance-and-value pick: Apache-2.0, Rust, fastest filtered search, cheapest to run yourself, with a managed cloud option. Weaviate is also self-hostable (BSD-3) but leans into native hybrid search and a module ecosystem, with an inexpensive managed cloud entry tier. Pinecone is the proprietary, fully-managed serverless option for teams that want zero operations and will accept closed-source lock-in and usage pricing. Decide whether you want to run the server (Qdrant or Weaviate) or not (Pinecone) first; the rest follows.
Which vector databases are open source, and what changed with Redis?
Most of the dedicated engines are open source: Qdrant, Milvus, Chroma, LanceDB, Vespa, and Marqo are Apache-2.0, and Weaviate is BSD-3-Clause. pgvector ships under the permissive PostgreSQL License. The exceptions are Pinecone and turbopuffer, which are proprietary managed services. Redis is the one to watch: with version 8 it adopted a tri-license (RSALv2, SSPLv1, or AGPLv3 at your option), so its vector capability is "source-available with an AGPL option" rather than plain permissive open source, which can matter for some commercial or SaaS deployments. Always confirm the current license on the project's own repository before committing.
What is an embedded vector database and when should I use one?
An embedded vector database runs inside your application process, like SQLite does for relational data, with no separate server or network hop. Chroma and LanceDB are the main options: Chroma for fast prototyping and notebooks, LanceDB for multimodal and edge or offline apps. Use one when you are prototyping, building a local-first or on-device app, or keeping a small single-node workload simple. The ceiling is single-node scale, so plan to move to a managed or self-hosted tier (both Chroma and LanceDB offer cloud versions) if the workload grows beyond one machine.

The bottom line

A vector database is not a leaderboard you win on recall; it is a deployment decision you match to your data. Hybrid search and filtering are standard now, so the engines are closer than the benchmarks suggest, and the real question is which of the four shapes (embedded, self-hosted, managed, serverless) fits your data gravity and ops capacity. Start where your data already lives: pgvector if you are on Postgres, Redis Query Engine if you run Redis, an embedded Chroma or LanceDB for prototypes. Reach for a dedicated database when you outgrow that, and pick by shape: Qdrant for fast self-hosted value, Weaviate for hybrid, Milvus for billion-scale, Pinecone or turbopuffer for zero-ops. The cheapest correct answer is almost always the system you do not have to add.

Sources

  1. pgvector repository and license, github.com/pgvector/pgvector (PostgreSQL License, accessed June 2026).
  2. Qdrant repository and pricing, github.com/qdrant/qdrant and qdrant.tech/pricing (Apache-2.0, usage plus perpetual free tier).
  3. Weaviate repository and documentation, github.com/weaviate/weaviate and weaviate.io (BSD-3-Clause, native BM25 hybrid search).
  4. Milvus and Zilliz documentation, milvus.io and zilliz.com (Apache-2.0, distributed billion-scale, sparse-dense hybrid from 2.4).
  5. Pinecone documentation and pricing, pinecone.io (serverless, storage-compute separation, sparse-dense hybrid).
  6. Chroma and LanceDB repositories, github.com/chroma-core/chroma and github.com/lancedb/lancedb (Apache-2.0, embedded).
  7. turbopuffer documentation, turbopuffer.com (object-storage-first serverless; cost-savings figures per turbopuffer's own statements).
  8. Vespa repository and documentation, github.com/vespa-engine/vespa and vespa.ai (Apache-2.0, unified ranking).
  9. Marqo documentation, marqo.ai (Apache-2.0, end-to-end embedding and storage).
  10. Redis repository and licensing notice, github.com/redis/redis (Redis 8 tri-license RSALv2/SSPLv1/AGPLv3) and Redis Query Engine vector documentation.