Key findings

LLMOps is a four-layer stack: Gateway/Routing, Observability/Tracing, Evaluation/Testing, and Guardrails/Safety, ordered along the request path.
OpenTelemetry is the portability line: OTel-native tools let you swap backends; proprietary-only SDKs trap your instrumentation. Only 9 of 19 tools are confirmed OTel-native from their own docs.
The layers leak by design: the strongest tools span two or three layers, so buying by layer and checking the overlap beats buying by brand.
Pricing model matters more than price: per-event, per-seat, usage, and flat models scale very differently; a cheap-in-demo per-trace tool can dominate the production bill.
A self-hostable open-source spine exists: Langfuse, Arize Phoenix, LiteLLM, Promptfoo, Portkey core, Guardrails AI, NeMo Guardrails, and Traceloop are all OSS and self-hostable.
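
To make the pricing-model finding concrete, here is a minimal sketch of how per-event, per-seat, and flat models diverge as traffic grows. Every rate in it is a hypothetical placeholder chosen for illustration, not a figure from any vendor's price list, and the function name monthly_cost is likewise invented here.

```python
# All rates are hypothetical placeholders, not any vendor's published pricing.
PER_EVENT_USD = 0.0005   # assumed price per trace/event
PER_SEAT_USD = 40.0      # assumed price per seat per month
FLAT_USD = 500.0         # assumed flat monthly fee


def monthly_cost(model: str, events: int, seats: int) -> float:
    """Return an illustrative monthly bill for one pricing model."""
    if model == "per_event":
        return events * PER_EVENT_USD   # grows linearly with traffic
    if model == "per_seat":
        return seats * PER_SEAT_USD     # grows with team size, not traffic
    if model == "flat":
        return FLAT_USD                 # constant regardless of load
    raise ValueError(f"unknown pricing model: {model}")


if __name__ == "__main__":
    seats = 5
    # Demo-scale, pilot-scale, and production-scale monthly volumes.
    for events in (10_000, 1_000_000, 100_000_000):
        costs = {
            m: monthly_cost(m, events, seats)
            for m in ("per_event", "per_seat", "flat")
        }
        print(f"{events:>11,} events/month -> {costs}")
```

Under these assumed rates the per-event model is the cheapest option at demo volume and the most expensive at production volume, which is the pattern the finding warns about. Substituting a vendor's real rates and your own traffic forecast is the useful exercise.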

The comparison matrix

The matrix covers nineteen tools across the four layers, on the five dimensions that decide interoperability and cost. The values yes and no are read from each vendor's own documentation; partial denotes tier-limited, proxy-based, ingest-only, or unconfirmed-native support. At scale, read the Pricing model column rather than the headline price. The full machine-readable matrix is in data.json.

Tool	Layer	OTel-native	Self-host	Open source	Pricing model	Eval / Guardrails built in
Portkey	Gateway +obs +guardrails	yes	yes	Apache-2.0 core	usage (per log)	guardrails + observability
LiteLLM	Gateway	yes	yes	OSS + Enterprise	free OSS / Enterprise quote	routing only
Cloudflare AI Gateway	Gateway	partial	no (cloud only)	no	free (with Workers)	no
Kong AI Gateway	Gateway	yes	yes	core OSS, AI plugins paid	Enterprise license	policy-level
OpenRouter	Gateway (aggregator)	partial	no (cloud only)	no	flat 5.5% on credits	no
AIMLAPI	Gateway (aggregator)	partial	no (cloud only)	no	usage (pay-as-you-go)	no
Langfuse	Observability +eval	yes (v3)	yes	MIT	usage (per unit)	eval + prompt management
LangSmith	Observability +eval	partial (late)	partial (enterprise)	no	seat + per-trace	eval
Helicone	Observability +gateway	partial	yes	OSS	usage (per request)	light eval
Arize Phoenix	Observability +eval	yes (OpenInference)	yes	OSS	free OSS / usage	eval
Traceloop / OpenLLMetry	Observability	yes (reference)	yes (library)	Apache-2.0	free library / platform	platform only
Datadog LLM Observability	Observability	yes (GenAI convention)	no (cloud only)	no	usage (per span)	eval
Maxim	Evaluation +obs +gateway	yes (Bifrost)	partial (enterprise)	Bifrost Apache-2.0	usage / Enterprise	observability + guardrails
Braintrust	Evaluation +obs	partial (ingests OTLP)	partial (enterprise)	no	usage / Enterprise	observability
Promptfoo	Evaluation	no (not the focus)	yes (local-first)	OSS	free OSS / Enterprise	red-team
DeepEval / Confident AI	Evaluation +obs	yes	partial (OSS library only)	DeepEval OSS	usage (cloud)	observability (cloud)
Prediction Guard	Guardrails +inference	yes (events)	no (hosted)	no	usage / Enterprise	eval-style checks
Guardrails AI	Guardrails	yes (telemetry)	yes	Apache-2.0	free OSS / Pro	validation
NeMo Guardrails	Guardrails	no	yes	Apache-2.0	free OSS	rails only

Pricing models and OTel status reflect each vendor's public documentation as of June 2026 and change often; verify on the vendor's own page before a purchase decision. The narrative companion to this dataset is The LLMOps Stack 2026.

Methodology

Each cell is read from the vendor's own public documentation, GitHub repository, or pricing page as of June 2026. OpenTelemetry support is marked yes (native) only where confirmed from the tool's own docs; partial denotes tier-limited, proxy-based, ingest-only, or unconfirmed-native. Vendor performance claims (for example latency benchmarks) are not encoded here.

The dataset tracks five dimensions: otel_native, self_host, open_source, pricing_model, and eval_guardrails_built_in, plus each tool's primary layer and the layers it spans. Rankings and the four-layer reference model were fixed before any monetization check; there is no paid placement. Defunct (HumanLoop) and acquired (Protect AI) products are excluded from the live comparison.
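
As a sketch of how the five dimensions might be queried, the example below filters a handful of records for tools that are OTel-native, self-hostable, and open source. Only the five dimension names come from the text above; the "tool" key, the list-of-objects layout, and the helper names are assumptions about data.json, and the sample rows are copied from the matrix. Treating any value that begins with "yes" as affirmative is a simplification.

```python
import json

# Hypothetical record shape: only the five dimension names come from the text;
# the "tool" key and the list-of-objects layout are assumptions about data.json.
SAMPLE_JSON = """
[
  {"tool": "Langfuse", "otel_native": "yes (v3)", "self_host": "yes",
   "open_source": "MIT", "pricing_model": "usage (per unit)",
   "eval_guardrails_built_in": "eval + prompt management"},
  {"tool": "LiteLLM", "otel_native": "yes", "self_host": "yes",
   "open_source": "OSS + Enterprise",
   "pricing_model": "free OSS / Enterprise quote",
   "eval_guardrails_built_in": "routing only"},
  {"tool": "LangSmith", "otel_native": "partial (late)",
   "self_host": "partial (enterprise)", "open_source": "no",
   "pricing_model": "seat + per-trace", "eval_guardrails_built_in": "eval"},
  {"tool": "NeMo Guardrails", "otel_native": "no", "self_host": "yes",
   "open_source": "Apache-2.0", "pricing_model": "free OSS",
   "eval_guardrails_built_in": "rails only"}
]
"""


def is_yes(value: str) -> bool:
    """Treat 'yes' and qualified forms such as 'yes (v3)' as affirmative."""
    return value.strip().lower().startswith("yes")


def is_open_source(value: str) -> bool:
    """The open_source cell holds a license name or 'no'; anything but 'no' counts."""
    return not value.strip().lower().startswith("no")


def portable_spine(records: list[dict]) -> list[str]:
    """Names of tools that are OTel-native, self-hostable, and open source."""
    return [
        r["tool"]
        for r in records
        if is_yes(r["otel_native"])
        and is_yes(r["self_host"])
        and is_open_source(r["open_source"])
    ]


records = json.loads(SAMPLE_JSON)
print(portable_spine(records))
```

Values marked partial are deliberately excluded here; a stricter or looser reading of the qualified cells would change which tools pass the filter.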

Open dataset. The full matrix is published at data.json under a CC-BY 4.0 license, free to share and adapt with attribution to Nesyona / Vincent Couey (ORCID).

The LLMOps Stack: OpenTelemetry, Self-Host and Pricing-Model Comparison 2026

Which production LLM-ops tools are OpenTelemetry-native, self-hostable, and open-source, and how do they price? A 19-tool, four-layer comparison.
