The comparison matrix
The matrix compares nineteen tools across the four layers (gateway, observability, evaluation, guardrails) on the five dimensions that decide interoperability and cost. Values of yes and no are read from each vendor's own documentation; partial denotes tier-limited, proxy-based, ingest-only, or unconfirmed-native support. At scale, the pricing model matters more than the headline price, so read that column first. The full machine-readable matrix is in data.json.
| Tool | Layer | OTel-native | Self-host | Open source | Pricing model | Eval / Guardrails built in |
|---|---|---|---|---|---|---|
| Portkey | Gateway +obs +guardrails | yes | yes | Apache-2.0 core | usage (per log) | guardrails + observability |
| LiteLLM | Gateway | yes | yes | OSS + Enterprise | free OSS / Enterprise quote | routing only |
| Cloudflare AI Gateway | Gateway | partial | no (cloud only) | no | free (with Workers) | no |
| Kong AI Gateway | Gateway | yes | yes | core OSS, AI plugins paid | Enterprise license | policy-level |
| OpenRouter | Gateway (aggregator) | partial | no (cloud only) | no | flat 5.5% on credits | no |
| AIMLAPI | Gateway (aggregator) | partial | no (cloud only) | no | usage (pay-as-you-go) | no |
| Langfuse | Observability +eval | yes (v3) | yes | MIT | usage (per unit) | eval + prompt management |
| LangSmith | Observability +eval | partial (late) | partial (enterprise) | no | seat + per-trace | eval |
| Helicone | Observability +gateway | partial | yes | OSS | usage (per request) | light eval |
| Arize Phoenix | Observability +eval | yes (OpenInference) | yes | OSS | free OSS / usage | eval |
| Traceloop / OpenLLMetry | Observability | yes (reference) | yes (library) | Apache-2.0 | free library / platform | platform only |
| Datadog LLM Observability | Observability | yes (GenAI convention) | no (cloud only) | no | usage (per span) | eval |
| Maxim | Evaluation +obs +gateway | yes (Bifrost) | partial (enterprise) | Bifrost Apache-2.0 | usage / Enterprise | observability + guardrails |
| Braintrust | Evaluation +obs | partial (ingests OTLP) | partial (enterprise) | no | usage / Enterprise | observability |
| Promptfoo | Evaluation | no (not the focus) | yes (local-first) | OSS | free OSS / Enterprise | red-team |
| DeepEval / Confident AI | Evaluation +obs | yes | partial (OSS library only) | DeepEval OSS | usage (cloud) | observability (cloud) |
| Prediction Guard | Guardrails +inference | yes (events) | no (hosted) | no | usage / Enterprise | eval-style checks |
| Guardrails AI | Guardrails | yes (telemetry) | yes | Apache-2.0 | free OSS / Pro | validation |
| NeMo Guardrails | Guardrails | no | yes | Apache-2.0 | free OSS | rails only |
Pricing models and OTel status reflect each vendor's public documentation as of June 2026 and change often; verify on the vendor's own page before a purchase decision. The narrative companion to this dataset is The LLMOps Stack 2026.
Methodology
Each cell is read from the vendor's own public documentation, GitHub repository, or pricing page as of June 2026. OpenTelemetry support is marked yes (native) only where confirmed from the tool's own docs; partial denotes tier-limited, proxy-based, ingest-only, or unconfirmed-native. Vendor performance claims (for example latency benchmarks) are not encoded here.
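Many status cells in the matrix carry a parenthetical qualifier, such as yes (OpenInference) or partial (ingests OTLP). A minimal sketch of splitting such a cell into its status word and qualifier follows. The helper name `split_cell` and the regular expression are illustrative assumptions, not part of the published dataset or its tooling.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a status word (yes / partial / no), optionally
# followed by a qualifier in parentheses.
CELL = re.compile(r"^(yes|partial|no)\b\s*(?:\((.*)\))?\s*$")


def split_cell(cell: str) -> Tuple[str, Optional[str]]:
    """Split a matrix cell into (status, qualifier); qualifier may be None."""
    match = CELL.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized status cell: {cell!r}")
    return match.group(1), match.group(2)


# Cells copied from the OTel-native and Self-host columns of the matrix.
examples = [
    "yes",
    "yes (OpenInference)",
    "partial (ingests OTLP)",
    "no (cloud only)",
]
for cell in examples:
    print(cell, "->", split_cell(cell))
```

Keeping the qualifier separate from the status lets a filter treat yes (v3) and yes alike while preserving the caveat for display.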
The dataset tracks five dimensions: otel_native, self_host, open_source, pricing_model, and eval_guardrails_built_in, plus each tool's primary layer and the layers it spans. Rankings and the four-layer reference model were fixed before any monetization check; there is no paid placement. Defunct (Humanloop) and acquired (Protect AI) products are excluded from the live comparison.
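As a rough illustration of how the five dimensions could be queried, the sketch below shortlists tools by status. It assumes each record is a flat JSON object keyed by the dimension names, plus `tool` and `layer` keys; the actual layout of data.json may differ. The function `shortlist` is hypothetical, and the sample rows are copied from the matrix rather than loaded from the file.

```python
import json

# Assumed record shape (not confirmed against the real data.json):
# one object per tool, keyed by the five dimension names plus "tool" and "layer".
SAMPLE = """
[
  {"tool": "Langfuse", "layer": "Observability +eval",
   "otel_native": "yes (v3)", "self_host": "yes", "open_source": "MIT",
   "pricing_model": "usage (per unit)",
   "eval_guardrails_built_in": "eval + prompt management"},
  {"tool": "Datadog LLM Observability", "layer": "Observability",
   "otel_native": "yes (GenAI convention)", "self_host": "no (cloud only)",
   "open_source": "no", "pricing_model": "usage (per span)",
   "eval_guardrails_built_in": "eval"},
  {"tool": "Braintrust", "layer": "Evaluation +obs",
   "otel_native": "partial (ingests OTLP)", "self_host": "partial (enterprise)",
   "open_source": "no", "pricing_model": "usage / Enterprise",
   "eval_guardrails_built_in": "observability"},
  {"tool": "NeMo Guardrails", "layer": "Guardrails",
   "otel_native": "no", "self_host": "yes", "open_source": "Apache-2.0",
   "pricing_model": "free OSS", "eval_guardrails_built_in": "rails only"}
]
"""


def shortlist(rows, **required):
    """Return names of tools whose cells start with the required status word."""
    return [
        row["tool"]
        for row in rows
        if all(row[field].startswith(status) for field, status in required.items())
    ]


rows = json.loads(SAMPLE)
print(shortlist(rows, otel_native="yes", self_host="yes"))  # ['Langfuse']
```

Matching on the leading status word keeps qualified cells such as yes (v3) in the result; a stricter filter could require an exact match instead.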