Add blog post on OpenSearch Agent Health for AI observability #4088
Signed-off-by: Megha Goyal <goyamegh@amazon.com>
Thank you for submitting a blog post! The blog post review process is: Submit a PR -> (Optional) Peer review -> Doc review -> Marketing review -> Published.
Hi @goyamegh, it looks like you're adding a new blog post but don't have an issue mentioned. Please link this PR to an open issue using one of these keywords in the PR description:
If an issue hasn't been created yet, please create one and then link it to this PR.
@kolchfa-aws can you help review this?
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Signed-off-by: Rekha Thottan <rjthotan@amazon.com>
74d6927 to 2d5cad4
Signed-off-by: Rekha Thottan <rjthotan@amazon.com>
@kolchfa-aws, we have made changes to the Built for Developer Workflows and What's Next sections. Could you please review these changes?
Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com>
@Thottan Done
**3. Solving the evaluation paradox: Real-time agent evaluation**.
Agent Health uses the _golden path_ trajectory comparison, in which an LLM judge scores agent actions against expected outcomes. You define what _good_ looks like for your agent (the expected steps, tool calls, and outcomes) and Agent Health measures _how well_ your agent performs against these criteria. Using your preferred LLM provider as your judge gives you flexibility to choose the evaluation model that fits your needs and budget.
Suggest rewrite to explain "golden path":
Agent Health uses the golden path trajectory comparison to evaluate agent performance. In this approach, you define the ideal sequence of steps, tool calls, and outcomes your agent should follow as the golden path. An LLM judge then scores your agent's actual behavior against that expected trajectory, flagging deviations that indicate errors or regressions. Agent Health measures how well your agent performs against these criteria, and using your preferred LLM provider as your judge gives you the flexibility to choose the evaluation model that fits your needs and budget.
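To make the golden path idea concrete, here is a minimal sketch of a trajectory comparison. It is not Agent Health's API: the `Step` shape, the tool names, and the function names are all hypothetical, and the LLM judge is swapped for a deterministic longest-common-subsequence score so the example runs without a model provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One agent action, identified by the tool it called (hypothetical shape)."""
    tool: str


def lcs_length(expected, actual):
    """Length of the longest common subsequence of two step lists."""
    rows, cols = len(expected), len(actual)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            if expected[i] == actual[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    return table[rows][cols]


def trajectory_score(golden, actual):
    """Fraction of golden-path steps the agent reproduced in order (0.0 to 1.0)."""
    if not golden:
        return 1.0
    return lcs_length(golden, actual) / len(golden)


def deviations(golden, actual):
    """Golden steps the agent never took, and steps it took off the golden path."""
    missing = [s.tool for s in golden if s not in actual]
    extra = [s.tool for s in actual if s not in golden]
    return missing, extra


# Hypothetical golden path and one recorded agent run.
golden_path = [Step("search_logs"), Step("fetch_metrics"), Step("summarize")]
agent_run = [Step("search_logs"), Step("open_ticket"), Step("summarize")]

score = trajectory_score(golden_path, agent_run)
missing, extra = deviations(golden_path, agent_run)
print(f"score={score:.2f} missing={missing} extra={extra}")
```

An LLM judge would replace `trajectory_score` with a model call that can also weigh tool arguments and final outcomes. The deterministic version only checks whether the expected steps appear in order.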
- community
meta_keywords: AI agents, agent observability, OpenTelemetry, LLM evaluation, agent tracing, AI agent testing, OpenSearch, agentic AI
meta_description: OpenSearch Agent Health provides open-source observability and evaluation for AI agents. Ship production-ready agents faster with real-time tracing, systematic benchmarking, and LLM-based evaluation.
---
meta_keywords: OpenSearch Agent Health, AI agents, observability, LLM evaluation, AI agent testing, OpenTelemetry, agentic AI, trace visualization, agent debugging, AI agent observability, LLM agent evaluation, agentic AI debugging, automated LLM benchmarking tool, open-source LLM observability
meta_description: Discover why AI agents fail in silence and how OpenSearch Agent Health solves it with open-source trace observability, automated benchmarking, and LLM judge evaluation.
…nation Signed-off-by: Rekha Thottan <rjthotan@amazon.com>
@kolchfa-aws - Please do the final merge and close. The blog is published here: https://opensearch.org/blog/opensearch-agent-health-open-source-observability-and-evaluation-for-ai-agents/
Summary
Adds blog post announcing the experimental launch of OpenSearch Agent Health — an open-source evaluation and observability framework for AI agents, available as a zero-install NPX tool.
Resolves #4085
Changes
_posts/2026-02-28-opensearch-agent-health.md
evaluation paradox) and how Agent Health addresses each
connecting your own agent, creating benchmarks, running evaluations, and
iterating on results
_community_members/goyamegh.md, _community_members/thottan.md
assets/media/community/members/goyamegh.jpg, thottan.jpg
screenshot
Checklist
layout: post, authors, date, categories)
_community_members/ with author persona
assets/media/community/members/
assets/media/blog-images/2026-02-28-opensearch-agent-health/
/assets/...)
Stack)
meta_keywords/meta_description: to be filled by marketing team
Authors
@goyamegh @Thottan
By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.