Description
Describe the goal of the feature
This issue covers two related gaps in LLMObs output serialization:
1. Pydantic model outputs render as repr() instead of JSON (@workflow, @task, and others). When a decorated function returns a tuple or list containing Pydantic v2 BaseModel instances, safe_json() handles the top-level object but not the nested models, so _unserializable_default_repr falls back to str() and produces unreadable repr() output in traces.
2. @llm-decorated functions produce no output in traces at all. Unlike @workflow and @task, the @llm decorator does not call LLMObs.annotate() on the return value, so outputs are silently dropped and never appear in traces regardless of the return type.
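Gap 1 can be reproduced with nothing but the standard library: json.dumps only consults the default= handler for objects it cannot serialize natively, and a str() fallback there yields repr text for nested model-like objects. The CoffeeRating class below is a hypothetical stand-in that merely mimics Pydantic v2's model_dump interface (pydantic is not required for the sketch):

```python
import json

# Hypothetical stand-in for a Pydantic v2 model: json.dumps cannot
# serialize it natively, so it lands in the default= handler.
class CoffeeRating:
    def __init__(self, bean_origin, score):
        self.bean_origin = bean_origin
        self.score = score

    def __repr__(self):
        return "CoffeeRating(bean_origin={!r}, score={!r})".format(
            self.bean_origin, self.score
        )

    # Pydantic v2 models expose model_dump(); the current serializer
    # never reaches it for *nested* objects.
    def model_dump(self, mode="python"):
        return {"bean_origin": self.bean_origin, "score": self.score}

ratings = [CoffeeRating("Ethiopia", 9.2)]

# What the current default= fallback effectively does: str(obj) -> repr text.
as_repr = json.dumps(ratings, default=str)
print(as_repr)  # ["CoffeeRating(bean_origin='Ethiopia', score=9.2)"]

# What a model_dump-aware handler would produce instead.
as_json = json.dumps(ratings, default=lambda o: o.model_dump(mode="json"))
print(as_json)  # [{"bean_origin": "Ethiopia", "score": 9.2}]
```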
Notes
- PR fix(llmobs): serialize inputs and outputs to valid json #12416 is related to item 1 above; this request is a natural follow-on targeting the default= handler and @llm output path it left unaddressed.
- Addressing both together would bring the full decorator suite to a consistent level of output observability.
Is your feature request related to a problem?
Yes. Our organization is trying to leverage LLM traces in Datadog, and without readable trace inputs and outputs on large payloads, debugging and triaging become extremely difficult. We wanted to reach out for assistance.
Describe alternatives you've considered
We considered manual LLMObs.annotate calls instead of decorators, but that usage is hard to enforce and defeats the purpose of the decorators in the first place.
Additional context
Affected Versions
ddtrace >= 3.0 with LLMObs enabled (verified against 4.x)
Steps to Reproduce
```python
from pydantic import BaseModel
from ddtrace.llmobs.decorators import workflow
from enum import Enum


class RoastLevel(str, Enum):
    LIGHT = "light"
    MEDIUM = "medium"
    DARK = "dark"


class CoffeeRating(BaseModel):
    bean_origin: str
    roast: RoastLevel
    flavor_notes: list[str]
    score: float


class BrewSummary(BaseModel):
    method: str
    brew_time_seconds: int
    notes: str


@workflow
async def evaluate_coffee_batch(inputs) -> tuple[list[CoffeeRating], list[BrewSummary]]:
    ratings = [
        CoffeeRating(bean_origin="Ethiopia", roast=RoastLevel.LIGHT, flavor_notes=["blueberry", "jasmine", "citrus"], score=9.2),
        CoffeeRating(bean_origin="Colombia", roast=RoastLevel.MEDIUM, flavor_notes=["caramel", "walnut", "dark chocolate"], score=8.7),
        CoffeeRating(bean_origin="Guatemala", roast=RoastLevel.DARK, flavor_notes=["smoky", "molasses", "cedar", "dark fruit"], score=7.1),
    ]
    summaries = [
        BrewSummary(method="pour-over", brew_time_seconds=240, notes="bright and clean"),
        BrewSummary(method="french-press", brew_time_seconds=480, notes="full body, low acidity"),
        BrewSummary(method="espresso", brew_time_seconds=30, notes="rich crema, intense"),
    ]
    return ratings, summaries
```
Actual trace output — hard to read
```
([CoffeeRating(bean_origin='Ethiopia', roast=<RoastLevel.LIGHT: 'light'>, flavor_notes=['blueberry',
'jasmine', 'citrus'], score=9.2), CoffeeRating(bean_origin='Colombia', roast=<RoastLevel.MEDIUM:
'medium'>, flavor_notes=['caramel', 'walnut', 'dark chocolate'], score=8.7), CoffeeRating(bean_origin=
'Guatemala', roast=<RoastLevel.DARK: 'dark'>, flavor_notes=['smoky', 'molasses', 'cedar', 'dark fruit',
'bittersweet cocoa'], score=7.1), CoffeeRating(bean_origin='Kenya', roast=<RoastLevel.LIGHT: 'light'>,
flavor_notes=['blackcurrant', 'tomato', 'grapefruit zest', 'raw sugar'], score=8.9), ...],
[BrewSummary(method='pour-over', brew_time_seconds=240, notes='bright and clean'), ...])
```
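The angle-bracket fragments such as <RoastLevel.LIGHT: 'light'> in the trace above come straight from Python's enum repr, which the model's str()/repr() embeds. A quick stdlib-only check (no pydantic or ddtrace needed) shows the difference between the repr form and the plain value that a JSON-mode dump would emit:

```python
from enum import Enum

class RoastLevel(str, Enum):
    LIGHT = "light"

# repr() of an enum member is the unreadable angle-bracket form seen in traces.
print(repr(RoastLevel.LIGHT))   # <RoastLevel.LIGHT: 'light'>

# Pydantic v2's model_dump(mode="json") would emit the plain value instead.
print(RoastLevel.LIGHT.value)   # light
```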
Expected output:
```
[
  [
    {"bean_origin": "Ethiopia", "roast": "light", "flavor_notes": ["blueberry", "jasmine", "citrus"], "score": 9.2},
    {"bean_origin": "Colombia", "roast": "medium", "flavor_notes": ["caramel", "walnut", "dark chocolate"], "score": 8.7},
    {"bean_origin": "Guatemala", "roast": "dark", "flavor_notes": ["smoky", "molasses", "cedar", "dark fruit"], "score": 7.1}
  ],
  [
    {"method": "pour-over", "brew_time_seconds": 240, "notes": "bright and clean"},
    {"method": "french-press", "brew_time_seconds": 480, "notes": "full body, low acidity"},
    {"method": "espresso", "brew_time_seconds": 30, "notes": "rich crema, intense"}
  ]
]
```
Potential Root Causes
- For the @llm decorator: it never received the serialization capabilities added in PR fix(llmobs): serialize inputs and outputs to valid json #12416, so serialization would need to be added there.
- For all the other decorators, the issue with Pydantic models is: safe_json() checks model_dump on the top-level object only. When the return type is tuple[list[BaseModel], ...], json.dumps recurses into the list items, hits a nested Pydantic model, and delegates to the default= handler, _unserializable_default_repr (ddtrace/llmobs/_utils.py, lines 287–291):
```python
def _unserializable_default_repr(obj):
    try:
        return str(obj)  # ← Pydantic model lands here → repr() in trace
    except Exception:
        log.warning("I/O object is neither JSON serializable nor string-able. Defaulting to placeholder value instead.")
        return "[Unserializable object: {}]".format(repr(obj))
```
Potential Approach/Fix (untested)
```python
def _unserializable_default_repr(obj):
    try:
        if hasattr(obj, "model_dump") and callable(obj.model_dump):
            # Pydantic v2 — fully JSON-native, no further default= calls needed:
            # https://docs.pydantic.dev/latest/api/base_model/#pydantic.BaseModel.model_dump
            return obj.model_dump(mode="json")
        if hasattr(obj, "dict") and callable(obj.dict):
            return obj.dict()  # Pydantic v1 — no mode="json" equivalent
    except Exception:
        pass  # model_dump failed, fall through to str()
    try:
        return str(obj)
    except Exception:
        log.warning("I/O object is neither JSON serializable nor string-able. Defaulting to placeholder value instead.")
        return "[Unserializable object: {}]".format(repr(obj))
```
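As a sanity check of the approach (not run against ddtrace itself), the handler can be exercised end to end with a stdlib-only stand-in that mimics Pydantic v2's model_dump interface; the log.warning fallback is trimmed here since log is ddtrace-internal:

```python
import json

def _unserializable_default_repr(obj):
    # Proposed handler from above, with the ddtrace-internal logging removed.
    try:
        if hasattr(obj, "model_dump") and callable(obj.model_dump):
            return obj.model_dump(mode="json")  # Pydantic v2 path
        if hasattr(obj, "dict") and callable(obj.dict):
            return obj.dict()  # Pydantic v1 path
    except Exception:
        pass  # dump failed, fall through to str()
    try:
        return str(obj)
    except Exception:
        return "[Unserializable object: {}]".format(repr(obj))

# Hypothetical stand-in for a Pydantic v2 model (pydantic not required here).
class BrewSummary:
    def __init__(self, method, brew_time_seconds):
        self.method = method
        self.brew_time_seconds = brew_time_seconds

    def model_dump(self, mode="python"):
        return {"method": self.method, "brew_time_seconds": self.brew_time_seconds}

# Nested models inside a tuple/list now serialize to plain JSON objects.
payload = ([BrewSummary("pour-over", 240)],)
result = json.dumps(payload, default=_unserializable_default_repr)
print(result)  # [[{"method": "pour-over", "brew_time_seconds": 240}]]
```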