Applies to:
- Plan:
- Deployment:
Summary
Issue: When using setup_pydantic_ai(), Braintrust incorrectly marks internal wrapper spans as type “llm”, causing a single API call to appear as 4 separate LLM calls in metrics and dashboards.
Cause: PydanticAI creates nested spans for streaming, agent execution, fallback handling, and the actual API call, and all of them are marked as type “llm”, so wrapper spans are indistinguishable from actual API calls.
Resolution: Filter queries to count only actual LLM API calls by model name patterns, excluding internal wrapper spans.
Resolution Steps
Step 1: Filter queries for accurate LLM counts
When querying LLM metrics in dashboards or reports, filter to count only actual API calls by model name patterns.

Step 2: Apply filters to cost calculations

Use the same filtering pattern when calculating costs or token usage to ensure accurate metrics based on actual API calls rather than wrapper spans.

Step 3: Update dashboard queries

Modify existing dashboard queries and alerts to use the filtered approach to prevent inflated LLM call counts in monitoring and reporting.

Preset monitor charts (Spans, Latency, Total LLM cost, Token count, and Time to first token) automatically exclude internal scorer spans. The manual filtering approach above is needed for custom charts, which include all spans by default.
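The filtering approach described in the steps above can be sketched in Python. The span records, field names (type, metadata.model, metrics), and model-name patterns below are illustrative assumptions for this example, not Braintrust's actual export schema; adapt the pattern to the models and span fields your project actually uses:

```python
import re

# Patterns matching real provider model names; wrapper spans created by
# PydanticAI (streaming, agent execution, fallback handling) typically
# carry no such model name. These patterns are illustrative assumptions.
ACTUAL_MODEL_PATTERN = re.compile(r"^(gpt-|o[134]|claude-|gemini-)", re.IGNORECASE)


def is_actual_llm_call(span: dict) -> bool:
    """Return True only for spans that represent a real provider API call."""
    if span.get("type") != "llm":
        return False
    model = (span.get("metadata") or {}).get("model", "")
    return bool(ACTUAL_MODEL_PATTERN.match(model))


def llm_metrics(spans: list[dict]) -> dict:
    """Count calls and sum tokens/cost over actual API-call spans only."""
    calls = [s for s in spans if is_actual_llm_call(s)]
    return {
        "llm_calls": len(calls),
        "total_tokens": sum(s.get("metrics", {}).get("tokens", 0) for s in calls),
        "total_cost": sum(s.get("metrics", {}).get("cost", 0.0) for s in calls),
    }


# One logical PydanticAI request: three wrapper spans plus one real API call.
spans = [
    {"type": "llm", "metadata": {"model": ""}},  # streaming wrapper
    {"type": "llm", "metadata": {"model": ""}},  # agent-run wrapper
    {"type": "llm", "metadata": {"model": ""}},  # fallback wrapper
    {"type": "llm", "metadata": {"model": "gpt-4o"},
     "metrics": {"tokens": 120, "cost": 0.0018}},  # actual API call
]
print(llm_metrics(spans))  # counts 1 LLM call, not 4
```

The same predicate can back both call counts (Step 1) and cost or token aggregations (Step 2); a dashboard query (Step 3) would express the equivalent model-name filter in the query language your charts use.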