Applies to:
- Plan:
- Deployment:
Summary
Issue: When using setup_pydantic_ai(), Braintrust incorrectly marks internal wrapper spans as type “llm”, causing a single API call to appear as 4 separate LLM calls in metrics and dashboards.
Cause: PydanticAI creates nested spans for streaming, agent execution, fallback handling, and the actual API call, and all of them are marked as type “llm”, so wrapper spans are indistinguishable from actual API calls.
Resolution: Filter queries to count only actual LLM API calls by model name patterns, excluding internal wrapper spans.
Resolution Steps
Step 1: Filter queries for accurate LLM counts
When querying LLM metrics in dashboards or reports, filter to count only actual API calls by model name patterns.

Step 2: Apply filters to cost calculations

Use the same filtering pattern when calculating costs or token usage to ensure accurate metrics based on actual API calls rather than wrapper spans.

Step 3: Update dashboard queries

Modify existing dashboard queries and alerts to use the filtered approach to prevent inflated LLM call counts in monitoring and reporting.

Preset monitor charts (Spans, Latency, Total LLM cost, Token count, and Time to first token) automatically exclude internal scorer spans. The manual filtering approach above is needed for custom charts, which include all spans by default.
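The filtering approach described in the steps above can be sketched in Python. The span records, field names (type, metadata.model, metrics), and model-name patterns below are illustrative assumptions for this example, not Braintrust's actual export schema; adapt the pattern to the models and span fields your project actually uses:

```python
import re

# Patterns matching real provider model names; wrapper spans created by
# PydanticAI (streaming, agent execution, fallback handling) typically
# carry no such model name. These patterns are illustrative assumptions.
ACTUAL_MODEL_PATTERN = re.compile(r"^(gpt-|o[134]|claude-|gemini-)", re.IGNORECASE)


def is_actual_llm_call(span: dict) -> bool:
    """Return True only for spans that represent a real provider API call."""
    if span.get("type") != "llm":
        return False
    model = (span.get("metadata") or {}).get("model", "")
    return bool(ACTUAL_MODEL_PATTERN.match(model))


def llm_metrics(spans: list[dict]) -> dict:
    """Count calls and sum tokens/cost over actual API-call spans only."""
    calls = [s for s in spans if is_actual_llm_call(s)]
    return {
        "llm_calls": len(calls),
        "total_tokens": sum(s.get("metrics", {}).get("tokens", 0) for s in calls),
        "total_cost": sum(s.get("metrics", {}).get("cost", 0.0) for s in calls),
    }


# One logical PydanticAI request: three wrapper spans plus one real API call.
spans = [
    {"type": "llm", "metadata": {"model": ""}},  # streaming wrapper
    {"type": "llm", "metadata": {"model": ""}},  # agent-run wrapper
    {"type": "llm", "metadata": {"model": ""}},  # fallback wrapper
    {"type": "llm", "metadata": {"model": "gpt-4o"},
     "metrics": {"tokens": 120, "cost": 0.0018}},  # actual API call
]
print(llm_metrics(spans))  # counts 1 LLM call, not 4
```

The same predicate can back both call counts (Step 1) and cost or token aggregations (Step 2); a dashboard query (Step 3) would express the equivalent model-name filter in the query language your charts use.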