Query token distribution by user or prompt with SQL

Applies to:

Plan -
Deployment -

Summary

Goal: Query per-user and per-prompt token distribution across traces using the two-level GROUP BY pattern. Features: project_logs(), any_value(), estimated_cost(), shape => 'traces', timestamp filters.

Configuration steps

Step 1: Understand span data layout

Token metrics (metrics.prompt_tokens, metrics.completion_tokens) live on LLM-type spans. User and prompt metadata (e.g., metadata.user_id, metadata.model) may live on the root span. A single span row cannot see both. The two-level query below bridges them.

Step 2: Write the inner query (group by trace)

Group by root_span_id to collapse all spans in a trace. Use any_value() to surface root-span metadata fields within that group.

SELECT
  any_value(metadata.user_id)   AS user,
  any_value(metadata.model)     AS model,
  SUM(metrics.prompt_tokens)    AS prompt_tokens,
  SUM(metrics.completion_tokens) AS completion_tokens,
  SUM(estimated_cost())         AS total_cost
FROM project_logs('<project-id>', shape => 'traces')
WHERE created >= NOW() - INTERVAL 7 DAY
GROUP BY root_span_id

Step 3: Wrap with outer query (group by user or model)

SELECT
  user,
  model,
  SUM(prompt_tokens)    AS total_prompt_tokens,
  SUM(completion_tokens) AS total_completion_tokens,
  SUM(total_cost)       AS total_cost
FROM (
  SELECT
    any_value(metadata.user_id)    AS user,
    any_value(metadata.model)      AS model,
    SUM(metrics.prompt_tokens)     AS prompt_tokens,
    SUM(metrics.completion_tokens) AS completion_tokens,
    SUM(estimated_cost())          AS total_cost
  FROM project_logs('<project-id>', shape => 'traces')
  WHERE created >= NOW() - INTERVAL 7 DAY
  GROUP BY root_span_id
) subq
GROUP BY user, model
ORDER BY total_cost DESC;

Step 4: Use the correct `shape` parameter

Shape	Use case
(default)	One row per span
`shape => 'traces'`	All spans returned when any span matches a filter; enables inner GROUP BY on `root_span_id`
`shape => 'summary'`	Pre-aggregated trace-level metrics; useful for rollups without a subquery

Step 5: Use `estimated_cost()` not `metrics.estimated_cost`

estimated_cost() is a function that computes cost from token counts and the model registry. Use it in span-level queries. Aggregating the pre-computed metrics.estimated_cost field at a second level of grouping is a known bug.

SUM(estimated_cost()) AS total_cost  -- correct
SUM(metrics.estimated_cost)          -- may produce incorrect results in outer GROUP BY

Step 6: Add a timestamp filter to avoid timeouts

Queries without a created filter scan the full table and will time out on large projects.

WHERE created >= NOW() - INTERVAL 1 DAY  -- start narrow, expand as needed

Start with 1 day and widen the window once the query shape is confirmed.

Step 7: Work around cross-span JOIN limitations

JOIN, UNION, and subqueries across different project_logs() calls are unsupported. Use these alternatives instead:

shape => 'traces' — returns all spans in a trace when any span matches; use any_value() to pull fields from other span types
shape => 'summary' — returns one pre-aggregated row per trace with total token counts and cost

Step 8: Handle Loop query generation limitations

Loop does not know about any_value() or the two-level GROUP BY pattern. For nuanced schemas:

Run Loop with a 1-hour window to generate a starting query.
Paste the output into the SQL sandbox.
Edit by hand — add any_value(), the inner/outer GROUP BY, and a tight created filter.

Loop output is a starting point and should be verified against known data.

All troubleshooting guides

Documentation Index

​Summary

​Configuration steps

​Step 1: Understand span data layout

​Step 2: Write the inner query (group by trace)

​Step 3: Wrap with outer query (group by user or model)

​Step 4: Use the correct shape parameter

​Step 5: Use estimated_cost() not metrics.estimated_cost

​Step 6: Add a timestamp filter to avoid timeouts

​Step 7: Work around cross-span JOIN limitations

​Step 8: Handle Loop query generation limitations