Topics - Braintrust

Topics automatically analyze and classify your logs to surface the failures, edge cases, and recurring problems hiding across your traces. Use Topics for:

Blind-spot detection: Surface user request patterns and system gaps you didn’t know to look for.
Silent failure detection: Catch quality issues that don’t trip explicit checks.
Product roadmap signals: Cluster real user requests into themes to inform what to build.
Targeted evaluation datasets: Filter classified logs to build datasets for focused evals.

For self-hosted deployments, Topics requires data plane v2.x or later and access granted by your account team. See the v2.x upgrade guide for setup and eligibility requirements.

How it works

Topics runs a daily pipeline on your logs:

Preprocessing — Each trace is formatted into readable text. Messages, tool calls, and nested spans become a narrative.
Facets — For each facet (Task, Sentiment, Issues), an LLM analyzes the preprocessed trace and extracts a short summary describing the trace through that lens.
Topics — Once at least 100 facet summaries are collected, a clustering algorithm groups similar summaries into topics. For example, “User wants a refund,” “Requesting a chargeback,” and “Asking for money back” might all become the topic “Refund requests.”
Classification — For each facet, the trace is matched to its closest topic. These classifications appear in your logs table, where you can filter, query with SQL, and build evaluation datasets.
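The classification step can be pictured with a small sketch. This is not Braintrust's implementation: the word-count `embed` function stands in for a real embedding model, and the topic names, the `task_summary` and `task_topic` fields, and the helper functions are all hypothetical names chosen for illustration. It shows only the general idea of matching a facet summary to its closest topic and then filtering logs by the result.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(summary: str, topics: dict[str, list[str]]) -> str:
    """Return the topic whose member summaries are, on average, most similar."""
    vec = embed(summary)

    def score(name: str) -> float:
        examples = topics[name]
        return sum(cosine(vec, embed(e)) for e in examples) / len(examples)

    return max(topics, key=score)


# Hypothetical topics for the Task facet, each with a few member summaries.
task_topics = {
    "Refund requests": ["User wants a refund", "Asking for money back", "Requesting a chargeback"],
    "API debugging": ["Debugging an API error", "API call returns an error"],
}

# Hypothetical logs that already carry a Task facet summary.
logs = [
    {"id": "t1", "task_summary": "Customer asks for a refund on an order"},
    {"id": "t2", "task_summary": "User is debugging an API timeout error"},
]

for log in logs:
    log["task_topic"] = classify(log["task_summary"], task_topics)

# Filtering on the classification is the starting point for a focused dataset.
refund_ids = [log["id"] for log in logs if log["task_topic"] == "Refund requests"]
```

In the product, the equivalent filter is applied in the logs table or with SQL rather than in application code.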

The pipeline runs on a set cadence:

Initially: Existing logs are optionally backfilled with facet summaries.
Continuously: New logs are processed as they arrive.
Daily: Topics are regenerated from collected facet summaries. Generation requires at least 100 summaries.

Group traces into conversations

Topics classifies one trace at a time by default. When a session or conversation spans multiple traces (for example, one trace per turn with auto-instrumentation), classifications can fragment across the conversation. Configure a grouping key so Topics classifies the group as a unit and sentiment, task, and issues reflect the full interaction.

Built-in facets

Topics ships with three built-in facets that classify your logs automatically:

Task: Extracts the user’s intent or goal (e.g., “Creating a dataset,” “Debugging an API error”).
Sentiment: Extracts the user’s emotional tone (e.g., “POSITIVE,” “FRUSTRATED,” “NEUTRAL”).
Issues: Identifies problems with agent behavior or responses (e.g., “Tool call failed,” “Incomplete answer”).

Custom facets extend this set with domain-specific patterns.
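To make the grouping key from "Group traces into conversations" concrete, here is a minimal sketch of what grouping does conceptually, so that each facet sees the whole conversation instead of a single turn. The record shape, the `session_id` key, and both helper functions are assumptions made for this example; they are not Braintrust field names or APIs.

```python
from collections import defaultdict

# Hypothetical trace records: one trace per conversation turn.
# "session_id" is an assumed grouping key, not a documented field name.
traces = [
    {"trace_id": "a1", "session_id": "s1", "text": "User: My export failed."},
    {"trace_id": "b1", "session_id": "s2", "text": "User: How do I create a dataset?"},
    {"trace_id": "a2", "session_id": "s1", "text": "User: It failed again, this is frustrating."},
]


def group_traces(traces, key):
    """Collect traces that share a grouping key, preserving arrival order."""
    groups = defaultdict(list)
    for trace in traces:
        # Traces without the key fall back to being their own group.
        groups[trace.get(key, trace["trace_id"])].append(trace)
    return dict(groups)


def conversation_text(group):
    """Join a group's turns into one document to summarize as a unit."""
    return "\n".join(trace["text"] for trace in group)


conversations = group_traces(traces, key="session_id")
s1_text = conversation_text(conversations["s1"])
```

Classified one trace at a time, the first turn of session `s1` might read as neutral. Classified as a group, the Sentiment facet can pick up the frustration expressed in the second turn.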

Models and data handling

Topics runs three model calls during the daily pipeline. All three are served by Braintrust, not by third-party LLM providers:

Facet summarization: An LLM reads each preprocessed trace and writes a short summary through the lens of each facet (Task, Sentiment, Issues, or your custom facet). Choose the model in the automation’s Advanced settings.
Embedding: An embedding model turns each summary into a vector so similar summaries can be grouped together.
Cluster naming: An LLM reads a sample of summaries from each cluster and picks a short, human-readable name like “Refund requests.”
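The embedding and cluster naming calls can be approximated with a toy sketch. Every part of it is a stand-in: word counts replace the embedding model, a greedy similarity threshold replaces the clustering algorithm (which this page does not specify), and picking the most frequent word replaces the naming LLM. The function names, stopword list, and `threshold` value are hypothetical. The only detail taken from the pipeline description is the default minimum of 100 summaries.

```python
import math
import re
from collections import Counter

# Words ignored by the toy embedding; an arbitrary list chosen for this example.
STOPWORDS = {"a", "an", "the", "for", "on", "user", "wants", "asking", "when"}


def embed(summary: str) -> Counter:
    """Toy embedding: counts of the non-stopword words in a summary."""
    words = re.findall(r"[a-z]+", summary.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(summaries, threshold=0.3):
    """Greedy clustering: join the most similar cluster seed, or start a new cluster."""
    clusters = []  # each cluster is a list of summaries; its first member is the seed
    for summary in summaries:
        vec = embed(summary)
        best = max(clusters, key=lambda c: cosine(vec, embed(c[0])), default=None)
        if best is not None and cosine(vec, embed(best[0])) >= threshold:
            best.append(summary)
        else:
            clusters.append([summary])
    return clusters


def name_cluster(members):
    """Toy naming: the most frequent content word (the real step uses an LLM)."""
    counts = Counter()
    for member in members:
        counts.update(embed(member))
    return counts.most_common(1)[0][0] if counts else "unnamed"


def generate_topics(summaries, min_summaries=100):
    """Return {topic name: member summaries}, or {} below the minimum count."""
    if len(summaries) < min_summaries:
        return {}
    return {name_cluster(members): members for members in cluster(summaries)}


summaries = [
    "User wants a refund",
    "Asking for a refund on an order",
    "Refund for a duplicate charge",
    "Debugging an API error",
    "API error when creating a dataset",
]

# The minimum is lowered here only so the five-item example produces topics.
topics = generate_topics(summaries, min_summaries=5)
```

With the default minimum, `generate_topics(summaries)` returns an empty result, mirroring the rule that topic generation waits until enough facet summaries exist.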

The models used are brain-facet-* for facet summarization, brain-embedding-* for embedding, and brain-agent-* for cluster naming. Braintrust serves them on Baseten, which is included in the Braintrust DPA as a subprocessor. Self-hosted deployments call the same Braintrust-hosted endpoints with Zero Data Retention. See Enable Topics for self-hosting requirements.

Topics requires built-in models. If your organization has disabled built-in models, topic generation will fail. Self-hosted organizations have built-in models disabled by default and must enable them before using Topics.

Next steps

Enable Topics by checking that your traces work with the Thread preprocessor and turning on the daily pipeline.
Review insights by viewing topic distributions, examining traces, clustering filtered subsets, and tracking trends.
Act on findings by building evaluation datasets, scoring logs by classification, and assigning topics for human review.
Manage Topics by checking pipeline status, re-generating topics, adjusting sampling, and rewinding history.
Create custom facets for domain-specific patterns beyond the built-in Task, Sentiment, and Issues.
