Patterns surfaced by Topics are most useful when you act on them. This guide covers three workflows: building evaluation datasets from classified logs, scoring logs based on topic classifications, and assigning logs for human review.

Build datasets from topics

Filter logs by topic to build targeted evaluation datasets.
  1. Go to Logs and click Filter.
  2. Select Classifications and choose the classification you want to filter by. Alternatively, click SQL and enter a filter clause such as the one below. See the SQL reference for more query patterns.
    classifications.Task.label = "Dataset creation"
    
  3. Select the logs you want to include.
  4. Click + Dataset and choose an existing dataset or create a new one.
Common use cases:
  • “Error investigation” tasks → test your error handling.
  • Negative sentiment interactions → improve responses.
  • “Pricing questions” → evaluate your pricing explanations.
See Build datasets for more on working with datasets.
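If you export logs and prepare dataset records in code rather than in the UI, the same filter can be sketched as a plain function. This is a minimal illustration, not part of the Braintrust SDK: the `LogRow` shape, the `toDatasetRecords` helper, and the choice to use the logged output as `expected` are all assumptions made for the example.

```typescript
// Hypothetical shape of an exported log row; only the fields used here are modeled.
interface Classification {
  label: string;
}

interface LogRow {
  id: string;
  input: unknown;
  output: unknown;
  classifications?: Record<string, Classification[]>;
}

// True when any result under the named classification carries the given label.
function hasLabel(row: LogRow, name: string, label: string): boolean {
  return (row.classifications?.[name] ?? []).some((c) => c.label === label);
}

// Keep only matching rows and reduce them to dataset-style records.
// Assumption: the logged output is a reasonable starting point for `expected`.
function toDatasetRecords(rows: LogRow[], name: string, label: string) {
  return rows
    .filter((row) => hasLabel(row, name, label))
    .map((row) => ({
      input: row.input,
      expected: row.output,
      metadata: { sourceLogId: row.id, [name]: label },
    }));
}

// Made-up sample rows for illustration only.
const sampleRows: LogRow[] = [
  {
    id: "log-1",
    input: "Turn these logs into a dataset",
    output: "Done",
    classifications: { Task: [{ label: "Dataset creation" }] },
  },
  {
    id: "log-2",
    input: "Why did my request fail?",
    output: "Timeout",
    classifications: { Task: [{ label: "Error investigation" }] },
  },
  { id: "log-3", input: "Hello", output: "Hi" },
];

const datasetRecords = toDatasetRecords(sampleRows, "Task", "Dataset creation");
```

Rows without classifications are skipped rather than treated as errors, which mirrors how a filter in the Logs view simply excludes them.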

Score logs based on topics

Create scorers that flag logs with negative sentiment, penalize specific issue types, or alert when certain topics appear together. The following example scorer returns 0 for traces classified as a negative checkout experience and 1 otherwise:
topic_scorer.ts
import braintrust from "braintrust";
import { z } from "zod";

const project = braintrust.projects.create({ name: "my-project" });

project.scorers.create({
  name: "Checkout experience",
  slug: "checkout-experience",
  description: "Flag traces with negative checkout experiences",
  parameters: z.object({
    trace: z.any(),
  }),
  handler: async ({ trace }) => {
    if (!trace) return { score: null };

    const spans = await trace.getSpans();
    const rootSpan = spans.find((s) => s.span_id === s.root_span_id);
    if (!rootSpan) return { score: null };

    // Classifications are keyed by name (for example, Task or Sentiment).
    // Optional chaining also covers a missing key or an empty array.
    const classifications = rootSpan.classifications || {};
    const taskLabel = classifications.Task?.[0]?.label;
    const sentimentLabel = classifications.Sentiment?.[0]?.label;

    if (taskLabel === "Checkout Flow" && sentimentLabel === "NEGATIVE") {
      return {
        score: 0,
        metadata: { reason: "Negative sentiment during checkout" },
      };
    }

    return { score: 1 };
  },
});
Save the code to a file and push it:
bt functions push topic_scorer.ts
Then configure the automation:
  1. Go to Settings > Automations and click + Rule.
  2. Select your scorer, set Scope to Trace, configure the sampling rate, and click Create rule.
See Score online and Trace-level scorers for more details.
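Before pushing a scorer, it can help to check its decision logic locally. The sketch below pulls the label comparison from the example scorer into a standalone function with no SDK dependency. The names `scoreCheckout`, `TopicClassification`, and `CheckoutScore` are hypothetical, and the input shape is assumed from the `rootSpan.classifications` access in the example.

```typescript
// Assumed shape of one classification result on a span.
interface TopicClassification {
  label?: string;
}

type TopicClassifications = Record<string, TopicClassification[] | undefined>;

interface CheckoutScore {
  score: number | null;
  metadata?: { reason: string };
}

// Same rule as the example scorer: 0 for a negative checkout experience, 1 otherwise.
// Missing classifications or empty arrays fall through to the default score.
function scoreCheckout(
  classifications: TopicClassifications | undefined
): CheckoutScore {
  const taskLabel = classifications?.Task?.[0]?.label;
  const sentimentLabel = classifications?.Sentiment?.[0]?.label;

  if (taskLabel === "Checkout Flow" && sentimentLabel === "NEGATIVE") {
    return {
      score: 0,
      metadata: { reason: "Negative sentiment during checkout" },
    };
  }
  return { score: 1 };
}

// Made-up inputs covering the flagged case and two pass-through cases.
const flagged = scoreCheckout({
  Task: [{ label: "Checkout Flow" }],
  Sentiment: [{ label: "NEGATIVE" }],
});
const positive = scoreCheckout({
  Task: [{ label: "Checkout Flow" }],
  Sentiment: [{ label: "POSITIVE" }],
});
const unclassified = scoreCheckout({ Task: [] });
```

Keeping the rule in a pure function like this means the handler only has to locate the root span and pass its classifications through.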

Assign topics for review

Assign logs matching specific topics for human review.
  1. Go to Logs and click Filter.
  2. Select Classifications and choose the classification you want to filter by. Alternatively, click SQL and enter a filter clause such as the one below. See the SQL reference for more query patterns.
    classifications.Task.label = "Dataset creation"
    
  3. Select the logs you want to assign.
  4. Select Assign and choose a team member.
Team members receive email notifications when rows are assigned to them.
See Add human feedback for more on human review.

Next steps