This glossary defines key terms and concepts used in our product and documentation.
Documentation Index
Fetch the complete documentation index at: https://braintrust.dev/docs/llms.txt
Use this file to discover all available pages before exploring further.
Alert
An automation that notifies you when a specific condition occurs on your logs in Braintrust. Automations guide
Automation
A configured workflow that lets you trigger actions based on specific events in Braintrust, for example sending an alert or batch-exporting data. Automations guide
Benchmark
An evaluation designed to assess model performance across specific capabilities or against industry standards.
Brainstore
The high-performance data engine backing logs, search, and tables. Brainstore blog post
Classification
The topic label assigned to a trace by a facet, indicating which cluster the trace belongs to. Classifications appear as columns in the logs table and can be filtered or queried with SQL. How Topics work
Configuration
Project-level settings that define behavior for evals, experiments, and integrations. Project configuration guide
Dataset
A versioned collection of pairs of inputs and (optional) expected outputs. Datasets guide
Evaluation / Eval
An eval consists of a task, dataset, and scorer(s). Evaluations can be:
- Offline: run a task on a static dataset with scoring functions.
- Online: real-time scoring on production or test requests.
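As a rough illustration, the offline case can be sketched in plain Python (hypothetical names, not the Braintrust SDK): a task runs over a static dataset, and a scoring function grades each output against its expected value.

```python
# Minimal offline-eval sketch (hypothetical names, no SDK):
# the three parts of an eval are a task, a dataset, and a scorer.

def task(input: str) -> str:
    # The system under test; here, a trivial greeter.
    return "Hi " + input

def exact_match(output: str, expected: str) -> float:
    # Rule-based scorer: 1.0 on an exact match, else 0.0.
    return 1.0 if output == expected else 0.0

dataset = [
    {"input": "Alice", "expected": "Hi Alice"},
    {"input": "Bob", "expected": "Hello Bob"},  # will score 0.0
]

scores = [exact_match(task(row["input"]), row["expected"]) for row in dataset]
print(sum(scores) / len(scores))  # average score across the dataset -> 0.5
```

An experiment (below) is one recorded run of exactly this kind of loop, with per-row scores tracked so runs can be compared over time.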
Experiment
An instance of an offline eval run. Scores a specific task run on a given dataset. Experiment guide
Facet
A dimension of analysis applied to traces by the Topics pipeline (for example, Task, Sentiment, or Issues). Each facet uses a preprocessor and an LLM prompt to extract a short summary per trace. How Topics work
Human review
An option to route evaluations or tasks to human reviewers instead of, or in addition to, automated scorers. Human review guide
Inline parameters
Configurable options defined directly in your Eval() call using Zod, Pydantic, or plain dicts. Inline parameters become UI controls in the playground when running remote evals or sandboxes. To change the values, update them in your eval code.
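As a rough illustration (pure Python, no SDK calls; all names here are hypothetical), inline parameters are just values your task reads at run time — the plain-dict shape mentioned above might look like this:

```python
# Hypothetical sketch: inline parameters as a plain dict.
# In a real eval these would be declared in the Eval() call so the
# playground can render them as UI controls.
parameters = {
    "model": "gpt-4o-mini",  # which model the task should call (example value)
    "temperature": 0.2,      # sampling temperature
    "max_turns": 3,          # hypothetical task-specific knob
}

def task(input: str, params: dict = parameters) -> str:
    # The task reads parameter values at run time, so changing them
    # in code (or via playground controls) changes its behavior.
    preamble = f"[model={params['model']} temp={params['temperature']}]"
    return f"{preamble} Answering: {input}"

print(task("What is an eval?"))
```

The same idea applies to Zod or Pydantic definitions, which add schema validation on top of the raw values.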
Log
An instance of a live production or test interaction. Logs can include inputs, outputs, expected values, metadata, errors, scores, and tags. Scorers can also be applied to live logs to conduct online evaluations. Logs guide
Loop agent
Braintrust’s AI agent that can help you with evaluation-related tasks, like optimizing prompts and generating dataset rows. Loop guide
Metric
A quantitative measure of model performance (for example, accuracy, latency, or cost) tracked over time and across experiments.
Model
An AI system (typically an LLM) that can be evaluated or monitored with Braintrust. Models can be first-party, third-party, or open-source.
Organization
Your company or team “home” in Braintrust. It holds all your projects, members, and settings. Organizations reference
OTEL
OpenTelemetry: the instrumentation standard Braintrust uses to collect and export trace and span data from integrations. OpenTelemetry guide
Playground
An interactive space where you can prototype, iterate on, and compare multiple prompts and models against a dataset in real time. A playground can be saved as an experiment. Playgrounds guide
Preprocessor
A function that formats a trace into readable text before a facet’s LLM extracts a summary. The default Thread preprocessor renders traces as conversation narratives. Custom preprocessors can adapt non-conversational trace shapes. How Topics work
Project
A container for related experiments, datasets, and logs. Use projects to segment work by feature, environment (dev/prod), or team. Projects guide
Prompt
The instruction given to an AI model. Prompts are editable objects you can version and reuse across experiments and playgrounds. Prompts guide
Prompt engineering
The practice of designing, optimizing, and refining prompts to improve AI model outputs and performance.
Regression testing
Evaluations that ensure new model or prompt configurations maintain or improve upon previous performance benchmarks.
Remote eval
An evaluation executed on an external or third-party system or service, allowing you to evaluate tasks in environments outside Braintrust. Remote evals guide
Saved parameters
Centrally managed configuration for evaluations. Create versioned parameter definitions in Braintrust and load them across evaluations for reusability, version control, and environment management. Parameters guide
Scorer
The component responsible for judging the quality of AI outputs. Scorers may be:
- Rule-based code
- LLM-based prompts as judges
- Human reviewers
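For illustration, the first two scorer types can be sketched in plain Python (hypothetical helper names; the LLM judge is mocked here rather than calling a real model):

```python
# Rule-based scorer: deterministic code, no model involved.
def contains_keyword(output: str, keyword: str = "refund") -> float:
    return 1.0 if keyword in output.lower() else 0.0

# LLM-as-judge scorer, with the model call mocked for this sketch.
def llm_judge(output: str, judge=None) -> float:
    # `judge` stands in for a real LLM call that returns "good" or "bad".
    judge = judge or (lambda prompt: "good")
    verdict = judge(f"Rate this answer as good or bad: {output}")
    return 1.0 if verdict.strip().lower() == "good" else 0.0

print(contains_keyword("We issued a refund yesterday."))  # -> 1.0
print(llm_judge("The capital of France is Paris."))       # -> 1.0
```

Both return a score in [0, 1], the same shape a human reviewer's rating would ultimately be recorded in.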