This glossary defines key terms and concepts used in our product and documentation.
Documentation Index
Fetch the complete documentation index at: https://braintrust.dev/docs/llms.txt
Use this file to discover all available pages before exploring further.
Alert
An automation that notifies you when a specific condition occurs on your logs in Braintrust. Automations guide
Automation
A configured workflow that lets you trigger actions based on specific events in Braintrust, for example sending an alert or batch-exporting data. Automations guide
Benchmark
An evaluation designed to assess model performance across specific capabilities or against industry standards.
Brainstore
The high-performance data engine backing logs, search, and tables. Brainstore blog post
Classification
The topic label assigned to a trace by a facet, indicating which cluster the trace belongs to. Classifications appear as columns in the logs table and can be filtered or queried with SQL. How Topics work
Configuration
Project-level settings that define behavior for evals, experiments, and integrations. Project configuration guide
Dataset
A versioned collection of pairs of inputs and (optional) expected outputs. Datasets guide
Evaluation / Eval
An eval consists of a task, dataset, and scorer(s). Evaluations can be:
- Offline: run a task on a static dataset with scoring functions.
- Online: real-time scoring on production or test requests.
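As a rough illustration, the offline case can be sketched in plain Python (hypothetical names, not the Braintrust SDK): a task runs over a static dataset, and a scoring function grades each output against its expected value.

```python
# Minimal offline-eval sketch (hypothetical names, no SDK):
# the three parts of an eval are a task, a dataset, and a scorer.

def task(input: str) -> str:
    # The system under test; here, a trivial greeter.
    return "Hi " + input

def exact_match(output: str, expected: str) -> float:
    # Rule-based scorer: 1.0 on an exact match, else 0.0.
    return 1.0 if output == expected else 0.0

dataset = [
    {"input": "Alice", "expected": "Hi Alice"},
    {"input": "Bob", "expected": "Hello Bob"},  # will score 0.0
]

scores = [exact_match(task(row["input"]), row["expected"]) for row in dataset]
print(sum(scores) / len(scores))  # average score across the dataset -> 0.5
```

An experiment (below) is one recorded run of exactly this kind of loop, with per-row scores tracked so runs can be compared over time.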
Experiment
An instance of an offline eval run. Scores a specific task run on a given dataset. Experiment guide
Facet
A dimension of analysis applied to traces by the Topics pipeline (for example, Task, Sentiment, or Issues). Each facet uses a preprocessor and an LLM prompt to extract a short summary per trace. How Topics work
Human review
An option to route evaluations or tasks to human reviewers instead of, or in addition to, automated scorers. Human review guide
Inline parameters
Configurable options defined directly in your Eval() call using Zod, Pydantic, or plain dicts. Inline parameters become UI controls in the playground when running remote evals or sandboxes. To change the values, update them in your eval code.
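As a rough illustration (pure Python, no SDK calls; all names here are hypothetical), inline parameters are just values your task reads at run time — the plain-dict shape mentioned above might look like this:

```python
# Hypothetical sketch: inline parameters as a plain dict.
# In a real eval these would be declared in the Eval() call so the
# playground can render them as UI controls.
parameters = {
    "model": "gpt-4o-mini",  # which model the task should call (example value)
    "temperature": 0.2,      # sampling temperature
    "max_turns": 3,          # hypothetical task-specific knob
}

def task(input: str, params: dict = parameters) -> str:
    # The task reads parameter values at run time, so changing them
    # in code (or via playground controls) changes its behavior.
    preamble = f"[model={params['model']} temp={params['temperature']}]"
    return f"{preamble} Answering: {input}"

print(task("What is an eval?"))
```

The same idea applies to Zod or Pydantic definitions, which add schema validation on top of the raw values.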
Log
An instance of a live production or test interaction. Logs can include inputs, outputs, expected values, metadata, errors, scores, and tags. Scorers can also be applied to live logs to conduct online evaluations. Logs guide
Loop agent
Braintrust’s AI agent that can help you with evaluation-related tasks, like optimizing prompts and generating dataset rows. Loop guide
Metric
A quantitative measure of model performance (for example, accuracy, latency, or cost) tracked over time and across experiments.
Model
An AI system (typically an LLM) that can be evaluated or monitored with Braintrust. Models can be first-party, third-party, or open-source.
Organization
Your company or team “home” in Braintrust. It holds all your projects, members, and settings. Organizations reference
OTEL
OpenTelemetry: the instrumentation standard Braintrust uses to collect and export trace and span data from integrations. OpenTelemetry guide
Playground
An interactive space where you can prototype, iterate on, and compare multiple prompts and models against a dataset in real time. A playground can be saved as an experiment. Playgrounds guide
Preprocessor
A function that formats a trace into readable text before a facet’s LLM extracts a summary. The default Thread preprocessor renders traces as conversation narratives. Custom preprocessors can adapt non-conversational trace shapes. How Topics work
Project
A container for related experiments, datasets, and logs. Use projects to segment work by feature, environment (dev/prod), or team. Projects guide
Prompt
The instruction given to an AI model. Prompts are editable objects you can version and reuse across experiments and playgrounds. Prompts guide
Prompt engineering
The practice of designing, optimizing, and refining prompts to improve AI model outputs and performance.
Regression testing
Evaluations that ensure new model or prompt configurations maintain or improve upon previous performance benchmarks.
Remote eval
An evaluation executed on an external or third-party system or service, allowing you to evaluate tasks in environments outside Braintrust. Remote evals guide
Saved parameters
Centrally managed configuration for evaluations. Create versioned parameter definitions in Braintrust and load them across evaluations for reusability, version control, and environment management. Parameters guide
Scorer
The component responsible for judging the quality of AI outputs. Scorers may be:
- Rule-based code
- LLM-based prompts as judges
- Human reviewers
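For illustration, the first two scorer types can be sketched in plain Python (hypothetical helper names; the LLM judge is mocked here rather than calling a real model):

```python
# Rule-based scorer: deterministic code, no model involved.
def contains_keyword(output: str, keyword: str = "refund") -> float:
    return 1.0 if keyword in output.lower() else 0.0

# LLM-as-judge scorer, with the model call mocked for this sketch.
def llm_judge(output: str, judge=None) -> float:
    # `judge` stands in for a real LLM call that returns "good" or "bad".
    judge = judge or (lambda prompt: "good")
    verdict = judge(f"Rate this answer as good or bad: {output}")
    return 1.0 if verdict.strip().lower() == "good" else 0.0

print(contains_keyword("We issued a refund yesterday."))  # -> 1.0
print(llm_judge("The capital of France is Paris."))       # -> 1.0
```

Both return a score in [0, 1], the same shape a human reviewer's rating would ultimately be recorded in.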