API reference - Braintrust

This page covers the key APIs in the Braintrust TypeScript SDK. For setup, see the Quickstart. For the complete reference, see the full TypeScript SDK reference.

Tracing

Tracing records what your application does as spans you can inspect in Braintrust. The recommended way to capture AI calls is auto-instrumentation: call initLogger() once at startup and run your app with the Braintrust import or build hook, and supported AI clients are traced with no per-call changes (see Install and instrument). To instrument a specific provider client by hand instead, see Wrappers. The APIs below configure tracing and let you trace your own application code.

`initLogger()`

Creates the logger that sends your traces to Braintrust, your starting point for tracing. Call it once on startup: by default it becomes the current logger that traced(), startSpan(), and auto-instrumentation log through, and auto-instrumentation requires it.

import { initLogger } from "braintrust";

const logger = initLogger({
  projectName: "Support bot",
});

logger.log({
  input: { question: "How do I reset my password?" },
  output: { answer: "Use the account recovery flow." },
  metadata: { route: "/support" },
});

await logger.flush();

Returns: Logger

Options (all optional):

projectName (string): project name for logs. If omitted, logs go to the global project.
projectId (string): project ID. Takes precedence over projectName.
apiKey (string): API key. Defaults to BRAINTRUST_API_KEY.
appUrl (string): Braintrust app URL. Defaults to https://www.braintrust.dev.
orgName (string): organization name, useful when credentials can access multiple orgs.
asyncFlush (boolean): defaults to true. Set false to make logger calls like logger.log() awaitable.
setCurrent (boolean): defaults to true. Controls whether currentLogger() returns this logger and what logger auto-instrumentation logs through.
forceLogin (boolean): log in again even if the SDK already has credentials.
fetch (typeof fetch): custom fetch implementation for SDK requests. Takes a function that follows the same API as native fetch.
debugLogLevel ("error" | "warn" | "info" | "debug" | false): enables SDK troubleshooting output.

Useful logger methods:

log(event) → string: logs a span and returns its row ID. Returns Promise<string> when the logger uses asyncFlush: false.
traced(callback, args?) → the callback’s result: creates a root span under the logger and ends it when the callback finishes. Spans started within the callback are nested under the traced span.
startSpan(args?) → Span: starts a root span that you end manually.
logFeedback(event) → void: adds feedback scores, expected values, tags, or comments to an existing row.
updateSpan(event) → void: updates an already-written span by ID. Flush the original span before updating it.
export() → Promise<string>: a serialized parent string for distributed tracing.
flush() → Promise<void>: sends pending rows to Braintrust, resolving when they’re all flushed.

`currentLogger()`

The logger most recently installed by initLogger({ setCurrent: true }), or undefined if no current logger exists.

import { currentLogger } from "braintrust";

currentLogger()?.log({ input: "ping", output: "pong" });

Returns: Logger | undefined

`traced()`

Traces a piece of your own code. Pass a callback and it runs inside a span: the span is current for the duration of the callback, so nested spans and traced LLM calls attach to it, errors thrown inside are logged to the span, and the span ends automatically when the callback returns.

import { traced } from "braintrust";

const answer = await traced(
  async (span) => {
    span.log({ input: { question: "What changed?" } });
    const output = await answerQuestion("What changed?");
    span.log({ output });
    return output;
  },
  {
    name: "answerQuestion",
    type: "function",
  },
);

Returns: the callback’s result

Parameters:

callback ((span) => R, required): the function to run inside the span. Its return value is returned.
args (optional): span options.
- name (string): span name shown in Braintrust.
- type (SpanType): span type, such as function, task, tool, llm, score, or eval.
- spanAttributes (Record<PropertyKey, unknown>): extra span attributes.
- startTime (number): start time as a Unix timestamp in seconds.
- parent (string): exported parent span string.
- spanId (string): explicit span ID.
- setCurrent (boolean): defaults to true. Controls whether currentSpan() returns this span inside the callback.
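Because startTime is expressed in seconds while JavaScript dates count milliseconds, a small conversion step helps avoid a value that is off by a factor of 1,000. The sketch below uses a hypothetical helper named toUnixSeconds; it is not part of the SDK.

```typescript
// Hypothetical helper (not an SDK export): convert a JavaScript Date, which
// counts milliseconds, into the Unix-seconds value that startTime expects.
function toUnixSeconds(date: Date): number {
  return date.getTime() / 1000;
}

// Example: span options for work that began before the span was created.
const requestReceivedAt = new Date("2024-01-01T00:00:00.000Z");
const spanOptions = {
  name: "answerQuestion",
  startTime: toUnixSeconds(requestReceivedAt),
};
```

An object like spanOptions could then be passed as the second argument to traced().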

`startSpan()`

Creates a span you manage by hand: you start it, log to it, and call end() yourself when the work finishes. Reach for it when the work you’re tracing doesn’t fit inside a single callback, such as when a span starts and ends in different places or you’re integrating at a lower level. Unlike traced(), the span is not made current, so other spans and traced LLM calls won’t automatically nest under it.

import { startSpan } from "braintrust";

const span = startSpan({ name: "retrieve", type: "function" });

try {
  span.log({ input: { query: "password reset" } });
  const documents = await retrieveDocuments("password reset");
  span.log({ output: documents });
} finally {
  span.end();
}

Returns: Span

`currentSpan()`

Gives you the span you’re currently inside, so you can add data to it without holding a direct reference, for example from a helper function called within a traced() callback. If no span is active, returns a no-op span: an object with the same interface that silently discards anything you log, so calls like currentSpan().log(...) are always safe.

import { currentSpan } from "braintrust";

currentSpan().log({
  metadata: { cacheHit: true },
});

Returns: Span

`withCurrent()`

Makes an existing span the current span for the duration of a callback. A span from startSpan() isn’t current on its own, so spans and traced LLM calls created afterward won’t nest under it. Wrap that work in withCurrent(span, callback) and it nests under span, with currentSpan() returning it inside the callback.

import { startSpan, withCurrent, currentSpan } from "braintrust";

const span = startSpan({ name: "manual parent" });

await withCurrent(span, async () => {
  currentSpan().log({ metadata: { inside: true } });
});

span.end();

Returns: the callback’s result

Parameters:

span (Span, required): the span to make current inside the callback.
callback ((span) => R, required): the work to run with span current. Its return value is returned.

`withParent()`

Attaches the spans created inside its callback to a parent that lives somewhere else, such as another process, service, or job. Get the parent’s location by calling export() on a span or logger, pass that string to withParent(), and the work inside nests under the original trace. This is how you trace across process boundaries (distributed tracing).

import { initLogger, startSpan, traced, withParent } from "braintrust";

initLogger({ projectName: "Support bot" });

// Keep a reference to the parent span so it can be ended once the child work is done.
const parentSpan = startSpan({ name: "foo" });

// parent is a string, so it is serializable and can be passed between processes or services.
const parent = await parentSpan.export();

await withParent(parent, async () => {
  await traced(async (span) => {
    span.log({ input: "child work" });
  });
});

parentSpan.end();

Returns: the callback’s result

Parameters:

parent (string, required): an exported parent string, from a span’s or logger’s export().
callback (() => R, required): the work to run under that parent. Its return value is returned.

`logError()`

Logs an error and stack trace to a span’s error field.

import { currentSpan, logError } from "braintrust";

try {
  await runStep();
} catch (error) {
  logError(currentSpan(), error);
  throw error;
}

Returns: void

Parameters:

span (Span, required): the span to attach the error to.
error (unknown, required): the caught error, whose message and stack are logged.

Errors thrown inside traced() are logged automatically, so use logError() for errors you catch yourself.

`permalink()`

Turns an exported span string into a shareable Braintrust UI URL, so you can link straight to a specific trace from your own logs, dashboards, or alerts. Pass the string returned by a span’s export().

import { traced, permalink } from "braintrust";

const slug = await traced((span) => span.export(), { name: "request" });
const url = await permalink(slug);

Returns: Promise<string>

Links can be generated before flushing, but they become viewable only after the span and its root have been flushed and ingested. If you have a span object, span.permalink() is usually simpler.

`flush()`

Sends any buffered log rows to Braintrust and returns a promise that resolves once they’re all sent. Call it before a script or short-lived process exits so pending events aren’t lost.

import { flush } from "braintrust";

await flush();

Returns: Promise<void>

`setMaskingFunction()`

Installs a global masking function that runs over your logged data before it leaves your process, so you can redact sensitive values like PII or secrets before they ever reach Braintrust. Set it to null to remove a previously configured masking function.

import { setMaskingFunction } from "braintrust";

setMaskingFunction((value) => {
  // `value` is a logged field's full value, which may be a string, array, or
  // object. Walk it and return a masked copy with any sensitive data redacted.
  return value;
});

Returns: void

Masking is applied at flush time for fields such as input, output, expected, metadata, context, scores, and metrics.
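As one possible shape for such a function, the sketch below walks a logged value recursively and returns a masked copy. The name maskValue, the email pattern, and the list of sensitive keys are all illustrative assumptions, not SDK behavior.

```typescript
// Illustrative assumptions: adjust the pattern and key list to your own data.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const SENSITIVE_KEYS = ["password", "apiKey", "ssn"];

// Recursively returns a masked copy of `value` without mutating the original.
// Handles strings, arrays, and plain objects; class instances such as Date
// would need their own handling.
function maskValue(value: unknown): unknown {
  if (typeof value === "string") {
    return value.replace(EMAIL_PATTERN, "[EMAIL]");
  }
  if (Array.isArray(value)) {
    return value.map((item) => maskValue(item));
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const masked: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      masked[key] =
        SENSITIVE_KEYS.indexOf(key) >= 0 ? "[REDACTED]" : maskValue(source[key]);
    }
    return masked;
  }
  // Numbers, booleans, null, and undefined pass through unchanged.
  return value;
}
```

With the SDK installed, a function like this could be registered with setMaskingFunction(maskValue).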

Evaluations

An evaluation runs your task over a set of cases, scores each output, and logs the results to an experiment, which is how you measure quality and catch regressions as you change prompts or models. Eval() is the main entry point. The other APIs here summarize and report results.

`Eval()`

Runs an evaluation from your data, a task, and scorers or classifiers: it runs the task over every case, scores the outputs, logs each row to an experiment, and returns a result with a summary you can compare across runs.

import { Eval } from "braintrust";

await Eval("My Project", {
  data: [
    {
      input: "How do I reset my password?",
      expected: "Use the password reset flow.",
    },
  ],
  task: async (input) => {
    return await answerQuestion(input);
  },
  scores: [
    ({ output, expected }) => ({
      name: "exact_match",
      score: output === expected ? 1 : 0,
    }),
  ],
});

Returns: Promise<EvalResultWithSummary>

Eval(project, evaluator, options?) takes three arguments.

project

The project name or ID (a string) to log the experiment to.

evaluator

Defines the evaluation: the data to run against, the task to test, and how to score the results. Its core fields:

data (EvalData, required): the evaluation cases. Provide an array of cases, a function or promise that returns an array, an AsyncIterable/AsyncGenerator, or BaseExperiment(...). Each case has:
- input (required): the value passed to the task.
- expected: the expected output for this case, if any.
- metadata / tags: per-case metadata and tags.
- trialCount: how many times to run the task for this case.
task ((input, hooks?) => output, required): the function under test. It receives two arguments:
- input: the current case’s input value.
- hooks (optional): a context object the framework passes in for the current evaluation row. Use its fields to read parameters, log to the task’s span, or adjust the row:
  - span: the task’s span. Log extra data or child spans to it.
  - parameters: the validated runtime parameters.
  - metadata / tags: mutate to change the current row’s metadata or tags.
  - expected: the expected output for the current case, if it provided one.
  - trialIndex: zero-based trial index, useful when trialCount is greater than 1.
  - reportProgress: reports task progress for long-running evals.
```
await Eval("Support bot", {
  data: [{ input: "ping", expected: "pong" }],
  task: async (input, hooks) => {
    hooks.span.log({ metadata: { cacheHit: false } });
    return await answerQuestion(input);
  },
  scores: [({ output, expected }) => (output === expected ? 1 : 0)],
});
```

scores (EvalScorer[]): scorer functions, required unless classifiers is set. Each receives the case fields plus output and an optional trace, and returns a number, a Score, an array of scores, or null to skip the row. A bare number is named after the function (or scorer_<index>); return a Score for a custom name or metadata.

type EvalScorerArgs = EvalCase & {
  output: unknown;
  trace?: Trace;
};

type Score = {
  name: string;
  score: number | null;
  metadata?: Record<string, unknown>;
};

type EvalScorer = (
  args: EvalScorerArgs,
) => number | Score | Score[] | null | Promise<number | Score | Score[] | null>;
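To make the return options concrete, here is a minimal sketch of a scorer that returns a named Score with metadata and returns null to skip rows that have no expected value. It assumes string outputs; the names normalizedMatch and StringScorerArgs are hypothetical, and the Score type is repeated locally so the sketch stands alone.

```typescript
// Local copy of the Score shape, so the sketch type-checks by itself.
type Score = {
  name: string;
  score: number | null;
  metadata?: Record<string, unknown>;
};

// Assumed argument shape for this example: string outputs and expectations.
type StringScorerArgs = { input: unknown; output: string; expected?: string };

// Trim, lowercase, and collapse whitespace so formatting differences don't count.
function normalizeText(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

// Returns null to skip rows without an expected value; otherwise a named Score.
function normalizedMatch(args: StringScorerArgs): Score | null {
  if (args.expected === undefined) {
    return null;
  }
  const normalizedOutput = normalizeText(args.output);
  const matched = normalizedOutput === normalizeText(args.expected);
  return {
    name: "normalized_match",
    score: matched ? 1 : 0,
    metadata: { normalizedOutput },
  };
}
```

A scorer of this shape could be listed in the scores array of an Eval() whose task returns strings.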

classifiers (EvalClassifier[]): like scorers, but return a Classification and write to the classifications field instead of scores (required unless scores is set). id is the machine-readable outcome, label is the display label (defaults to id), and name is the key (defaults to the function name, or classifier_<index>).

type Classification = {
  name: string;
  id: string;
  label?: string;
  metadata?: Record<string, unknown>;
};

type EvalClassifier = (
  args: EvalScorerArgs,
) =>
  | Classification
  | Classification[]
  | null
  | Promise<Classification | Classification[] | null>;
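As an illustration of the Classification fields, the sketch below buckets a support reply with a few keyword heuristics. The classifier name replyKind, its argument type, and the categories are hypothetical; the Classification type is repeated locally so the sketch stands alone.

```typescript
// Local copy of the Classification shape, so the sketch type-checks by itself.
type Classification = {
  name: string;
  id: string;
  label?: string;
  metadata?: Record<string, unknown>;
};

// Assumed argument shape for this example: string outputs only.
type ReplyClassifierArgs = { input: unknown; output: string };

// Hypothetical classifier: `id` is the machine-readable outcome and
// `label` is the human-readable display text.
function replyKind(args: ReplyClassifierArgs): Classification {
  const text = args.output.toLowerCase();
  if (text.indexOf("sorry") >= 0 || text.indexOf("cannot") >= 0) {
    return { name: "reply_kind", id: "refusal", label: "Refusal" };
  }
  if (text.indexOf("?") >= 0) {
    return { name: "reply_kind", id: "clarifying_question", label: "Clarifying question" };
  }
  return { name: "reply_kind", id: "answer", label: "Answer" };
}
```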

parameters (EvalParameters | loadParameters()): named, typed settings that make the eval configurable, so you can re-run it with different values (a temperature, a model, a prompt) without editing code. Declare them with defaults here, and the task reads the chosen values from hooks.parameters. Pass runtime values in the options argument (validated against your schema), or load a saved set with loadParameters(), whose defaults merge with any runtime overrides. Supports Zod values, model parameters (exposed as strings), and prompt parameters (exposed as Prompt objects).

import { Eval } from "braintrust";
import { z } from "zod";

await Eval(
  "Support bot",
  {
    data: [{ input: "ping", expected: "pong" }],
    parameters: {
      temperature: z.number().min(0).max(2).default(0),
      model: { type: "model", default: "gpt-5-mini" },
      prompt: {
        type: "prompt",
        default: {
          model: "gpt-5-mini",
          messages: [{ role: "user", content: "{{input}}" }],
        },
      },
    },
    task: async (input, hooks) => {
      return runTask(input, hooks.parameters);
    },
    scores: [({ output, expected }) => (output === expected ? 1 : 0)],
  },
  {
    parameters: {
      temperature: 0.2,
    },
  },
);

The remaining fields are optional:

experimentName (string): experiment name.
trialCount (number): number of task runs per input.
maxConcurrency (number): maximum concurrent task/scorer work.
timeout (number): evaluation timeout in milliseconds.
metadata (Record<string, unknown>): experiment metadata.
tags (string[]): experiment tags.
baseExperimentName (string): base experiment name for comparison.
baseExperimentId (string): base experiment ID, taking precedence over baseExperimentName.
summarizeScores (boolean): defaults to true.
flushBeforeScoring (boolean): flushes task spans before scorers run.

options

An optional object of run settings:

noSendLogs (boolean): run locally and build a local summary without sending logs.
onStart ((summary) => void): called when the experiment starts.
stream ((event) => void): receives progress events.
parent (string): exported parent span for nested or distributed eval logging.
parameters (Record<string, unknown>): runtime parameter values.
returnResults (boolean): defaults to true. Set false for large evals to keep only aggregate summary data.
enableCache (boolean): defaults to true. Controls the local span cache used by scorers.

`buildLocalSummary()`

Builds an ExperimentSummary from local eval results. The SDK uses this when an eval runs without a remote experiment, such as Eval(..., { noSendLogs: true }).

import { buildLocalSummary } from "braintrust";

const summary = buildLocalSummary(evaluatorDef, results);

Returns: ExperimentSummary

Parameters:

evaluatorDef (EvaluatorDef, required): your Eval() evaluator object plus the SDK-populated projectName and evalName fields.
```
type EvaluatorDef = Evaluator & {
  projectName: string;
  evalName: string;
};
```
results (EvalResult[], required): the row-level eval results to summarize.
precomputedScores (ScoreAccumulator, optional): score totals and counts to use instead of deriving them from results. Pass it when you ran with returnResults: false (so per-row results aren’t available) or you’ve already accumulated scores yourself. It maps each score’s name to a running total and count:
```
type ScoreAccumulator = Record<
  string,
  {
    total: number;
    count: number;
  }
>;
```

Computes average scores from the results or the precomputed accumulator.
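The relationship between an accumulator and the averages can be sketched as follows. This is not the SDK's implementation; the helper names accumulate and averageScores are hypothetical, and skipping null scores is an assumption made for this example.

```typescript
// Local copy of the accumulator shape: score name -> running total and count.
type ScoreAccumulator = Record<string, { total: number; count: number }>;

// Adds one row's scores to the accumulator, skipping null or missing scores.
function accumulate(
  acc: ScoreAccumulator,
  rowScores: Record<string, number | null>,
): void {
  for (const scoreName of Object.keys(rowScores)) {
    const value = rowScores[scoreName];
    if (value === null || value === undefined) continue;
    const entry = acc[scoreName] || { total: 0, count: 0 };
    entry.total += value;
    entry.count += 1;
    acc[scoreName] = entry;
  }
}

// Derives each score's average as total / count.
function averageScores(acc: ScoreAccumulator): Record<string, number> {
  const result: Record<string, number> = {};
  for (const scoreName of Object.keys(acc)) {
    const entry = acc[scoreName];
    if (entry && entry.count > 0) {
      result[scoreName] = entry.total / entry.count;
    }
  }
  return result;
}

// Two rows: the second has no relevance score, so it only counts toward exact_match.
const runningScores: ScoreAccumulator = {};
accumulate(runningScores, { exact_match: 1, relevance: 0.5 });
accumulate(runningScores, { exact_match: 0, relevance: null });
const averagedScores = averageScores(runningScores);
```

Keeping only totals and counts is what makes this shape suitable for large evals: memory stays constant no matter how many rows are processed.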

`reportFailures()`

Prints failing eval rows to the console in the same format the default reporter uses. Use it when you run evals with a custom reporter but still want readable failure output.

import { reportFailures } from "braintrust";

reportFailures(evaluatorDef, failingResults, {
  verbose: true,
  jsonl: false,
});

Returns: void

Parameters:

evaluatorDef (EvaluatorDef, required): evaluator metadata used in error output.
failingResults (EvalResult[], required): results whose error field is set.
options ({ verbose: boolean; jsonl: boolean }, required): verbose prints full errors. jsonl emits a JSON line containing evaluator name and error strings.
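Since failingResults is the subset of rows whose error field is set, a filter step along these lines could produce it. The ResultRow type is a minimal assumed shape (the SDK's EvalResult carries more fields), and selectFailures is a hypothetical helper name.

```typescript
// Minimal assumed row shape for illustration only.
type ResultRow = { input: unknown; output?: unknown; error?: unknown };

// Keeps only the rows whose error field is set.
function selectFailures<T extends ResultRow>(rows: T[]): T[] {
  return rows.filter((row) => row.error !== undefined && row.error !== null);
}

const sampleRows: ResultRow[] = [
  { input: "ping", output: "pong" },
  { input: "boom", error: new Error("task failed") },
];
const failingRows = selectFailures(sampleRows);
```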

`BaseExperiment()`

Uses a previous experiment as evaluation data. The previous experiment’s output becomes the current eval’s expected value. This is useful if you want to test, for example, a new prompt for regressions.

import { BaseExperiment, Eval } from "braintrust";

await Eval("My Project", {
  data: BaseExperiment({ name: "retrieval-baseline" }),
  task: async (input) => await answerQuestion(input),
  scores: [qualityScorer],
});

Returns: BaseExperiment

If name is provided, the SDK uses that experiment both as the comparison base for the new experiment and as the read-only dataset for eval rows. If name is omitted, the SDK derives the base experiment from git metadata:

During experiment registration, the SDK sends ancestor_commits when no explicit baseExperimentId or baseExperimentName is set.
ancestor_commits contains up to 1,000 commit hashes from the current branch, starting at the merge base with the default remote base branch and ending at HEAD. For a clean working tree, the merge base is computed from HEAD^; for a dirty working tree, it is computed from HEAD.
Braintrust chooses the newest experiment in the same project on the nearest commit in that ancestor list that you can read.
The chosen experiment ID is stored as the new experiment’s base experiment and then opened as dataset data for BaseExperiment().

The SDK chooses the default remote base branch from the local main, master, or develop branch when exactly one exists, otherwise from the remote HEAD branch, and falls back to main if the remote HEAD cannot be read. If no experiment matches the ancestor commits, later base lookup can use a project-configured baseline experiment or the nearest earlier experiment in the same project by creation time. If no base experiment can be found, BaseExperiment() fails.
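The branch-selection rule described above can be expressed as a small pure function. This is a sketch of the documented rule only, not the SDK's code: pickBaseBranch is a hypothetical name, and it takes the local branch names and the remote HEAD branch (or null when it cannot be read) as plain inputs instead of calling git.

```typescript
const CANDIDATE_BRANCHES = ["main", "master", "develop"];

// Picks the base branch: the single local candidate if exactly one exists,
// otherwise the remote HEAD branch, otherwise "main".
function pickBaseBranch(localBranches: string[], remoteHead: string | null): string {
  const present = CANDIDATE_BRANCHES.filter(
    (branch) => localBranches.indexOf(branch) >= 0,
  );
  const onlyCandidate = present.length === 1 ? present[0] : undefined;
  if (onlyCandidate !== undefined) {
    return onlyCandidate;
  }
  if (remoteHead !== null) {
    return remoteHead;
  }
  return "main";
}
```

For example, a repository with local branches feature-x and master resolves to master, while one with both main and develop defers to the remote HEAD branch.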

Experiments

An experiment is a single evaluation run logged to a project. Use these APIs when you want to create an experiment and log rows yourself, instead of letting Eval() manage one for you.

`initExperiment()`

Initializes an experiment in a project. Use it when you want to log experiment rows manually.

import { initExperiment } from "braintrust";

const experiment = initExperiment({
  project: "Support bot",
  experiment: "retrieval-baseline",
  description: "Manual retrieval evaluation",
});

experiment.log({
  input: "How do I reset my password?",
  output: "Use the account recovery flow.",
  expected: "Follow the password reset instructions.",
  scores: { correctness: 0.8 },
});

const summary = await experiment.summarize();
await experiment.flush();

Returns: Experiment

Options:

project (string): project name. Required unless projectId is set.
projectId (string): project ID. Takes precedence over project.
experiment (string): experiment name. Generated automatically if omitted.
description (string): experiment description.
dataset (string | Dataset): dataset to associate with the experiment.
metadata (Record<string, unknown>): experiment metadata.
tags (string[]): experiment tags.
baseExperiment (string): base experiment name for comparison.
baseExperimentId (string): base experiment ID. Takes precedence over baseExperiment.
update (boolean): continue logging to an existing experiment with the same name.
open (boolean): open an existing experiment in read-only mode.
isPublic (boolean): whether links are public. Defaults to private.
setCurrent (boolean): defaults to true. Controls whether global log() uses this experiment.

Datasets

A dataset is a versioned collection of cases you manage in Braintrust and reuse across experiments and evals. Use initDataset() to create a dataset or open an existing one.

`initDataset()`

Creates or opens a dataset in a project.

import { initDataset } from "braintrust";

const dataset = initDataset({
  project: "Support bot",
  dataset: "Password reset questions",
});

dataset.insert({
  input: "How do I reset my password?",
  expected: "Use the password reset flow.",
  metadata: { topic: "account" },
});

await dataset.flush();

Returns: Dataset

Options:

project (string): project name. Required unless projectId is set.
projectId (string): project ID. Takes precedence over project.
dataset (string): dataset name. Generated automatically if omitted.
description (string): dataset description.
metadata (Record<string, unknown>): dataset metadata.
version (string): when fetching or iterating rows, read the dataset at this transaction ID. Takes precedence over snapshotName and environment.
snapshotName (string): when fetching or iterating rows, read the dataset version captured by this named snapshot. Takes precedence over environment.
environment (string): when fetching or iterating rows, read the dataset version assigned to this environment slug.
useOutput (boolean): deprecated legacy mode that maps expected to output when fetching rows.

version, snapshotName, and environment select the committed dataset version used for reads. Calls such as insert() and flush() still write new rows to the dataset.
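The precedence among the three read selectors can be sketched as a small resolver. This illustrates only the documented ordering (version, then snapshotName, then environment); the function name, the ReadSelector type, and the "unpinned" case for when none is set are assumptions for the example.

```typescript
type ReadSelector =
  | { kind: "version"; value: string }
  | { kind: "snapshotName"; value: string }
  | { kind: "environment"; value: string }
  | { kind: "unpinned" };

type ReadOptions = {
  version?: string;
  snapshotName?: string;
  environment?: string;
};

// Returns the selector that wins when several options are provided.
function resolveReadSelector(options: ReadOptions): ReadSelector {
  if (options.version !== undefined) {
    return { kind: "version", value: options.version };
  }
  if (options.snapshotName !== undefined) {
    return { kind: "snapshotName", value: options.snapshotName };
  }
  if (options.environment !== undefined) {
    return { kind: "environment", value: options.environment };
  }
  // None set: this sketch leaves the read unpinned and makes no claim
  // about which version the SDK reads in that case.
  return { kind: "unpinned" };
}
```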

Prompts and functions

In Braintrust, functions are units of logic you define and version in the UI, then load or invoke from your code. A prompt is a function whose job is to call a model with a templated set of messages. Other functions include scorers, tools, and code you deploy. Load and render a saved prompt with loadPrompt(), or call a function directly with invoke().

`loadPrompt()`

Loads a saved prompt from Braintrust. Use the returned Prompt object’s build() method to render model parameters, messages, tools, or completion text with runtime variables.

import { loadPrompt } from "braintrust";

const prompt = await loadPrompt({
  projectName: "Support bot",
  slug: "answer-question",
  environment: "production",
  defaults: {
    model: "gpt-5-mini",
  },
});

const built = prompt.build({
  question: "How do I reset my password?",
  tone: "concise",
}, {
  strict: true,
});

Returns: Promise<Prompt>

Options:

projectName (string): project containing the prompt. Required unless projectId or id is set.
projectId (string): project ID. Takes precedence over projectName.
slug (string): prompt slug. Required unless id is set.
id (string): prompt ID. Takes precedence over project and slug.
version (string): prompt version. Takes precedence over environment.
environment (string): environment assignment such as production or staging.
defaults (Record<string, unknown>): default model, parameter, message, or completion fields used by build(). Saved prompt fields override these defaults.
noTrace (boolean): if true, built prompt metadata is not attached to traces.

If neither version nor environment is set, the SDK loads the latest version and can fall back to the local prompt cache.

build(args, options?) renders the prompt. Parameters:

args (object, required): the template variables. Object keys become variables, and the whole value is available as input. Rendered with Mustache by default, with non-string values JSON-stringified.
options (optional):
- flavor ("chat" | "completion"): defaults to "chat". Use "completion" for completion prompts.
- messages (Message[]): additional chat messages to append. If the prompt already has a system message, extra system messages are omitted.
- strict (boolean): validates that referenced template variables are present.
- templateFormat ("mustache" | "nunjucks" | "none"): overrides the prompt’s template format. Nunjucks requires @braintrust/template-nunjucks. none leaves template strings unrendered.

Use buildWithAttachments() instead of build() when prompt variables contain Braintrust attachment references that need to be resolved before rendering.
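To illustrate how template variables, JSON-stringified values, and strict mode interact, here is a deliberately simplified stand-in for rendering. It handles only plain {{name}} tags and is not the Mustache implementation the SDK uses; renderTemplate is a hypothetical name, and rendering a missing variable as an empty string in non-strict mode follows standard Mustache behavior.

```typescript
// Simplified stand-in: supports only {{name}} tags. Real Mustache also has
// sections, dotted paths, partials, and more.
function renderTemplate(
  template: string,
  args: Record<string, unknown>,
  strict: boolean = false,
): string {
  const tagPattern = /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;
  return template.replace(tagPattern, (_match: string, variable: string) => {
    if (!Object.prototype.hasOwnProperty.call(args, variable)) {
      if (strict) {
        throw new Error("Missing template variable: " + variable);
      }
      return ""; // non-strict: a missing variable renders as empty text
    }
    const value = args[variable];
    if (typeof value === "string") {
      return value;
    }
    // Non-string values are JSON-stringified; undefined has no JSON form.
    const serialized = JSON.stringify(value);
    return serialized === undefined ? "" : serialized;
  });
}

const renderedPrompt = renderTemplate(
  "Question: {{question}} (context: {{context}})",
  { question: "How do I reset my password?", context: { plan: "pro" } },
);
```

Strict mode is most useful in tests and CI, where a misspelled variable should fail loudly instead of silently producing an incomplete prompt.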

`getPromptVersions()`

Lists the version IDs for a saved prompt, one for each change recorded when the prompt was created or updated. Use it to find an earlier version to load or pin.

import { getPromptVersions, loadPrompt } from "braintrust";

const versions = await getPromptVersions(projectId, promptId);
const prompt = await loadPrompt({
  id: promptId,
  version: versions[0],
});

Returns: a promise of the prompt’s version IDs

Parameters:

projectId (string, required): the project’s ID.
promptId (string, required): the prompt’s ID.

The returned strings can be passed to loadPrompt({ version }) to load a specific historical prompt version.

getPromptVersions() logs in internally with the global SDK state. If BRAINTRUST_API_KEY is set, or if a previous SDK call already logged in, you can call it directly. Call login() first when you need to provide apiKey, orgName, appUrl, or a custom fetch, because getPromptVersions() does not accept those options itself.

`loadParameters()`

Loads a set of parameters you’ve saved in Braintrust, so an eval or application can run with configuration managed in the UI instead of hardcoded values. The example below passes them to Eval().

import { Eval, loadParameters } from "braintrust";

const parameters = await loadParameters({
  projectName: "Support bot",
  slug: "retrieval-config",
  environment: "production",
});

await Eval("Support bot", {
  data: [{ input: "ping", expected: "pong" }],
  parameters,
  task: async (input, hooks) => {
    return await answerQuestion(input, hooks.parameters);
  },
  scores: [({ output, expected }) => (output === expected ? 1 : 0)],
});

Returns: a promise of the resolved parameters

Use projectName with slug, or projectId with slug, or id. version takes precedence over environment.

`invoke()`

Invokes a Braintrust function, prompt, scorer, or tool.

import { invoke } from "braintrust";
import { z } from "zod";

const result = await invoke({
  projectName: "Support bot",
  slug: "answer-question",
  input: { question: "How do I reset my password?" },
  schema: z.object({ answer: z.string() }),
});

Returns: the function’s output (JSON), or a BraintrustStream when stream: true

Identification options. Provide one of these: a function_id, a projectName or projectId plus a slug, or a globalFunction.

function_id (string): function ID.
projectName (string): project containing the function.
projectId (string): ID of the project containing the function. Can be used instead of projectName.
slug (string): function slug.
globalFunction (string): global function name. See Global Functions.
functionType (FunctionType): global function type. Defaults to scorer for global functions.
version (string): function version.
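The "provide one of these" rule can be checked before calling invoke() with a small pre-flight helper. The sketch below is hypothetical and separate from whatever validation the SDK performs; treating zero or multiple identification methods as an error is an assumption of this example.

```typescript
type FunctionIdentifier = {
  function_id?: string;
  projectName?: string;
  projectId?: string;
  slug?: string;
  globalFunction?: string;
};

type IdentificationMethod = "function_id" | "project_slug" | "globalFunction";

// Returns which identification method the options use, or throws when
// none or more than one is present.
function identificationMethod(opts: FunctionIdentifier): IdentificationMethod {
  const methods: IdentificationMethod[] = [];
  if (opts.function_id !== undefined) {
    methods.push("function_id");
  }
  const hasProject = opts.projectName !== undefined || opts.projectId !== undefined;
  if (hasProject && opts.slug !== undefined) {
    methods.push("project_slug");
  }
  if (opts.globalFunction !== undefined) {
    methods.push("globalFunction");
  }
  const [only] = methods;
  if (methods.length !== 1 || only === undefined) {
    throw new Error(
      "Expected exactly one identification method, found " + methods.length,
    );
  }
  return only;
}
```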

Execution options:

input (Input, required): logged as the span input.
messages (Message[]): additional OpenAI-style messages for LLM functions.
metadata (Record<string, unknown>): logged as span metadata and available to the function.
tags (string[]): logged as span tags.
parent (Exportable | string): parent span, logger, experiment, or exported parent string.
stream (boolean): returns a BraintrustStream when true.
mode (StreamingMode): function streaming mode.
strict (boolean): throws when prompt variable names do not match input keys.
schema (z.ZodSchema): validates non-streaming output and returns the typed value.

`initFunction()`

Creates a callable wrapper around a Braintrust function for use as an eval task or scorer.

import { Eval, initFunction } from "braintrust";

const answerQuestion = initFunction({
  projectName: "Support bot",
  slug: "answer-question",
});

await Eval("Support bot", {
  data: [{ input: { question: "ping" }, expected: "pong" }],
  task: answerQuestion,
  scores: [qualityScorer],
});

Returns: a function (input) => Promise<output>

Options:

projectName (string, required): project containing the function.
slug (string, required): function slug.
version (string): function version.
state (BraintrustState): SDK state override.

Attachments

Attachments let you log files or large payloads without storing the full bytes inline in the span, which helps keep requests under Braintrust’s 6 MB request limit. When you trace AI calls, Braintrust automatically converts base64 attachments in provider messages into uploaded attachments, so you rarely need the APIs below for instrumented calls. Reach for them when you’re attaching binary content to a span yourself.

`Attachment`

Uploads local or in-memory file data and replaces it with an AttachmentReference during logging.

import { Attachment, initLogger } from "braintrust";

const logger = initLogger({ projectName: "Support bot" });

logger.log({
  input: {
    document: new Attachment({
      data: "./fixtures/account-reset.pdf",
      filename: "account-reset.pdf",
      contentType: "application/pdf",
    }),
  },
});

Constructor fields are data (string, Blob, or ArrayBuffer), filename, contentType, and optional state.

`ExternalAttachment`

References a file that already exists in an external object store.

import { ExternalAttachment, initLogger } from "braintrust";

const logger = initLogger({ projectName: "Support bot" });

logger.log({
  input: {
    document: new ExternalAttachment({
      url: "s3://support-docs/account-reset.pdf",
      filename: "account-reset.pdf",
      contentType: "application/pdf",
    }),
  },
});

ExternalAttachment.upload() is a no-op because the file already exists externally.

`JSONAttachment`

Serializes JSON and uploads it as an application/json attachment. Use it for large JSON objects that should be viewable but do not need to be indexed for search.

import { JSONAttachment, initLogger } from "braintrust";

const logger = initLogger({ projectName: "Support bot" });

logger.log({
  input: {
    transcript: new JSONAttachment(conversation, {
      filename: "conversation.json",
      pretty: true,
    }),
  },
});

Options are filename, pretty, and optional state.
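One way to decide when a JSON value is large enough to warrant a JSONAttachment is to measure its serialized size. The sketch below is a heuristic only: the helper names are hypothetical, reading "6 MB" as 6 * 1024 * 1024 bytes is an assumption, and so is the default threshold of half that limit.

```typescript
// Assumption: the 6 MB request limit is read as 6 * 1024 * 1024 bytes.
const ASSUMED_REQUEST_LIMIT_BYTES = 6 * 1024 * 1024;

// Counts the UTF-8 bytes in a string without relying on platform APIs.
function utf8ByteLength(text: string): number {
  let bytes = 0;
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;
    } else if (code < 0x800) {
      bytes += 2;
    } else if (code >= 0xd800 && code <= 0xdbff) {
      bytes += 4; // surrogate pair: one 4-byte code point
      i++; // skip the low surrogate
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Returns true when the serialized payload exceeds the threshold.
function shouldAttach(
  payload: unknown,
  thresholdBytes: number = ASSUMED_REQUEST_LIMIT_BYTES / 2,
): boolean {
  const serialized = JSON.stringify(payload);
  if (serialized === undefined) {
    return false; // e.g. undefined or a function: nothing to serialize
  }
  return utf8ByteLength(serialized) > thresholdBytes;
}
```

A caller could then log the value inline when shouldAttach() returns false and wrap it in a JSONAttachment otherwise.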

`ReadonlyAttachment`

Reads an already-uploaded attachment from a dataset, experiment, or raw AttachmentReference.

import { ReadonlyAttachment, type AttachmentReference } from "braintrust";

async function readAttachment(reference: AttachmentReference) {
  const attachment = new ReadonlyAttachment(reference);
  const blob = await attachment.data();
  const status = await attachment.status();

  return { blob, status };
}

Useful methods:

data() → Promise<Blob>: downloads the attachment as a Blob.
asBase64Url() → Promise<string>: returns a data:<content-type>;base64,... URL for prompts.
metadata() → Promise<AttachmentMetadata>: fetches download URL and upload status.
status() → Promise<AttachmentStatus>: fetches current upload status.

Wrappers

Each wrapper returns the same client or module you pass in, with its methods traced, so the return type always matches the argument type. If Braintrust logging is not configured, wrapped calls still run normally but nothing is traced. Initialize a logger before making wrapped calls:

import { initLogger } from "braintrust";

initLogger({ projectName: "Support bot" });
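The "return type matches the argument type" property can be pictured with a small generic wrapper. This is only a sketch of the general pattern using a Proxy, not Braintrust's implementation; wrapWithTracing, fakeClient, and the onCall callback are hypothetical names introduced here.

```typescript
// Illustrative only: a generic wrapper whose return type equals its
// argument type. This is not how the Braintrust wrappers are implemented.
function wrapWithTracing<T extends object>(
  client: T,
  onCall: (method: string) => void,
): T {
  return new Proxy(client, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      if (typeof value !== "function") return value;
      // Report the call, then delegate to the original method.
      return (...args: unknown[]) => {
        onCall(String(prop));
        return value.apply(target, args);
      };
    },
  });
}

// Hypothetical client used only for this sketch.
const fakeClient = {
  complete(prompt: string): string {
    return prompt.toUpperCase();
  },
};

const calls: string[] = [];
const tracedClient = wrapWithTracing(fakeClient, (method) => calls.push(method));

// Still typed as string, exactly like the unwrapped client.
const result = tracedClient.complete("hello");
console.log(result, calls);
```

Because the wrapper is generic in T, call sites keep their original autocomplete and type checking. The sketch only intercepts top-level methods; nested namespaces would need recursive wrapping.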

OpenAI

import OpenAI from "openai";
import { wrapOpenAI } from "braintrust";

const client = wrapOpenAI(new OpenAI());

await client.responses.create({
  model: "gpt-5-mini",
  input: "Answer in one sentence.",
});

wrapOpenAI(openai): wraps OpenAI SDK clients and supports current v4, v5, and v6 style clients.
wrapOpenAIv4(openai): lower-level OpenAI wrapper used by wrapOpenAI(). Use wrapOpenAI() unless you know you need this compatibility entry point.

Traced OpenAI surfaces include chat completions, beta chat helpers, responses, embeddings, and moderations where present on the client.

Anthropic

import Anthropic from "@anthropic-ai/sdk";
import { wrapAnthropic } from "braintrust";

const anthropic = wrapAnthropic(new Anthropic());

wrapAnthropic() traces messages.create, beta messages.create, and beta tool runner calls when available.

Vercel AI SDK

import * as ai from "ai";
import { openai } from "@ai-sdk/openai";
import { wrapAISDK } from "braintrust";

const { generateText, streamText, Agent } = wrapAISDK(ai);

await generateText({
  model: openai("gpt-5-mini"),
  prompt: "Summarize the ticket.",
});

wrapAISDK(ai, options?): wraps AI SDK namespace functions such as generateText, streamText, generateObject, streamObject, embed, embedMany, rerank, and agent constructors.
wrapAgentClass(AgentClass, options?): wraps an AI SDK agent class directly.

wrapAISDK() accepts denyOutputPaths?: string[] to omit selected output paths from logged output.
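The effect of omitting output paths can be sketched with a small standalone function. The path syntax accepted by denyOutputPaths is not described on this page, so the example assumes dot-separated object keys; omitPaths and the sample output shape are hypothetical.

```typescript
type Json = Record<string, unknown>;

// Hypothetical helper: returns a copy of `output` without the listed paths.
// Assumes dot-separated object keys; the SDK's real path syntax may differ.
function omitPaths(output: Json, denyPaths: string[]): Json {
  // Deep-copy so the caller's object is left untouched.
  const copy: Json = JSON.parse(JSON.stringify(output));
  for (const path of denyPaths) {
    const keys = path.split(".");
    let node: unknown = copy;
    // Walk down to the parent of the final key.
    for (let i = 0; i < keys.length - 1; i++) {
      if (typeof node !== "object" || node === null) break;
      node = (node as Json)[keys[i]];
    }
    if (typeof node === "object" && node !== null) {
      delete (node as Json)[keys[keys.length - 1]];
    }
  }
  return copy;
}

const rawOutput: Json = {
  text: "ok",
  response: { headers: { "x-request-id": "abc" }, body: "b" },
};

const loggedOutput = omitPaths(rawOutput, ["response.headers"]);
console.log(JSON.stringify(loggedOutput)); // {"text":"ok","response":{"body":"b"}}
```

Paths that do not exist in the output are skipped silently in this sketch.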

Agent SDKs

wrapClaudeAgentSDK(sdk): wraps the Claude Agent SDK module, including query, tool, and createSdkMcpServer integration points.
wrapOpenRouterAgent(agent): wraps @openrouter/agent clients.
wrapGoogleADK(adkModule): wraps Google ADK modules, including runner, agent, and tool execution methods.

Model provider SDKs

wrapGoogleGenAI(): wraps the entire @google/genai package export (similar to the AI SDK wrapper). Traces models.generateContent, models.generateContentStream, and models.embedContent.
wrapOpenRouter(): pass an OpenRouter client. Traces chat, responses, embeddings, rerank, and callModel() where available.
wrapMistral(): pass a Mistral client. Traces chat, FIM (fill-in-the-middle), agents, and embeddings.
wrapCohere(): pass a Cohere client. Traces chat, chat streaming, embed, and rerank.
wrapGroq(): pass a Groq client. Traces chat completions and embeddings.
wrapHuggingFace(): pass a Hugging Face Inference module or client. Traces chat completion, text generation, streaming variants, and feature extraction.

Testing

Run Braintrust evaluations inside your test runner, so eval cases live alongside your unit tests and your suite fails when scores regress.

`wrapVitest()`

Wraps Vitest test APIs so tests can create Braintrust experiment rows with inputs, expected values, scorer results, and pass/fail status.

import * as vitest from "vitest";
import { wrapVitest } from "braintrust";

// qualityScorer and answerQuestion are assumed to be defined elsewhere in your project.

const bt = wrapVitest(vitest, {
  projectName: "Support bot",
});

bt.describe("retrieval", () => {
  bt.test(
    "answers password reset",
    {
      input: "How do I reset my password?",
      expected: "Use the password reset flow.",
      scorers: [qualityScorer],
    },
    async ({ input }) => {
      const output = await answerQuestion(input);
      bt.logOutputs(output);
      bt.expect(output).toBeTruthy();
      return output;
    },
  );
});

Returns: the wrapped Vitest APIs (test, it, expect, describe, and lifecycle hooks) plus the Braintrust helpers logOutputs, logFeedback, getCurrentSpan, and flushExperiment().

`initNodeTestSuite()`

Creates a Braintrust-backed suite for Node’s built-in node:test runner.

import { after, describe, test } from "node:test";
import assert from "node:assert/strict";
import { initNodeTestSuite } from "braintrust";

// qualityScorer and answerQuestion are assumed to be defined elsewhere in your project.

describe("retrieval", () => {
  const suite = initNodeTestSuite({
    projectName: "Support bot",
    after,
  });

  test(
    "answers password reset",
    suite.eval(
      {
        input: "How do I reset my password?",
        expected: "Use the password reset flow.",
        scorers: [qualityScorer],
      },
      async ({ input }) => {
        const output = await answerQuestion(input);
        assert.ok(output);
        return output;
      },
    ),
  );
});

Returns: a suite that exposes eval(config, fn) and flush(). If you pass Node’s after hook, flush() is registered automatically.

Configuration

Most configuration is passed per call, such as apiKey and projectName on initLogger(). You can also authenticate once with login(), or set credentials and deployment options through environment variables.

`login()`

Authenticates the SDK to Braintrust and stores the credentials in its global state, where every later SDK call reads them. Most SDK APIs log in automatically when they first need credentials.

import { login, loadPrompt } from "braintrust";

await login({
  apiKey: "bt_org_...", // Replace with your API key.
  orgName: "example-org",
});

const prompt = await loadPrompt({
  projectName: "Support bot",
  slug: "answer-question",
});

Returns: Promise<BraintrustState>

Options (all optional):

apiKey (string): API key. Defaults to BRAINTRUST_API_KEY.
appUrl (string): Braintrust app URL. Defaults to https://www.braintrust.dev.
orgName (string): organization name, useful when credentials can access multiple orgs.
fetch (typeof fetch): custom fetch implementation for SDK requests.
forceLogin (boolean): log in again even if the SDK already has credentials.
noExitFlush (boolean): disables the process exit handler that flushes pending writes.
onFlushError ((error) => void): handles errors from the background flusher.
disableSpanCache (boolean): disables the local span cache used by eval scorers.
debugLogLevel ("error" | "warn" | "info" | "debug" | false): enables SDK troubleshooting output.

In most cases you don’t need to call login() yourself. When BRAINTRUST_API_KEY is set, APIs like initLogger(), initExperiment(), initDataset(), loadPrompt(), and loadParameters() authenticate automatically using it. Call login() yourself when you want to authenticate with something other than that environment variable, such as a specific API key, organization, or custom fetch. Logging in stores those credentials in the SDK’s global state, so every later SDK call uses them, including APIs like getPromptVersions() that take no auth options of their own.

Environment variables

BRAINTRUST_API_KEY: API key used to authenticate. Same as the apiKey option.
BRAINTRUST_API_URL: Braintrust API URL. Set this for self-hosted or data plane deployments.
BRAINTRUST_APP_URL: Braintrust app URL, used for permalinks. Defaults to https://www.braintrust.dev. Same as the appUrl option.
BRAINTRUST_ORG_NAME: organization name, useful when credentials can access multiple orgs. Same as the orgName option.
BRAINTRUST_DISABLE_INSTRUMENTATION: comma-separated list of integrations to skip during auto-instrumentation, for example openai,anthropic.
BRAINTRUST_DEBUG_LOG_LEVEL: SDK troubleshooting output: error, warn, info, or debug. Unset stays silent. Same as the debugLogLevel option.
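How options and environment variables might combine can be sketched as a small resolver. It assumes an explicit option wins over the matching environment variable, which in turn wins over the documented default, and that entries in the comma-separated list are trimmed; resolveConfig and ResolvedConfig are hypothetical and do not mirror SDK internals.

```typescript
type Env = Record<string, string | undefined>;

interface ResolvedConfig {
  apiKey: string | undefined;
  appUrl: string;
  disabledIntegrations: string[];
}

// Hypothetical helper, not part of the SDK: option > env var > default.
function resolveConfig(
  options: { apiKey?: string; appUrl?: string },
  env: Env,
): ResolvedConfig {
  const disabled = env.BRAINTRUST_DISABLE_INSTRUMENTATION ?? "";
  return {
    apiKey: options.apiKey ?? env.BRAINTRUST_API_KEY,
    appUrl:
      options.appUrl ?? env.BRAINTRUST_APP_URL ?? "https://www.braintrust.dev",
    // Split the comma-separated list and drop empty entries.
    // Trimming whitespace is an assumption of this sketch.
    disabledIntegrations: disabled
      .split(",")
      .map((name) => name.trim())
      .filter((name) => name.length > 0),
  };
}

const resolved = resolveConfig(
  {},
  {
    BRAINTRUST_API_KEY: "example-key",
    BRAINTRUST_DISABLE_INSTRUMENTATION: "openai, anthropic",
  },
);
console.log(resolved);
```

In real code you would pass process.env as the second argument; the literal object here keeps the sketch free of runtime dependencies.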
