# API reference

> Functions, classes, and configuration options found in the Braintrust TypeScript SDK.

This page covers the key APIs in the Braintrust TypeScript SDK. For setup, see the [Quickstart](/sdks/typescript/quickstart). For the complete reference, see the [full TypeScript SDK reference](/sdks/typescript/versions/latest).

## Tracing

Tracing records what your application does as spans you can inspect in Braintrust.

The recommended way to capture AI calls is auto-instrumentation: call `initLogger()` once at startup and run your app with the Braintrust import or build hook. Supported AI clients are then traced with no per-call changes (see [Install and instrument](/sdks/typescript/install-and-instrument)). To instrument a specific provider client by hand instead, see [Wrappers](#wrappers).

The APIs below configure tracing and let you trace your own application code.

### `initLogger()`

Creates the logger that sends your traces to Braintrust. It is your starting point for tracing, so call it once at startup. By default it becomes the current logger that `traced()`, `startSpan()`, and auto-instrumentation log through, and auto-instrumentation requires it.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { initLogger } from "braintrust";

const logger = initLogger({
  projectName: "Support bot",
});

logger.log({
  input: { question: "How do I reset my password?" },
  output: { answer: "Use the account recovery flow." },
  metadata: { route: "/support" },
});

await logger.flush();
```

Returns: `Logger`

Options (all optional):

* **`projectName`** (`string`): project name for logs. If omitted, logs go to the global project.
* **`projectId`** (`string`): project ID. Takes precedence over `projectName`.
* **`apiKey`** (`string`): API key. Defaults to `BRAINTRUST_API_KEY`.
* **`appUrl`** (`string`): Braintrust app URL. Defaults to `https://www.braintrust.dev`.
* **`orgName`** (`string`): organization name, useful when credentials can access multiple orgs.
* **`asyncFlush`** (`boolean`): defaults to `true`. Set `false` to make logger calls like `logger.log()` awaitable.
* **`setCurrent`** (`boolean`): defaults to `true`. Controls whether `currentLogger()` returns this logger and whether auto-instrumentation logs through it.
* **`forceLogin`** (`boolean`): log in again even if the SDK already has credentials.
* **`fetch`** (`typeof fetch`): custom fetch implementation for SDK requests. Takes a function that follows the same API as native `fetch`.
* **`debugLogLevel`** (`"error" | "warn" | "info" | "debug" | false`): enables SDK troubleshooting output.

Useful logger methods:

* **`log(event)`** → `string`: logs a span and returns its row ID. Returns `Promise<string>` when the logger uses `asyncFlush: false`.
* **`traced(callback, args?)`** → the callback's result: creates a root span under the logger and ends it when the callback finishes. Spans started within the callback are nested under the `traced` span.
* **`startSpan(args?)`** → `Span`: starts a root span that you end manually.
* **`logFeedback(event)`** → `void`: adds feedback scores, expected values, tags, or comments to an existing row.
* **`updateSpan(event)`** → `void`: updates an already-written span by ID. Flush the original span before updating it.
* **`export()`** → `Promise<string>`: a serialized parent string for distributed tracing.
* **`flush()`** → `Promise<void>`: sends pending rows to Braintrust, resolving when they're all flushed.

### `currentLogger()`

The logger most recently installed by `initLogger({ setCurrent: true })`, or `undefined` if no current logger exists.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { currentLogger } from "braintrust";

currentLogger()?.log({ input: "ping", output: "pong" });
```

Returns: `Logger | undefined`

### `traced()`

Traces a piece of your own code. Pass a callback and it runs inside a span: the span is current for the duration of the callback, so nested spans and traced LLM calls attach to it, errors thrown inside are logged to the span, and the span ends automatically when the callback returns.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { traced } from "braintrust";

const answer = await traced(
  async (span) => {
    span.log({ input: { question: "What changed?" } });
    const output = await answerQuestion("What changed?");
    span.log({ output });
    return output;
  },
  {
    name: "answerQuestion",
    type: "function",
  },
);
```

Returns: the callback's result

Parameters:

* **`callback`** (`(span) => R`, required): the function to run inside the span. Its return value is returned.
* **`args`** (optional): span options.
  * **`name`** (`string`): span name shown in Braintrust.
  * **`type`** (`SpanType`): span type, such as `function`, `task`, `tool`, `llm`, `score`, or `eval`.
  * **`spanAttributes`** (`Record<PropertyKey, unknown>`): extra span attributes.
  * **`startTime`** (`number`): start time as a Unix timestamp in seconds.
  * **`parent`** (`string`): exported parent span string.
  * **`spanId`** (`string`): explicit span ID.
  * **`setCurrent`** (`boolean`): defaults to `true`. Controls whether `currentSpan()` returns this span inside the callback.

### `startSpan()`

Creates a span you manage by hand: you start it, log to it, and call `end()` yourself when the work finishes. Reach for it when the work you're tracing doesn't fit inside a single callback, such as when a span starts and ends in different places or you're integrating at a lower level. Unlike `traced()`, the span is not made current, so other spans and traced LLM calls won't automatically nest under it.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { startSpan } from "braintrust";

const span = startSpan({ name: "retrieve", type: "function" });

try {
  span.log({ input: { query: "password reset" } });
  const documents = await retrieveDocuments("password reset");
  span.log({ output: documents });
} finally {
  span.end();
}
```

Returns: `Span`

### `currentSpan()`

Gives you the span you're currently inside, so you can add data to it without holding a direct reference, for example from a helper function called within a `traced()` callback. If no span is active, returns a *no-op span*: an object with the same interface that silently discards anything you log, so calls like `currentSpan().log(...)` are always safe.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { currentSpan } from "braintrust";

currentSpan().log({
  metadata: { cacheHit: true },
});
```

Returns: `Span`

### `withCurrent()`

Makes an existing span the current span for the duration of a callback. A span from `startSpan()` isn't current on its own, so spans and traced LLM calls created afterward won't nest under it. Wrap that work in `withCurrent(span, callback)` and it nests under `span`, with `currentSpan()` returning it inside the callback.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { startSpan, withCurrent, currentSpan } from "braintrust";

const span = startSpan({ name: "manual parent" });

await withCurrent(span, async () => {
  currentSpan().log({ metadata: { inside: true } });
});

span.end();
```

Returns: the callback's result

Parameters:

* **`span`** (`Span`, required): the span to make current inside the callback.
* **`callback`** (`(span) => R`, required): the work to run with `span` current. Its return value is returned.

### `withParent()`

Attaches the spans created inside its callback to a parent that lives somewhere else, such as another process, service, or job. Get the parent's location by calling `export()` on a span or logger, pass that string to `withParent()`, and the work inside nests under the original trace. This is how you trace across process boundaries (distributed tracing).

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { initLogger, startSpan, traced, withParent } from "braintrust";

initLogger({ projectName: "Support bot" });

// `parent` is a string, so it can be serialized and passed between processes or services.
const parentSpan = startSpan({ name: "foo" });
const parent = await parentSpan.export();

await withParent(parent, async () => {
  await traced(async (span) => {
    span.log({ input: "child work" });
  });
});

// End the manually started span once the child work is done.
parentSpan.end();
```

Returns: the callback's result

Parameters:

* **`parent`** (`string`, required): an exported parent string, from a span's or logger's `export()`.
* **`callback`** (`() => R`, required): the work to run under that parent. Its return value is returned.

### `logError()`

Logs an error and stack trace to a span's `error` field.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { currentSpan, logError } from "braintrust";

try {
  await runStep();
} catch (error) {
  logError(currentSpan(), error);
  throw error;
}
```

Returns: `void`

Parameters:

* **`span`** (`Span`, required): the span to attach the error to.
* **`error`** (`unknown`, required): the caught error, whose message and stack are logged.

Errors thrown inside `traced()` are logged automatically, so use `logError()` for errors you catch yourself.

### `permalink()`

Turns an exported span string into a shareable Braintrust UI URL, so you can link straight to a specific trace from your own logs, dashboards, or alerts. Pass the string returned by a span's `export()`.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { traced, permalink } from "braintrust";

const slug = await traced((span) => span.export(), { name: "request" });
const url = await permalink(slug);
```

Returns: `Promise<string>`

Links can be generated before flushing, but they become viewable only after the span and its root have been flushed and ingested. If you have a span object, `span.permalink()` is usually simpler.

### `flush()`

Sends any buffered log rows to Braintrust and returns a promise that resolves once they're all sent. Call it before a script or short-lived process exits so pending events aren't lost.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { flush } from "braintrust";

await flush();
```

Returns: `Promise<void>`

### `setMaskingFunction()`

Installs a global masking function that runs over your logged data before it leaves your process, so you can redact sensitive values like PII or secrets before they ever reach Braintrust. Set it to `null` to remove a previously configured masking function.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { setMaskingFunction } from "braintrust";

setMaskingFunction((value) => {
  // `value` is a logged field's full value, which may be a string, array, or
  // object. Walk it and return a masked copy with any sensitive data redacted.
  return value;
});
```

Returns: `void`

Masking is applied at flush time for fields such as `input`, `output`, `expected`, `metadata`, `context`, `scores`, and `metrics`.
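
As a sketch of what such a function might look like, the example below walks a logged value recursively, redacts email addresses in strings, and replaces any field whose key looks like a secret. The `maskValue` name, the email pattern, and the `SENSITIVE_KEYS` list are illustrative assumptions rather than part of the SDK, and the walk assumes plain JSON-like data.

```typescript
// Hypothetical helper, not part of the Braintrust SDK.
// Assumes logged values are plain JSON-like data (strings, numbers, arrays, objects).
const EMAIL_PATTERN = /[\w.+-]+@[\w-]+(\.[\w-]+)+/g;
const SENSITIVE_KEYS = new Set(["password", "apikey", "token", "secret"]);

function maskValue(value: unknown): unknown {
  if (typeof value === "string") {
    // Redact anything that looks like an email address.
    return value.replace(EMAIL_PATTERN, "[REDACTED_EMAIL]");
  }
  if (Array.isArray(value)) {
    return value.map((item) => maskValue(item));
  }
  if (value !== null && typeof value === "object") {
    // Build a masked copy instead of mutating the logged object.
    const masked: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value)) {
      masked[key] = SENSITIVE_KEYS.has(key.toLowerCase())
        ? "[REDACTED]"
        : maskValue(child);
    }
    return masked;
  }
  // Numbers, booleans, null, and undefined pass through unchanged.
  return value;
}
```

You could then install it with `setMaskingFunction(maskValue)`.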

## Evaluations

An evaluation runs your task over a set of cases, scores each output, and logs the results to an experiment, which is how you measure quality and catch regressions as you change prompts or models. `Eval()` is the main entry point. The other APIs here summarize and report results.

### `Eval()`

Runs an evaluation from your data, a task, and scorers or classifiers: it runs the task over every case, scores the outputs, logs each row to an experiment, and returns a result with a summary you can compare across runs.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { Eval } from "braintrust";

await Eval("My Project", {
  data: [
    {
      input: "How do I reset my password?",
      expected: "Use the password reset flow.",
    },
  ],
  task: async (input) => {
    return await answerQuestion(input);
  },
  scores: [
    ({ output, expected }) => ({
      name: "exact_match",
      score: output === expected ? 1 : 0,
    }),
  ],
});
```

Returns: `Promise<EvalResultWithSummary>`

`Eval(project, evaluator, options?)` takes three arguments.

<AccordionGroup>
  <Accordion title="project">
    The project name or ID (a `string`) to log the experiment to.
  </Accordion>

  <Accordion title="evaluator">
    Defines the evaluation: the data to run against, the task to test, and how to score the results. Its core fields:

    * **`data`** (`EvalData`, required): the evaluation cases. Provide an array of cases, a function or promise that returns an array, an `AsyncIterable`/`AsyncGenerator`, or `BaseExperiment(...)`. Each case has:
      * `input` (required): the value passed to the task.
      * `expected`: the expected output for this case, if any.
      * `metadata` / `tags`: per-case metadata and tags.
      * `trialCount`: how many times to run the task for this case.

    * **`task`** (`(input, hooks?) => output`, required): the function under test. It receives two arguments:

      * `input`: the current case's input value.
      * `hooks` (optional): a context object the framework passes in for the current evaluation row. Use its fields to read parameters, log to the task's span, or adjust the row:
        * `span`: the task's span. Log extra data or child spans to it.
        * `parameters`: the validated runtime parameters.
        * `metadata` / `tags`: mutate to change the current row's metadata or tags.
        * `expected`: the expected output for the current case, if it provided one.
        * `trialIndex`: zero-based trial index, useful when `trialCount` is greater than 1.
        * `reportProgress`: reports task progress for long-running evals.

      ```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
      await Eval("Support bot", {
        data: [{ input: "ping", expected: "pong" }],
        task: async (input, hooks) => {
          hooks.span.log({ metadata: { cacheHit: false } });
          return await answerQuestion(input);
        },
        scores: [({ output, expected }) => (output === expected ? 1 : 0)],
      });
      ```

    * **`scores`** (`EvalScorer[]`): scorer functions, required unless `classifiers` is set. Each receives the case fields plus `output` and an optional `trace`, and returns a number, a `Score`, an array of scores, or `null` to skip the row. A bare number is named after the function (or `scorer_<index>`); return a `Score` for a custom name or metadata.

      ```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
      type EvalScorerArgs = EvalCase & {
        output: unknown;
        trace?: Trace;
      };

      type Score = {
        name: string;
        score: number | null;
        metadata?: Record<string, unknown>;
      };

      type EvalScorer = (
        args: EvalScorerArgs,
      ) => number | Score | Score[] | null | Promise<number | Score | Score[] | null>;
      ```

    * **`classifiers`** (`EvalClassifier[]`): like scorers, but return a `Classification` and write to the `classifications` field instead of `scores` (required unless `scores` is set). `id` is the machine-readable outcome, `label` is the display label (defaults to `id`), and `name` is the key (defaults to the function name, or `classifier_<index>`).

      ```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
      type Classification = {
        name: string;
        id: string;
        label?: string;
        metadata?: Record<string, unknown>;
      };

      type EvalClassifier = (
        args: EvalScorerArgs,
      ) =>
        | Classification
        | Classification[]
        | null
        | Promise<Classification | Classification[] | null>;
      ```

    * **`parameters`** (`EvalParameters | loadParameters()`): named, typed settings that make the eval configurable, so you can re-run it with different values (a temperature, a model, a prompt) without editing code. Declare them with defaults here, and the task reads the chosen values from `hooks.parameters`. Pass runtime values in the `options` argument (validated against your schema), or load a saved set with `loadParameters()`, whose defaults merge with any runtime overrides. Supports Zod values, model parameters (exposed as strings), and prompt parameters (exposed as `Prompt` objects).

      ```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
      import { Eval } from "braintrust";
      import { z } from "zod";

      await Eval(
        "Support bot",
        {
          data: [{ input: "ping", expected: "pong" }],
          parameters: {
            temperature: z.number().min(0).max(2).default(0),
            model: { type: "model", default: "gpt-5-mini" },
            prompt: {
              type: "prompt",
              default: {
                model: "gpt-5-mini",
                messages: [{ role: "user", content: "{{input}}" }],
              },
            },
          },
          task: async (input, hooks) => {
            return runTask(input, hooks.parameters);
          },
          scores: [({ output, expected }) => (output === expected ? 1 : 0)],
        },
        {
          parameters: {
            temperature: 0.2,
          },
        },
      );
      ```

    The remaining fields are optional:

    * `experimentName` (`string`): experiment name.
    * `trialCount` (`number`): number of task runs per input.
    * `maxConcurrency` (`number`): maximum concurrent task/scorer work.
    * `timeout` (`number`): evaluation timeout in milliseconds.
    * `metadata` (`Record<string, unknown>`): experiment metadata.
    * `tags` (`string[]`): experiment tags.
    * `baseExperimentName` (`string`): base experiment name for comparison.
    * `baseExperimentId` (`string`): base experiment ID, taking precedence over `baseExperimentName`.
    * `summarizeScores` (`boolean`): defaults to `true`.
    * `flushBeforeScoring` (`boolean`): flushes task spans before scorers run.
  </Accordion>

  <Accordion title="options">
    An optional object of run settings:

    * **`noSendLogs`** (`boolean`): run locally and build a local summary without sending logs.
    * **`onStart`** (`(summary) => void`): called when the experiment starts.
    * **`stream`** (`(event) => void`): receives progress events.
    * **`parent`** (`string`): exported parent span for nested or distributed eval logging.
    * **`parameters`** (`Record<string, unknown>`): runtime parameter values.
    * **`returnResults`** (`boolean`): defaults to `true`. Set `false` for large evals to keep only aggregate summary data.
    * **`enableCache`** (`boolean`): defaults to `true`. Controls the local span cache used by scorers.
  </Accordion>
</AccordionGroup>
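
To make the `scores` and `classifiers` contracts above concrete, here is a minimal sketch of one scorer and one classifier. The type aliases are local stand-ins for the shapes shown above so the block is self-contained, and the function names, the 80-character threshold, and the labels are hypothetical.

```typescript
// Local stand-ins for the shapes documented above (not imported from the SDK).
type Score = {
  name: string;
  score: number | null;
  metadata?: Record<string, unknown>;
};

type Classification = {
  name: string;
  id: string;
  label?: string;
  metadata?: Record<string, unknown>;
};

type ScorerArgs = { input: unknown; output: unknown; expected?: unknown };

// Hypothetical scorer: case-insensitive, whitespace-trimmed match.
// Returns null (skip the row) when the case has no string `expected`.
function caseInsensitiveMatch({ output, expected }: ScorerArgs): Score | null {
  if (typeof expected !== "string") return null;
  const matched =
    typeof output === "string" &&
    output.trim().toLowerCase() === expected.trim().toLowerCase();
  return {
    name: "case_insensitive_match",
    score: matched ? 1 : 0,
    metadata: { outputType: typeof output },
  };
}

// Hypothetical classifier: buckets string outputs by length.
function answerLength({ output }: ScorerArgs): Classification | null {
  if (typeof output !== "string") return null;
  const isShort = output.length <= 80;
  return {
    name: "answer_length",
    id: isShort ? "short" : "long",
    label: isShort ? "Short answer" : "Long answer",
  };
}
```

Functions like these would go in the evaluator's `scores` and `classifiers` arrays respectively.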

### `buildLocalSummary()`

Builds an `ExperimentSummary` from local eval results. The SDK uses this when an eval runs without a remote experiment, such as `Eval(..., { noSendLogs: true })`.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { buildLocalSummary } from "braintrust";

const summary = buildLocalSummary(evaluatorDef, results);
```

Returns: `ExperimentSummary`

Parameters:

* **`evaluatorDef`** (`EvaluatorDef`, required): your `Eval()` evaluator object plus the SDK-populated `projectName` and `evalName` fields.

  ```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
  type EvaluatorDef = Evaluator & {
    projectName: string;
    evalName: string;
  };
  ```
* **`results`** (`EvalResult[]`, required): the row-level eval results to summarize.
* **`precomputedScores`** (`ScoreAccumulator`, optional): score totals and counts to use instead of deriving them from `results`. Pass it when you ran with `returnResults: false` (so per-row results aren't available) or you've already accumulated scores yourself. It maps each score's name to a running `total` and `count`:

  ```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
  type ScoreAccumulator = Record<
    string,
    {
      total: number;
      count: number;
    }
  >;
  ```

Computes average scores from the results or the precomputed accumulator.
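
The sketch below shows one way you might build a `ScoreAccumulator` incrementally and derive per-score averages from it. It illustrates the arithmetic only and is not the SDK's implementation: `accumulate` and `averageScores` are hypothetical helpers, and skipping `null` scores is an assumption.

```typescript
type ScoreAccumulator = Record<string, { total: number; count: number }>;

// Hypothetical helper: fold one row's scores into the running totals.
// Assumption: null scores are skipped rather than counted as zero.
function accumulate(
  acc: ScoreAccumulator,
  rowScores: Record<string, number | null>,
): void {
  for (const [name, score] of Object.entries(rowScores)) {
    if (score === null) continue;
    const entry = acc[name] ?? { total: 0, count: 0 };
    entry.total += score;
    entry.count += 1;
    acc[name] = entry;
  }
}

// Hypothetical helper: mean per score name; empty entries are omitted
// so the result never contains NaN from a division by zero.
function averageScores(acc: ScoreAccumulator): Record<string, number> {
  const averages: Record<string, number> = {};
  for (const [name, { total, count }] of Object.entries(acc)) {
    if (count > 0) averages[name] = total / count;
  }
  return averages;
}
```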

### `reportFailures()`

Prints failing eval rows to the console in the same format the default reporter uses. Use it when you run evals with a custom reporter but still want readable failure output.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { reportFailures } from "braintrust";

reportFailures(evaluatorDef, failingResults, {
  verbose: true,
  jsonl: false,
});
```

Returns: `void`

Parameters:

* **`evaluatorDef`** (`EvaluatorDef`, required): evaluator metadata used in error output.
* **`failingResults`** (`EvalResult[]`, required): results whose `error` field is set.
* **`options`** (`{ verbose: boolean; jsonl: boolean }`, required): `verbose` prints full errors. `jsonl` emits a JSON line containing evaluator name and error strings.
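
Since `failingResults` should contain only rows whose `error` field is set, you might filter your results first. This is a minimal sketch: `ResultLike` is a local stand-in for the result shape and `selectFailures` is a hypothetical helper, neither taken from the SDK.

```typescript
// Local stand-in for an eval result row; only `error` matters for this sketch.
type ResultLike = { input: unknown; output?: unknown; error?: unknown };

// Hypothetical helper: keep rows whose `error` is set (neither undefined nor null).
function selectFailures<T extends ResultLike>(results: T[]): T[] {
  return results.filter(
    (result) => result.error !== undefined && result.error !== null,
  );
}
```

The filtered array can then be passed as the second argument to `reportFailures()`.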

### `BaseExperiment()`

Uses a previous experiment as evaluation data. The previous experiment's output becomes the current eval's expected value. This is useful when you want to check a change, such as a new prompt, for regressions.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { BaseExperiment, Eval } from "braintrust";

await Eval("My Project", {
  data: BaseExperiment({ name: "retrieval-baseline" }),
  task: async (input) => await answerQuestion(input),
  scores: [qualityScorer],
});
```

Returns: `BaseExperiment`

If `name` is provided, the SDK uses that experiment both as the comparison base for the new experiment and as the read-only dataset for eval rows.

If `name` is omitted, the SDK derives the base experiment from git metadata:

1. During experiment registration, the SDK sends `ancestor_commits` when no explicit `baseExperimentId` or `baseExperimentName` is set.
2. `ancestor_commits` contains up to 1,000 commit hashes from the current branch, starting at the merge base with the default remote base branch and ending at `HEAD`. For a clean working tree, the merge base is computed from `HEAD^`; for a dirty working tree, it is computed from `HEAD`.
3. Braintrust chooses the newest experiment in the same project on the nearest commit in that ancestor list that you can read.
4. The chosen experiment ID is stored as the new experiment's base experiment and then opened as dataset data for `BaseExperiment()`.
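
The selection in step 3 can be sketched as follows. This is an illustration of the described rule, not Braintrust's server-side logic: the `ExperimentRecord` shape and its field names are hypothetical, and the sketch assumes the ancestor list is ordered nearest-to-`HEAD` first.

```typescript
// Hypothetical record shape for this sketch only.
type ExperimentRecord = {
  id: string;
  commit: string;
  createdAt: number; // creation time, larger is newer
  readable: boolean; // whether the caller can read this experiment
};

// Walk ancestors from nearest to farthest and return the newest readable
// experiment on the first commit that has one.
function pickBaseExperiment(
  ancestorsNearestFirst: string[],
  experiments: ExperimentRecord[],
): ExperimentRecord | undefined {
  for (const commit of ancestorsNearestFirst) {
    const candidates = experiments.filter(
      (experiment) => experiment.commit === commit && experiment.readable,
    );
    if (candidates.length > 0) {
      return candidates.reduce((newest, experiment) =>
        experiment.createdAt > newest.createdAt ? experiment : newest,
      );
    }
  }
  // No experiment matched any ancestor commit.
  return undefined;
}
```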

The SDK chooses the default remote base branch from the local `main`, `master`, or `develop` branch when exactly one exists, otherwise from the remote HEAD branch, and falls back to `main` if the remote HEAD cannot be read. If no experiment matches the ancestor commits, later base lookup can use a project-configured baseline experiment or the nearest earlier experiment in the same project by creation time. If no base experiment can be found, `BaseExperiment()` fails.
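
The branch-selection rule above can be sketched as a small decision function. This is an illustration under stated assumptions, not the SDK's code: `chooseBaseBranch` is a hypothetical name, and `remoteHead` is `undefined` when the remote HEAD cannot be read.

```typescript
const CANDIDATE_BRANCHES = ["main", "master", "develop"];

// Hypothetical helper mirroring the rule described above:
// 1. if exactly one of main/master/develop exists locally, use it;
// 2. otherwise use the remote HEAD branch;
// 3. if the remote HEAD is unreadable, fall back to "main".
function chooseBaseBranch(
  localBranches: string[],
  remoteHead: string | undefined,
): string {
  const present = CANDIDATE_BRANCHES.filter((branch) =>
    localBranches.includes(branch),
  );
  const [only] = present;
  if (present.length === 1 && only !== undefined) {
    return only;
  }
  return remoteHead ?? "main";
}
```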

## Experiments

An experiment is a single evaluation run logged to a project. Use these APIs when you want to create an experiment and log rows yourself, instead of letting `Eval()` manage one for you.

### `initExperiment()`

Initializes an experiment in a project. Use it when you want to log experiment rows manually.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { initExperiment } from "braintrust";

const experiment = initExperiment({
  project: "Support bot",
  experiment: "retrieval-baseline",
  description: "Manual retrieval evaluation",
});

experiment.log({
  input: "How do I reset my password?",
  output: "Use the account recovery flow.",
  expected: "Follow the password reset instructions.",
  scores: { correctness: 0.8 },
});

const summary = await experiment.summarize();
await experiment.flush();
```

Returns: `Experiment`

Options:

* **`project`** (`string`): project name. Required unless `projectId` is set.
* **`projectId`** (`string`): project ID. Takes precedence over `project`.
* **`experiment`** (`string`): experiment name. Generated automatically if omitted.
* **`description`** (`string`): experiment description.
* **`dataset`** (`string | Dataset`): dataset to associate with the experiment.
* **`metadata`** (`Record<string, unknown>`): experiment metadata.
* **`tags`** (`string[]`): experiment tags.
* **`baseExperiment`** (`string`): base experiment name for comparison.
* **`baseExperimentId`** (`string`): base experiment ID. Takes precedence over `baseExperiment`.
* **`update`** (`boolean`): continue logging to an existing experiment with the same name.
* **`open`** (`boolean`): open an existing experiment in read-only mode.
* **`isPublic`** (`boolean`): whether links are public. Defaults to private.
* **`setCurrent`** (`boolean`): defaults to `true`. Controls whether global `log()` uses this experiment.

## Datasets

A dataset is a versioned collection of cases you manage in Braintrust and reuse across experiments and evals. Use `initDataset()` to create a dataset or open an existing one.

### `initDataset()`

Creates or opens a dataset in a project.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { initDataset } from "braintrust";

const dataset = initDataset({
  project: "Support bot",
  dataset: "Password reset questions",
});

dataset.insert({
  input: "How do I reset my password?",
  expected: "Use the password reset flow.",
  metadata: { topic: "account" },
});

await dataset.flush();
```

Returns: `Dataset`

Options:

* **`project`** (`string`): project name. Required unless `projectId` is set.
* **`projectId`** (`string`): project ID. Takes precedence over `project`.
* **`dataset`** (`string`): dataset name. Generated automatically if omitted.
* **`description`** (`string`): dataset description.
* **`metadata`** (`Record<string, unknown>`): dataset metadata.
* **`version`** (`string`): when fetching or iterating rows, read the dataset at this transaction ID. Takes precedence over `snapshotName` and `environment`.
* **`snapshotName`** (`string`): when fetching or iterating rows, read the dataset version captured by this named snapshot. Takes precedence over `environment`.
* **`environment`** (`string`): when fetching or iterating rows, read the dataset version assigned to this environment slug.
* **`useOutput`** (`boolean`): deprecated legacy mode that maps `expected` to `output` when fetching rows.

`version`, `snapshotName`, and `environment` select the committed dataset version used for reads. Calls such as `insert()` and `flush()` still write new rows to the dataset.
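
The precedence among the three read selectors can be summarized with a short sketch. `resolveReadSelector` and its return shape are hypothetical and only illustrate the documented ordering. What the SDK reads when none of the three is set is not covered here, so the sketch returns `undefined` in that case.

```typescript
type ReadSelectorOptions = {
  version?: string;
  snapshotName?: string;
  environment?: string;
};

type ResolvedSelector = {
  kind: "version" | "snapshotName" | "environment";
  value: string;
};

// Hypothetical helper: `version` wins over `snapshotName`, which wins over `environment`.
function resolveReadSelector(
  options: ReadSelectorOptions,
): ResolvedSelector | undefined {
  if (options.version !== undefined) {
    return { kind: "version", value: options.version };
  }
  if (options.snapshotName !== undefined) {
    return { kind: "snapshotName", value: options.snapshotName };
  }
  if (options.environment !== undefined) {
    return { kind: "environment", value: options.environment };
  }
  // None of the three selectors was provided.
  return undefined;
}
```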

## Prompts and functions

In Braintrust, functions are units of logic you define and version in the UI, then load or invoke from your code. A prompt is a function whose job is to call a model with a templated set of messages. Other functions include scorers, tools, and code you deploy. Load and render a saved prompt with `loadPrompt()`, or call a function directly with `invoke()`.

### `loadPrompt()`

Loads a saved prompt from Braintrust. Use the returned `Prompt` object's `build()` method to render model parameters, messages, tools, or completion text with runtime variables.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { loadPrompt } from "braintrust";

const prompt = await loadPrompt({
  projectName: "Support bot",
  slug: "answer-question",
  environment: "production",
  defaults: {
    model: "gpt-5-mini",
  },
});

const built = prompt.build(
  {
    question: "How do I reset my password?",
    tone: "concise",
  },
  {
    strict: true,
  },
);
```

Returns: `Promise<Prompt>`

Options:

* **`projectName`** (`string`): project containing the prompt. Required unless `projectId` or `id` is set.
* **`projectId`** (`string`): project ID. Takes precedence over `projectName`.
* **`slug`** (`string`): prompt slug. Required unless `id` is set.
* **`id`** (`string`): prompt ID. Takes precedence over project and slug.
* **`version`** (`string`): prompt version. Takes precedence over `environment`.
* **`environment`** (`string`): environment assignment such as `production` or `staging`.
* **`defaults`** (`Record<string, unknown>`): default model, parameter, message, or completion fields used by `build()`. Saved prompt fields override these defaults.
* **`noTrace`** (`boolean`): if `true`, built prompt metadata is not attached to traces.

If neither `version` nor `environment` is set, the SDK loads the latest version and can fall back to the local prompt cache.

`build(args, options?)` renders the prompt. Parameters:

* **`args`** (`object`, required): the template variables. Object keys become variables, and the whole value is available as `input`. Rendered with Mustache by default, with non-string values JSON-stringified.
* **`options`** (optional):
  * **`flavor`** (`"chat" | "completion"`): defaults to `"chat"`. Use `"completion"` for completion prompts.
  * **`messages`** (`Message[]`): additional chat messages to append. If the prompt already has a system message, extra system messages are omitted.
  * **`strict`** (`boolean`): validates that referenced template variables are present.
  * **`templateFormat`** (`"mustache" | "nunjucks" | "none"`): overrides the prompt's template format. Nunjucks requires `@braintrust/template-nunjucks`. `none` leaves template strings unrendered.

Use `buildWithAttachments()` instead of `build()` when prompt variables contain Braintrust attachment references that need to be resolved before rendering.
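
To illustrate the kind of check `strict` performs, the sketch below finds template variables that are referenced but missing from `args`. It is an approximation, not the SDK's renderer: `missingVariables` is a hypothetical helper that only recognizes simple `{{name}}` and `{{name.path}}` tags, and it treats `input` as always available because the whole `args` value is exposed under that name.

```typescript
// Matches {{ name }} or {{ name.path }} and captures the root variable name.
// Mustache sections, partials, and comments are not handled in this sketch.
const TAG_PATTERN = /\{\{\s*([A-Za-z_]\w*)[\w.]*\s*\}\}/g;

// Hypothetical helper: returns root variable names referenced by the template
// that are not keys of `args`.
function missingVariables(
  template: string,
  args: Record<string, unknown>,
): string[] {
  const missing = new Set<string>();
  for (const match of template.matchAll(TAG_PATTERN)) {
    const root = match[1];
    if (root === undefined || root === "input") continue;
    if (!Object.prototype.hasOwnProperty.call(args, root)) {
      missing.add(root);
    }
  }
  return Array.from(missing);
}
```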

### `getPromptVersions()`

Lists the version IDs for a saved prompt, one for each change recorded when the prompt was created or updated. Use it to find an earlier version to load or pin.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { getPromptVersions, loadPrompt } from "braintrust";

const versions = await getPromptVersions(projectId, promptId);
const prompt = await loadPrompt({
  id: promptId,
  version: versions[0],
});
```

Returns: `Promise<string[]>`, the prompt's version IDs

Parameters:

* **`projectId`** (`string`, required): the project's ID.
* **`promptId`** (`string`, required): the prompt's ID.

The returned strings can be passed to `loadPrompt({ version })` to load a specific historical prompt version.

`getPromptVersions()` logs in internally with the global SDK state. If `BRAINTRUST_API_KEY` is set, or if a previous SDK call already logged in, you can call it directly. Call `login()` first when you need to provide `apiKey`, `orgName`, `appUrl`, or a custom `fetch`, because `getPromptVersions()` does not accept those options itself.

### `loadParameters()`

Loads a set of parameters you've saved in Braintrust, so an eval or application can run with configuration managed in the UI instead of hardcoded values. The example below passes them to `Eval()`.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { Eval, loadParameters } from "braintrust";

const parameters = await loadParameters({
  projectName: "Support bot",
  slug: "retrieval-config",
  environment: "production",
});

await Eval("Support bot", {
  data: [{ input: "ping", expected: "pong" }],
  parameters,
  task: async (input, hooks) => {
    return await answerQuestion(input, hooks.parameters);
  },
  scores: [({ output, expected }) => (output === expected ? 1 : 0)],
});
```

Returns: a promise of the resolved parameters

Identify the parameters with `projectName` plus `slug`, `projectId` plus `slug`, or `id` alone. `version` takes precedence over `environment`.

### `invoke()`

Invokes a Braintrust function, such as a prompt, scorer, or tool.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { invoke } from "braintrust";
import { z } from "zod";

const result = await invoke({
  projectName: "Support bot",
  slug: "answer-question",
  input: { question: "How do I reset my password?" },
  schema: z.object({ answer: z.string() }),
});
```

Returns: the function's output (JSON), or a `BraintrustStream` when `stream: true`

Identification options. Provide one of these: a `function_id`; a `slug` together with either `projectName` or `projectId`; or a `globalFunction`.

* **`function_id`** (`string`): function ID.
* **`projectName`** (`string`): project containing the function.
* **`projectId`** (`string`): ID of the project containing the function. Can be used instead of `projectName`.
* **`slug`** (`string`): function slug.
* **`globalFunction`** (`string`): global function name. See [Global Functions](/deploy/functions#global-function-name).
* **`functionType`** (`FunctionType`): global function type. Defaults to scorer for global functions.
* **`version`** (`string`): function version.

Execution options:

* **`input`** (`Input`, required): the input passed to the function, also logged as the span input.
* **`messages`** (`Message[]`): additional OpenAI-style messages for LLM functions.
* **`metadata`** (`Record<string, unknown>`): logged as span metadata and available to the function.
* **`tags`** (`string[]`): logged as span tags.
* **`parent`** (`Exportable | string`): parent span, logger, experiment, or exported parent string.
* **`stream`** (`boolean`): returns a `BraintrustStream` when `true`.
* **`mode`** (`StreamingMode`): function streaming mode.
* **`strict`** (`boolean`): throws when prompt variable names do not match input keys.
* **`schema`** (`z.ZodSchema`): validates non-streaming output and returns the typed value.
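The identification rule can be checked before calling `invoke()`. The sketch below is a hypothetical pre-flight helper, not part of the SDK: `InvokeTarget` and `identificationForm` are made-up names, and treating "one of these" as "exactly one" is an assumption of this sketch, since the SDK may resolve overlapping options differently.

```ts
// Hypothetical shape holding only the identification options.
type InvokeTarget = {
  function_id?: string;
  projectName?: string;
  projectId?: string;
  slug?: string;
  globalFunction?: string;
};

type IdentificationForm = "function_id" | "project+slug" | "globalFunction";

// Returns which form identifies the function, or throws unless exactly one applies.
function identificationForm(target: InvokeTarget): IdentificationForm {
  const forms: IdentificationForm[] = [];

  if (target.function_id !== undefined) forms.push("function_id");

  const hasProject =
    target.projectName !== undefined || target.projectId !== undefined;
  if (hasProject && target.slug !== undefined) forms.push("project+slug");

  if (target.globalFunction !== undefined) forms.push("globalFunction");

  if (forms.length !== 1) {
    throw new Error(
      `Expected exactly one identification form, found ${forms.length}`,
    );
  }
  return forms[0];
}
```

A `slug` without a project, or a project without a `slug`, matches no form and is rejected.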

### `initFunction()`

Creates a callable wrapper around a Braintrust function for use as an eval task or scorer.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { Eval, initFunction } from "braintrust";

const answerQuestion = initFunction({
  projectName: "Support bot",
  slug: "answer-question",
});

await Eval("Support bot", {
  data: [{ input: { question: "ping" }, expected: "pong" }],
  task: answerQuestion,
  scores: [qualityScorer],
});
```

Returns: a function `(input) => Promise<output>`

Options:

* **`projectName`** (`string`, required): project containing the function.
* **`slug`** (`string`, required): function slug.
* **`version`** (`string`): function version.
* **`state`** (`BraintrustState`): SDK state override.

## Attachments

Attachments let you log files or large payloads without storing the full bytes inline in the span, which helps you stay within Braintrust's 6 MB request limit. When you trace AI calls, Braintrust automatically converts base64 attachments in provider messages into uploaded attachments, so you rarely need the APIs below for instrumented calls. Use them when you attach binary content to a span yourself.
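One way to decide when a payload should become an attachment is to measure its serialized size first. The sketch below is a rough illustration under stated assumptions: `jsonByteLength` and `shouldUseAttachment` are hypothetical helpers, "6 MB" is interpreted as `6 * 1024 * 1024` bytes, and Node's `Buffer` is assumed to be available.

```ts
// Assumption: the 6 MB limit is treated as 6 * 1024 * 1024 bytes.
const REQUEST_LIMIT_BYTES = 6 * 1024 * 1024;

// Size in bytes of a value once serialized as UTF-8 JSON.
function jsonByteLength(value: unknown): number {
  return Buffer.byteLength(JSON.stringify(value) ?? "", "utf8");
}

// True when the serialized value alone would reach the threshold.
function shouldUseAttachment(
  value: unknown,
  thresholdBytes: number = REQUEST_LIMIT_BYTES,
): boolean {
  return jsonByteLength(value) >= thresholdBytes;
}
```

A request carries more than one field, so in practice you would likely pass a threshold well below the limit.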

### `Attachment`

Uploads local or in-memory file data and replaces it with an `AttachmentReference` during logging.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { Attachment, initLogger } from "braintrust";

const logger = initLogger({ projectName: "Support bot" });

logger.log({
  input: {
    document: new Attachment({
      data: "./fixtures/account-reset.pdf",
      filename: "account-reset.pdf",
      contentType: "application/pdf",
    }),
  },
});
```

Constructor fields are `data` (a file path `string`, as in the example above, or a `Blob` or `ArrayBuffer` holding the contents), `filename`, `contentType`, and optional `state`.

### `ExternalAttachment`

References a file that already exists in an external object store.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { ExternalAttachment, initLogger } from "braintrust";

const logger = initLogger({ projectName: "Support bot" });

logger.log({
  input: {
    document: new ExternalAttachment({
      url: "s3://support-docs/account-reset.pdf",
      filename: "account-reset.pdf",
      contentType: "application/pdf",
    }),
  },
});
```

`ExternalAttachment.upload()` is a no-op because the file already exists externally.

### `JSONAttachment`

Serializes JSON and uploads it as an `application/json` attachment. Use it for large JSON objects that should be viewable but do not need to be indexed for search.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { JSONAttachment, initLogger } from "braintrust";

const logger = initLogger({ projectName: "Support bot" });

logger.log({
  input: {
    transcript: new JSONAttachment(conversation, {
      filename: "conversation.json",
      pretty: true,
    }),
  },
});
```

Options are `filename`, `pretty`, and optional `state`.

### `ReadonlyAttachment`

Reads an already-uploaded attachment from a dataset, experiment, or raw `AttachmentReference`.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { ReadonlyAttachment, type AttachmentReference } from "braintrust";

async function readAttachment(reference: AttachmentReference) {
  const attachment = new ReadonlyAttachment(reference);
  const blob = await attachment.data();
  const status = await attachment.status();

  return { blob, status };
}
```

Useful methods:

* **`data()`** → `Promise<Blob>`: downloads the attachment as a `Blob`.
* **`asBase64Url()`** → `Promise<string>`: returns a `data:<content-type>;base64,...` URL for prompts.
* **`metadata()`** → `Promise<AttachmentMetadata>`: fetches download URL and upload status.
* **`status()`** → `Promise<AttachmentStatus>`: fetches current upload status.
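To make the string shape returned by `asBase64Url()` concrete, the sketch below builds a URL of the same `data:<content-type>;base64,...` form from raw bytes. `toBase64DataUrl` is a hypothetical helper, not part of the SDK, and it assumes Node's `Buffer` is available. With a real attachment, call `asBase64Url()` instead.

```ts
// Hypothetical helper: encodes bytes as a base64 data URL.
function toBase64DataUrl(contentType: string, bytes: Uint8Array): string {
  const base64 = Buffer.from(bytes).toString("base64");
  return `data:${contentType};base64,${base64}`;
}

// The two bytes of the ASCII string "hi".
const exampleUrl = toBase64DataUrl("text/plain", new Uint8Array([104, 105]));
// exampleUrl is "data:text/plain;base64,aGk="
```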

## Wrappers

Each wrapper returns the same client or module you pass in, with its methods traced, so the return type always matches the argument type. If Braintrust logging is not configured, the wrappers record nothing and calls pass through to the underlying client.
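The type-preserving pattern can be pictured with a generic `Proxy`. This is a simplified sketch of the general technique, not how Braintrust's wrappers are implemented: `wrapWithTracing`, `CallRecord`, and the `record` callback are hypothetical, and the sketch only intercepts top-level methods.

```ts
// Hypothetical record of one intercepted method call.
type CallRecord = { method: string; args: unknown[] };

// Returns an object of the same type T whose methods report each call.
// Without a `record` callback, the client is returned untouched.
function wrapWithTracing<T extends object>(
  client: T,
  record?: (call: CallRecord) => void,
): T {
  if (record === undefined) return client;
  const sink = record;

  return new Proxy(client, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      // Leave non-function properties and symbol keys alone.
      if (typeof value !== "function" || typeof prop !== "string") {
        return value;
      }
      return (...args: unknown[]) => {
        sink({ method: prop, args });
        return value.apply(target, args);
      };
    },
  });
}
```

Because the function is generic in `T`, callers keep the client's full type, including autocomplete and argument checking, after wrapping.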

Initialize a logger before wrapped calls:

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { initLogger } from "braintrust";

initLogger({ projectName: "Support bot" });
```

### OpenAI

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import OpenAI from "openai";
import { wrapOpenAI } from "braintrust";

const client = wrapOpenAI(new OpenAI());

await client.responses.create({
  model: "gpt-5-mini",
  input: "Answer in one sentence.",
});
```

* **`wrapOpenAI(openai)`**: wraps OpenAI SDK clients, including v4-, v5-, and v6-style clients.
* **`wrapOpenAIv4(openai)`**: lower-level OpenAI wrapper used by `wrapOpenAI()`. Use `wrapOpenAI()` unless you know you need this compatibility entry point.

Traced OpenAI surfaces include chat completions, beta chat helpers, responses, embeddings, and moderations where present on the client.

### Anthropic

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import Anthropic from "@anthropic-ai/sdk";
import { wrapAnthropic } from "braintrust";

const anthropic = wrapAnthropic(new Anthropic());
```

`wrapAnthropic()` traces `messages.create`, beta `messages.create`, and beta tool runner calls when available.

### Vercel AI SDK

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import * as ai from "ai";
import { openai } from "@ai-sdk/openai";
import { wrapAISDK } from "braintrust";

const { generateText, streamText, Agent } = wrapAISDK(ai);

await generateText({
  model: openai("gpt-5-mini"),
  prompt: "Summarize the ticket.",
});
```

* **`wrapAISDK(ai, options?)`**: wraps AI SDK namespace functions such as `generateText`, `streamText`, `generateObject`, `streamObject`, `embed`, `embedMany`, `rerank`, and agent constructors.
* **`wrapAgentClass(AgentClass, options?)`**: wraps an AI SDK agent class directly.

`wrapAISDK()` accepts `denyOutputPaths?: string[]` to omit selected output paths from logged output.

### Agent SDKs

* **`wrapClaudeAgentSDK(sdk)`**: wraps the Claude Agent SDK module, including `query`, `tool`, and `createSdkMcpServer` integration points.
* **`wrapOpenRouterAgent(agent)`**: wraps `@openrouter/agent` clients.
* **`wrapGoogleADK(adkModule)`**: wraps Google ADK modules, including runner, agent, and tool execution methods.

### Model provider SDKs

* **`wrapGoogleGenAI()`**: wraps the entire `@google/genai` package export (similar to the AI SDK wrapper). Traces `models.generateContent`, `models.generateContentStream`, and `models.embedContent`.
* **`wrapOpenRouter()`**: pass an OpenRouter client. Traces chat, responses, embeddings, rerank, and `callModel()` where available.
* **`wrapMistral()`**: pass a Mistral client. Traces chat, FIM (fill-in-the-middle), agents, and embeddings.
* **`wrapCohere()`**: pass a Cohere client. Traces chat, chat streaming, embed, and rerank.
* **`wrapGroq()`**: pass a Groq client. Traces chat completions and embeddings.
* **`wrapHuggingFace()`**: pass a Hugging Face Inference module or client. Traces chat completion, text generation, streaming variants, and feature extraction.

## Testing

Run Braintrust evaluations inside your test runner, so eval cases live alongside your unit tests and your suite fails when scores regress.

### `wrapVitest()`

Wraps Vitest test APIs so tests can create Braintrust experiment rows with inputs, expected values, scorer results, and pass/fail status.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import * as vitest from "vitest";
import { wrapVitest } from "braintrust";

const bt = wrapVitest(vitest, {
  projectName: "Support bot",
});

bt.describe("retrieval", () => {
  bt.test(
    "answers password reset",
    {
      input: "How do I reset my password?",
      expected: "Use the password reset flow.",
      scorers: [qualityScorer],
    },
    async ({ input }) => {
      const output = await answerQuestion(input);
      bt.logOutputs(output);
      bt.expect(output).toBeTruthy();
      return output;
    },
  );
});
```

Returns: the wrapped Vitest APIs (`test`, `it`, `expect`, `describe`, lifecycle hooks, and Braintrust helpers)

The Braintrust helpers on the returned object are `logOutputs`, `logFeedback`, `getCurrentSpan`, and `flushExperiment()`.

### `initNodeTestSuite()`

Creates a Braintrust-backed suite for Node's built-in `node:test` runner.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { after, describe, test } from "node:test";
import assert from "node:assert/strict";
import { initNodeTestSuite } from "braintrust";

describe("retrieval", () => {
  const suite = initNodeTestSuite({
    projectName: "Support bot",
    after,
  });

  test(
    "answers password reset",
    suite.eval(
      {
        input: "How do I reset my password?",
        expected: "Use the password reset flow.",
        scorers: [qualityScorer],
      },
      async ({ input }) => {
        const output = await answerQuestion(input);
        assert.ok(output);
        return output;
      },
    ),
  );
});
```

Returns: a suite with `eval()` and `flush()`

`eval(config, fn)` produces the test function you pass to `test()`, as in the example above. If you pass Node's `after` hook, `flush()` is registered automatically; otherwise, call `flush()` yourself once the tests finish.

## Configuration

Most configuration is passed per call, such as `apiKey` and `projectName` on `initLogger()`. You can also authenticate once with `login()`, or set credentials and deployment options through environment variables.

### `login()`

Authenticates the SDK to Braintrust and stores the credentials in its global state, where every later SDK call reads them. Most SDK APIs log in automatically when they first need credentials.

```ts theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import { login, loadPrompt } from "braintrust";

await login({
  apiKey: "bt_org_...", // Replace with your API key.
  orgName: "example-org",
});

const prompt = await loadPrompt({
  projectName: "Support bot",
  slug: "answer-question",
});
```

Returns: `Promise<BraintrustState>`

Options (all optional):

* **`apiKey`** (`string`): API key. Defaults to `BRAINTRUST_API_KEY`.
* **`appUrl`** (`string`): Braintrust app URL. Defaults to `https://www.braintrust.dev`.
* **`orgName`** (`string`): organization name, useful when credentials can access multiple orgs.
* **`fetch`** (`typeof fetch`): custom fetch implementation for SDK requests.
* **`forceLogin`** (`boolean`): log in again even if the SDK already has credentials.
* **`noExitFlush`** (`boolean`): disables the process exit handler that flushes pending writes.
* **`onFlushError`** (`(error) => void`): handles errors from the background flusher.
* **`disableSpanCache`** (`boolean`): disables the local span cache used by eval scorers.
* **`debugLogLevel`** (`"error" | "warn" | "info" | "debug" | false`): enables SDK troubleshooting output.

In most cases you don't need to call `login()` yourself. When `BRAINTRUST_API_KEY` is set, APIs like `initLogger()`, `initExperiment()`, `initDataset()`, `loadPrompt()`, and `loadParameters()` authenticate automatically using it. Call `login()` yourself when you want to authenticate with something other than that environment variable, such as a specific API key, organization, or custom `fetch`. Logging in stores those credentials in the SDK's global state, so every later SDK call uses them, including APIs like `getPromptVersions()` that take no auth options of their own.
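The fallback from explicit options to environment variables can be sketched as follows. This is an illustration of the documented defaults, not the SDK's code: `LoginSettingsSketch` and `resolveLoginSettings` are hypothetical names, and the environment is passed in as a plain object so the logic is easy to follow.

```ts
// Hypothetical subset of the login options that have environment fallbacks.
type LoginSettingsSketch = {
  apiKey?: string;
  appUrl?: string;
  orgName?: string;
};

type EnvSketch = Record<string, string | undefined>;

// Explicit options win; otherwise fall back to the environment, then defaults.
function resolveLoginSettings(
  options: LoginSettingsSketch,
  env: EnvSketch,
): LoginSettingsSketch {
  return {
    apiKey: options.apiKey ?? env["BRAINTRUST_API_KEY"],
    appUrl:
      options.appUrl ??
      env["BRAINTRUST_APP_URL"] ??
      "https://www.braintrust.dev",
    orgName: options.orgName ?? env["BRAINTRUST_ORG_NAME"],
  };
}
```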

### Environment variables

* **`BRAINTRUST_API_KEY`**: API key used to authenticate. Same as the `apiKey` option.
* **`BRAINTRUST_API_URL`**: Braintrust API URL. Set this for self-hosted or data plane deployments.
* **`BRAINTRUST_APP_URL`**: Braintrust app URL, used for permalinks. Defaults to `https://www.braintrust.dev`. Same as the `appUrl` option.
* **`BRAINTRUST_ORG_NAME`**: organization name, useful when credentials can access multiple orgs. Same as the `orgName` option.
* **`BRAINTRUST_DISABLE_INSTRUMENTATION`**: comma-separated list of integrations to skip during auto-instrumentation, for example `openai,anthropic`.
* **`BRAINTRUST_DEBUG_LOG_LEVEL`**: level of SDK troubleshooting output: `error`, `warn`, `info`, or `debug`. When unset, the SDK prints no troubleshooting output. Same as the `debugLogLevel` option.
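As a rough sketch of how the list-valued and enum-valued variables might be read, the helpers below parse the two values by hand. Both functions are hypothetical and not SDK APIs. Trimming whitespace around list entries and treating an unrecognized log level as disabled are assumptions of this sketch.

```ts
// Splits a comma-separated value such as "openai,anthropic" into names.
// Assumption: surrounding whitespace and empty entries are ignored.
function parseDisabledIntegrations(raw: string | undefined): string[] {
  if (!raw) return [];
  return raw
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
}

const DEBUG_LEVELS = ["error", "warn", "info", "debug"] as const;
type DebugLevel = (typeof DEBUG_LEVELS)[number];

// Returns the matching level, or false when unset or unrecognized.
function parseDebugLogLevel(raw: string | undefined): DebugLevel | false {
  return DEBUG_LEVELS.find((level) => level === raw) ?? false;
}
```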
