Tracing
Tracing records what your application does as spans you can inspect in Braintrust. The recommended way to capture AI calls is auto-instrumentation: call initLogger() once at startup and run your app with the Braintrust import or build hook, and supported AI clients are traced with no per-call changes (see Install and instrument). To instrument a specific provider client by hand instead, see Wrappers.
The APIs below configure tracing and let you trace your own application code.
initLogger()
Creates the logger that sends your traces to Braintrust, your starting point for tracing. Call it once on startup: by default it becomes the current logger that traced(), startSpan(), and auto-instrumentation log through, and auto-instrumentation requires it.
Returns: Logger
Options (all optional):
- projectName (string): project name for logs. If omitted, logs go to the global project.
- projectId (string): project ID. Takes precedence over projectName.
- apiKey (string): API key. Defaults to BRAINTRUST_API_KEY.
- appUrl (string): Braintrust app URL. Defaults to https://www.braintrust.dev.
- orgName (string): organization name, useful when credentials can access multiple orgs.
- asyncFlush (boolean): defaults to true. Set false to make logger calls like logger.log() awaitable.
- setCurrent (boolean): defaults to true. Controls whether currentLogger() returns this logger and which logger auto-instrumentation logs through.
- forceLogin (boolean): log in again even if the SDK already has credentials.
- fetch (typeof fetch): custom fetch implementation for SDK requests. Takes a function with the same API as native fetch.
- debugLogLevel ("error" | "warn" | "info" | "debug" | false): enables SDK troubleshooting output.
Logger methods:
- log(event) → string: logs a span and returns its row ID. Returns Promise<string> when the logger uses asyncFlush: false.
- traced(callback, args?) → the callback’s result: creates a root span under the logger and ends it when the callback finishes. Spans started within the callback are nested under the traced span.
- startSpan(args?) → Span: starts a root span that you end manually.
- logFeedback(event) → void: adds feedback scores, expected values, tags, or comments to an existing row.
- updateSpan(event) → void: updates an already-written span by ID. Flush the original span before updating it.
- export() → Promise<string>: returns a serialized parent string for distributed tracing.
- flush() → Promise<void>: sends pending rows to Braintrust and resolves once they’re all flushed.
currentLogger()
The logger most recently installed by initLogger({ setCurrent: true }), or undefined if no current logger exists.
Returns: Logger | undefined
traced()
Traces a piece of your own code. Pass a callback and it runs inside a span: the span is current for the duration of the callback, so nested spans and traced LLM calls attach to it, errors thrown inside are logged to the span, and the span ends automatically when the callback returns.
Parameters:
- callback ((span) => R, required): the function to run inside the span. traced() returns the callback’s return value.
- args (optional): span options.
  - name (string): span name shown in Braintrust.
  - type (SpanType): span type, such as function, task, tool, llm, score, or eval.
  - spanAttributes (Record<PropertyKey, unknown>): extra span attributes.
  - startTime (number): start time as a Unix timestamp in seconds.
  - parent (string): exported parent span string.
  - spanId (string): explicit span ID.
  - setCurrent (boolean): defaults to true. Controls whether currentSpan() returns this span inside the callback.
startSpan()
Creates a span you manage by hand: you start it, log to it, and call end() yourself when the work finishes. Reach for it when the work you’re tracing doesn’t fit inside a single callback, such as when a span starts and ends in different places or you’re integrating at a lower level. Unlike traced(), the span is not made current, so other spans and traced LLM calls won’t automatically nest under it.
Returns: Span
currentSpan()
Gives you the span you’re currently inside, so you can add data to it without holding a direct reference, for example from a helper function called within a traced() callback. If no span is active, returns a no-op span: an object with the same interface that silently discards anything you log, so calls like currentSpan().log(...) are always safe.
Returns: Span
withCurrent()
Makes an existing span the current span for the duration of a callback. A span from startSpan() isn’t current on its own, so spans and traced LLM calls created afterward won’t nest under it. Wrap that work in withCurrent(span, callback) and it nests under span, with currentSpan() returning it inside the callback.
Parameters:
- span (Span, required): the span to make current inside the callback.
- callback ((span) => R, required): the work to run with span current. withCurrent() returns the callback’s return value.
withParent()
Attaches the spans created inside its callback to a parent that lives somewhere else, such as another process, service, or job. Get the parent’s location by calling export() on a span or logger, pass that string to withParent(), and the work inside nests under the original trace. This is how you trace across process boundaries (distributed tracing).
Parameters:
- parent (string, required): an exported parent string, from a span’s or logger’s export().
- callback (() => R, required): the work to run under that parent. withParent() returns the callback’s return value.
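Because the exported parent is an ordinary string, it can travel over any channel your processes already share. A minimal sketch, assuming the two services talk over HTTP: the header name x-braintrust-parent and both helper functions are hypothetical, and only the string value itself would come from a span’s or logger’s export().

```typescript
// Hypothetical header name used to carry the exported parent string.
const PARENT_HEADER = "x-braintrust-parent";

// Attach an exported parent string (for example, the result of
// `await span.export()`) to outgoing headers without mutating the input.
function withParentHeader(
  headers: Record<string, string>,
  exportedParent: string,
): Record<string, string> {
  return { ...headers, [PARENT_HEADER]: exportedParent };
}

// Read the parent string back on the receiving side. HTTP header names are
// case-insensitive, so compare them in lowercase.
function readParentHeader(
  headers: Record<string, string>,
): string | undefined {
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() === PARENT_HEADER) {
      return value;
    }
  }
  return undefined;
}
```

On the receiving side, the value returned by readParentHeader() would be passed as the first argument to withParent(parent, callback), so spans created inside the callback nest under the caller’s trace.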
logError()
Logs an error and stack trace to a span’s error field.
Returns: void
Parameters:
- span (Span, required): the span to attach the error to.
- error (unknown, required): the caught error, whose message and stack are logged.
Errors thrown inside traced() are logged automatically, so use logError() for errors you catch yourself.
permalink()
Turns an exported span string into a shareable Braintrust UI URL, so you can link straight to a specific trace from your own logs, dashboards, or alerts. Pass the string returned by a span’s export().
Returns: Promise<string>
Links can be generated before flushing, but they become viewable only after the span and its root have been flushed and ingested. If you have a span object, span.permalink() is usually simpler.
flush()
Sends any buffered log rows to Braintrust and returns a promise that resolves once they’re all sent. Call it before a script or short-lived process exits so pending events aren’t lost.
Returns: Promise<void>
setMaskingFunction()
Installs a global masking function that runs over your logged data before it leaves your process, so you can redact sensitive values like PII or secrets before they ever reach Braintrust. Set it to null to remove a previously configured masking function.
Returns: void
Masking is applied at flush time for fields such as input, output, expected, metadata, context, scores, and metrics.
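A sketch of a masking function, assuming it receives a logged value and returns the value to send in its place. The list of sensitive keys, the email pattern, and the name maskSensitive are illustrative choices, not SDK defaults.

```typescript
// Keys whose values should never leave the process (illustrative list).
const SENSITIVE_KEYS = new Set(["apikey", "password", "authorization", "ssn"]);

// Matches email addresses inside free-form strings.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Recursively copy a value, redacting sensitive keys and email addresses.
// Returning a new value instead of mutating keeps your in-memory data intact.
function maskSensitive(value: unknown): unknown {
  if (typeof value === "string") {
    return value.replace(EMAIL_PATTERN, "[REDACTED_EMAIL]");
  }
  if (Array.isArray(value)) {
    return value.map(maskSensitive);
  }
  if (value !== null && typeof value === "object") {
    const masked: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      masked[key] = SENSITIVE_KEYS.has(key.toLowerCase())
        ? "[REDACTED]"
        : maskSensitive(inner);
    }
    return masked;
  }
  // Numbers, booleans, null, and undefined pass through unchanged.
  return value;
}
```

It would then be installed with setMaskingFunction(maskSensitive) and removed again with setMaskingFunction(null).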
Evaluations
An evaluation runs your task over a set of cases, scores each output, and logs the results to an experiment, which is how you measure quality and catch regressions as you change prompts or models. Eval() is the main entry point. The other APIs here summarize and report results, or reuse a previous experiment as evaluation data.
Eval()
Runs an evaluation from your data, a task, and scorers or classifiers: it runs the task over every case, scores the outputs, logs each row to an experiment, and returns a result with a summary you can compare across runs.
Returns: Promise<EvalResultWithSummary>
Eval(project, evaluator, options?) takes three arguments.
project
The project name or ID (a string) to log the experiment to.
evaluator
Defines the evaluation: the data to run against, the task to test, and how to score the results. Its core fields:
- data (EvalData, required): the evaluation cases. Provide an array of cases, a function or promise that returns an array, an AsyncIterable/AsyncGenerator, or BaseExperiment(...). Each case has:
  - input (required): the value passed to the task.
  - expected: the expected output for this case, if any.
  - metadata / tags: per-case metadata and tags.
  - trialCount: how many times to run the task for this case.
- task ((input, hooks?) => output, required): the function under test. It receives two arguments:
  - input: the current case’s input value.
  - hooks (optional): a context object the framework passes in for the current evaluation row. Use its fields to read parameters, log to the task’s span, or adjust the row:
    - span: the task’s span. Log extra data or child spans to it.
    - parameters: the validated runtime parameters.
    - metadata / tags: mutate to change the current row’s metadata or tags.
    - expected: the expected output for the current case, if it provided one.
    - trialIndex: zero-based trial index, useful when trialCount is greater than 1.
    - reportProgress: reports task progress for long-running evals.
- scores (EvalScorer[]): scorer functions, required unless classifiers is set. Each receives the case fields plus output and an optional trace, and returns a number, a Score, an array of scores, or null to skip the row. A bare number is named after the function (or scorer_<index>); return a Score for a custom name or metadata.
- classifiers (EvalClassifier[]): like scorers, but each returns a Classification and writes to the classifications field instead of scores (required unless scores is set). id is the machine-readable outcome, label is the display label (defaults to id), and name is the key (defaults to the function name, or classifier_<index>).
- parameters (EvalParameters | loadParameters()): named, typed settings that make the eval configurable, so you can re-run it with different values (a temperature, a model, a prompt) without editing code. Declare them with defaults here, and the task reads the chosen values from hooks.parameters. Pass runtime values in the options argument (validated against your schema), or load a saved set with loadParameters(), whose defaults merge with any runtime overrides. Supports Zod values, model parameters (exposed as strings), and prompt parameters (exposed as Prompt objects).

Additional evaluator fields:
- experimentName (string): experiment name.
- trialCount (number): number of task runs per input.
- maxConcurrency (number): maximum concurrent task and scorer work.
- timeout (number): evaluation timeout in milliseconds.
- metadata (Record<string, unknown>): experiment metadata.
- tags (string[]): experiment tags.
- baseExperimentName (string): base experiment name for comparison.
- baseExperimentId (string): base experiment ID. Takes precedence over baseExperimentName.
- summarizeScores (boolean): defaults to true.
- flushBeforeScoring (boolean): flushes task spans before scorers run.
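A minimal sketch of the core evaluator pieces written as plain TypeScript. The interfaces are local stand-ins for the SDK’s own types, and the names exactMatch and lengthBucket are hypothetical; the task ignores hooks, and the scorer returns null to skip rows without an expected value.

```typescript
// Local stand-ins for the shapes described above (hypothetical names; the SDK
// exports its own types).
interface EvalCase {
  input: string;
  expected?: string;
  metadata?: Record<string, unknown>;
}

interface Score {
  name: string;
  score: number | null;
  metadata?: Record<string, unknown>;
}

interface Classification {
  id: string;
  label?: string;
}

// data: a function that returns an array of cases.
const data = (): EvalCase[] => [
  { input: "Ada", expected: "Hi Ada" },
  { input: "Grace", expected: "Hi Grace", metadata: { source: "manual" } },
];

// task: the function under test. Here it is synchronous and ignores hooks.
function task(input: string): string {
  return `Hi ${input}`;
}

// Scorer: returns a Score so the result gets a custom name, or null to skip
// rows that have no expected value.
function exactMatch(args: {
  input: string;
  output: string;
  expected?: string;
}): Score | null {
  if (args.expected === undefined) return null;
  return { name: "exact_match", score: args.output === args.expected ? 1 : 0 };
}

// Classifier: id is the machine-readable outcome, label is shown in the UI.
function lengthBucket(args: { output: string }): Classification {
  return args.output.length > 8
    ? { id: "long", label: "Long reply" }
    : { id: "short", label: "Short reply" };
}
```

Under these assumptions, the pieces would be wired together as Eval("My Project", { data, task, scores: [exactMatch], classifiers: [lengthBucket] }), where "My Project" is a placeholder project name.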
options
An optional object of run settings:
- noSendLogs (boolean): run locally and build a local summary without sending logs.
- onStart ((summary) => void): called when the experiment starts.
- stream ((event) => void): receives progress events.
- parent (string): exported parent span for nested or distributed eval logging.
- parameters (Record<string, unknown>): runtime parameter values.
- returnResults (boolean): defaults to true. Set false for large evals to keep only aggregate summary data.
- enableCache (boolean): defaults to true. Controls the local span cache used by scorers.
buildLocalSummary()
Builds an ExperimentSummary from local eval results. The SDK uses this when an eval runs without a remote experiment, such as Eval(..., { noSendLogs: true }).
Returns: ExperimentSummary
Parameters:
- evaluatorDef (EvaluatorDef, required): your Eval() evaluator object plus the SDK-populated projectName and evalName fields.
- results (EvalResult[], required): the row-level eval results to summarize.
- precomputedScores (ScoreAccumulator, optional): score totals and counts to use instead of deriving them from results. Pass it when you ran with returnResults: false (so per-row results aren’t available) or you’ve already accumulated scores yourself. It maps each score’s name to a running total and count.
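If you score rows in your own loop, you can build that mapping as you go. A minimal sketch: the ScoreAccumulator alias is a local stand-in for the SDK type, the helper names are hypothetical, and skipping null scores mirrors the scorer rule that null skips a row.

```typescript
// Local stand-in: each score name maps to a running total and count.
type ScoreAccumulator = Record<string, { total: number; count: number }>;

// Fold one row's scores into the accumulator. Null scores are not counted.
function accumulate(
  acc: ScoreAccumulator,
  rowScores: Record<string, number | null>,
): ScoreAccumulator {
  for (const [name, score] of Object.entries(rowScores)) {
    if (score === null) continue;
    const entry = acc[name] ?? { total: 0, count: 0 };
    acc[name] = { total: entry.total + score, count: entry.count + 1 };
  }
  return acc;
}

// Average per score name, computed as total / count.
function averages(acc: ScoreAccumulator): Record<string, number> {
  const result: Record<string, number> = {};
  for (const [name, { total, count }] of Object.entries(acc)) {
    result[name] = count === 0 ? 0 : total / count;
  }
  return result;
}
```

The finished accumulator is the kind of object you would pass as precomputedScores.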
reportFailures()
Prints failing eval rows to the console in the same format the default reporter uses. Use it when you run evals with a custom reporter but still want readable failure output.
Returns: void
Parameters:
- evaluatorDef (EvaluatorDef, required): evaluator metadata used in error output.
- failingResults (EvalResult[], required): results whose error field is set.
- options ({ verbose: boolean; jsonl: boolean }, required): verbose prints full errors; jsonl emits a JSON line containing the evaluator name and error strings.
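A custom reporter usually has the full result list and needs only the failures. A minimal sketch, assuming each result row carries an optional error field as described above; ResultRow is a hypothetical subset of EvalResult, and selectFailures is a helper name chosen for this example.

```typescript
// Hypothetical subset of an eval result row; the SDK's EvalResult has more fields.
interface ResultRow {
  input: unknown;
  output: unknown;
  error?: unknown;
}

// Keep only rows whose error field is set, preserving the row type.
function selectFailures<T extends ResultRow>(results: T[]): T[] {
  return results.filter((row) => row.error !== undefined && row.error !== null);
}
```

The filtered array would then be passed as the failingResults argument, for example reportFailures(evaluatorDef, selectFailures(results), { verbose: false, jsonl: false }), ideally only when it is non-empty.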
BaseExperiment()
Uses a previous experiment as evaluation data. The previous experiment’s output becomes the current eval’s expected value. This is useful if you want to test, for example, a new prompt for regressions.
Returns: BaseExperiment
If name is provided, the SDK uses that experiment both as the comparison base for the new experiment and as the read-only dataset for eval rows.
If name is omitted, the SDK derives the base experiment from git metadata:
- During experiment registration, the SDK sends ancestor_commits when no explicit baseExperimentId or baseExperimentName is set.
- ancestor_commits contains up to 1,000 commit hashes from the current branch, starting at the merge base with the default remote base branch and ending at HEAD. For a clean working tree, the merge base is computed from HEAD^; for a dirty working tree, it is computed from HEAD.
- Among the experiments you can read in the same project, Braintrust chooses the newest one on the nearest commit in that ancestor list.
- The chosen experiment ID is stored as the new experiment’s base experiment and then opened as dataset data for BaseExperiment().
The SDK determines the default remote base branch from the remote main, master, or develop branch when exactly one of them exists, otherwise from the remote HEAD branch, and falls back to main if the remote HEAD cannot be read. If no experiment matches the ancestor commits, later base lookup can use a project-configured baseline experiment or the nearest earlier experiment in the same project by creation time. If no base experiment can be found, BaseExperiment() fails.
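The commit-based selection rule can be illustrated in a few lines. This is a sketch of the rule as described, not Braintrust’s server-side lookup: the ExperimentRecord shape is hypothetical, and it assumes the ancestor list is ordered from the merge base to HEAD, so the nearest commit is the last one in the list that has a readable experiment.

```typescript
// Hypothetical shape for an experiment the current user can read.
interface ExperimentRecord {
  id: string;
  commit: string;
  createdAt: number; // Unix timestamp in seconds
}

// Walk the ancestor list from HEAD backward and return the newest experiment
// on the first commit that has any. Returns undefined when nothing matches.
function pickBaseExperiment(
  ancestorCommits: string[],
  experiments: ExperimentRecord[],
): ExperimentRecord | undefined {
  for (let i = ancestorCommits.length - 1; i >= 0; i--) {
    const onCommit = experiments.filter((e) => e.commit === ancestorCommits[i]);
    if (onCommit.length > 0) {
      // Newest wins: keep the record with the highest creation time.
      return onCommit.reduce((a, b) => (b.createdAt > a.createdAt ? b : a));
    }
  }
  return undefined;
}
```

An undefined result corresponds to the case where later lookup falls back to a project baseline or the nearest earlier experiment by creation time.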
Experiments
An experiment is a single evaluation run logged to a project. Use these APIs when you want to create an experiment and log rows yourself, instead of letting Eval() manage one for you.
initExperiment()
Initializes an experiment in a project. Use it when you want to log experiment rows manually.
Returns: Experiment
Options:
- project (string): project name. Required unless projectId is set.
- projectId (string): project ID. Takes precedence over project.
- experiment (string): experiment name. Generated automatically if omitted.
- description (string): experiment description.
- dataset (string | Dataset): dataset to associate with the experiment.
- metadata (Record<string, unknown>): experiment metadata.
- tags (string[]): experiment tags.
- baseExperiment (string): base experiment name for comparison.
- baseExperimentId (string): base experiment ID. Takes precedence over baseExperiment.
- update (boolean): continue logging to an existing experiment with the same name.
- open (boolean): open an existing experiment in read-only mode.
- isPublic (boolean): whether links are public. Defaults to private.
- setCurrent (boolean): defaults to true. Controls whether global log() uses this experiment.
Datasets
A dataset is a versioned collection of cases you manage in Braintrust and reuse across experiments and evals. Use initDataset() to create a dataset or open an existing one.
initDataset()
Creates or opens a dataset in a project.
Returns: Dataset
Options:
- project (string): project name. Required unless projectId is set.
- projectId (string): project ID. Takes precedence over project.
- dataset (string): dataset name. Generated automatically if omitted.
- description (string): dataset description.
- metadata (Record<string, unknown>): dataset metadata.
- version (string): when fetching or iterating rows, read the dataset at this transaction ID. Takes precedence over snapshotName and environment.
- snapshotName (string): when fetching or iterating rows, read the dataset version captured by this named snapshot. Takes precedence over environment.
- environment (string): when fetching or iterating rows, read the dataset version assigned to this environment slug.
- useOutput (boolean): deprecated legacy mode that maps expected to output when fetching rows.
version, snapshotName, and environment select the committed dataset version used for reads. Calls such as insert() and flush() still write new rows to the dataset.
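The precedence among the three read selectors can be expressed as a small resolver. A sketch only: the DatasetReadSelector type and effectiveSelector function are hypothetical helpers, useful for example to log which version a job will read before it calls initDataset().

```typescript
// Hypothetical local type for the three read selectors described above.
interface DatasetReadSelector {
  version?: string;
  snapshotName?: string;
  environment?: string;
}

type SelectorKind = "version" | "snapshotName" | "environment";

// Apply the documented precedence: version, then snapshotName, then
// environment. Returns undefined when no selector is set.
function effectiveSelector(
  sel: DatasetReadSelector,
): { kind: SelectorKind; value: string } | undefined {
  if (sel.version !== undefined) return { kind: "version", value: sel.version };
  if (sel.snapshotName !== undefined) {
    return { kind: "snapshotName", value: sel.snapshotName };
  }
  if (sel.environment !== undefined) {
    return { kind: "environment", value: sel.environment };
  }
  return undefined;
}
```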
Prompts and functions
In Braintrust, functions are units of logic you define and version in the UI, then load or invoke from your code. A prompt is a function whose job is to call a model with a templated set of messages. Other functions include scorers, tools, and code you deploy. Load and render a saved prompt with loadPrompt(), or call a function directly with invoke().
loadPrompt()
Loads a saved prompt from Braintrust. Use the returned Prompt object’s build() method to render model parameters, messages, tools, or completion text with runtime variables.
Returns: Promise<Prompt>
Options:
- projectName (string): project containing the prompt. Required unless projectId or id is set.
- projectId (string): project ID. Takes precedence over projectName.
- slug (string): prompt slug. Required unless id is set.
- id (string): prompt ID. Takes precedence over project and slug.
- version (string): prompt version. Takes precedence over environment.
- environment (string): environment assignment such as production or staging.
- defaults (Record<string, unknown>): default model, parameter, message, or completion fields used by build(). Saved prompt fields override these defaults.
- noTrace (boolean): if true, built prompt metadata is not attached to traces.
If neither version nor environment is set, the SDK loads the latest version and can fall back to the local prompt cache.
build(args, options?) renders the prompt. Parameters:
- args (object, required): the template variables. Object keys become variables, and the whole value is available as input. Rendered with Mustache by default, with non-string values JSON-stringified.
- options (optional):
  - flavor ("chat" | "completion"): defaults to "chat". Use "completion" for completion prompts.
  - messages (Message[]): additional chat messages to append. If the prompt already has a system message, extra system messages are omitted.
  - strict (boolean): validates that referenced template variables are present.
  - templateFormat ("mustache" | "nunjucks" | "none"): overrides the prompt’s template format. Nunjucks requires @braintrust/template-nunjucks. none leaves template strings unrendered.
Use buildWithAttachments() instead of build() when prompt variables contain Braintrust attachment references that need to be resolved before rendering.
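To see how the variable rules interact, here is a deliberately simplified renderer for plain {{name}} tags. It is not the SDK’s Mustache implementation: it ignores sections, partials, and dotted paths, performs no escaping, and the name renderSimple is hypothetical. It does follow the rules listed for args: object keys become variables, the whole args object is also available as input, non-string values are JSON-stringified, and a strict flag rejects missing variables.

```typescript
// Render {{ name }} tags against args. Unknown variables render as empty
// strings unless strict is true, in which case they throw.
function renderSimple(
  template: string,
  args: Record<string, unknown>,
  strict = false,
): string {
  // Expose the whole args object as "input". If args already has an input
  // key, this sketch keeps that value (an assumption).
  const vars: Record<string, unknown> = { input: args, ...args };
  return template.replace(
    /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g,
    (_match: string, name: string) => {
      if (!Object.prototype.hasOwnProperty.call(vars, name)) {
        if (strict) throw new Error(`Missing template variable: ${name}`);
        return "";
      }
      const value = vars[name];
      return typeof value === "string" ? value : JSON.stringify(value);
    },
  );
}
```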
getPromptVersions()
Lists the version IDs for a saved prompt, one for each change recorded when the prompt was created or updated. Use it to find an earlier version to load or pin.
Parameters:
- projectId (string, required): the project’s ID.
- promptId (string, required): the prompt’s ID.
Pass one of the returned IDs to loadPrompt({ version }) to load a specific historical prompt version.
getPromptVersions() logs in internally with the global SDK state. If BRAINTRUST_API_KEY is set, or if a previous SDK call already logged in, you can call it directly. Call login() first when you need to provide apiKey, orgName, appUrl, or a custom fetch, because getPromptVersions() does not accept those options itself.
loadParameters()
Loads a set of parameters you’ve saved in Braintrust, so an eval or application can run with configuration managed in the UI instead of hardcoded values. For evals, pass the result as the evaluator’s parameters field in Eval().
Identify the saved parameters with projectName and slug, projectId and slug, or id. version takes precedence over environment.
invoke()
Invokes a Braintrust function, prompt, scorer, or tool.
Returns: a promise that resolves to the function’s output, or to a BraintrustStream when stream: true
Identification options. Provide one of these: a function_id, a projectName or projectId plus a slug, or a globalFunction.
- function_id (string): function ID.
- projectName (string): project containing the function.
- projectId (string): ID of the project containing the function. Can be used instead of projectName.
- slug (string): function slug.
- globalFunction (string): global function name. See Global Functions.
- functionType (FunctionType): global function type. Defaults to scorer for global functions.
- version (string): function version.
Other options:
- input (Input, required): logged as the span input.
- messages (Message[]): additional OpenAI-style messages for LLM functions.
- metadata (Record<string, unknown>): logged as span metadata and available to the function.
- tags (string[]): logged as span tags.
- parent (Exportable | string): parent span, logger, experiment, or exported parent string.
- stream (boolean): returns a BraintrustStream when true.
- mode (StreamingMode): function streaming mode.
- strict (boolean): throws when prompt variable names do not match input keys.
- schema (z.ZodSchema): validates non-streaming output and returns the typed value.
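A sketch of checking the identification rule before calling invoke(). The FunctionIdentity type and identityMode helper are hypothetical and not part of the SDK; they only encode the documented rule of providing one of function_id, a project name or ID plus slug, or globalFunction.

```typescript
// Subset of invoke()'s identification options, as documented above.
interface FunctionIdentity {
  function_id?: string;
  projectName?: string;
  projectId?: string;
  slug?: string;
  globalFunction?: string;
}

type IdentityMode = "function_id" | "project_slug" | "global_function";

// Report which identification mode the options use. Throws when no mode or
// more than one mode is present, since the target would be ambiguous.
function identityMode(opts: FunctionIdentity): IdentityMode {
  const modes: IdentityMode[] = [];
  if (opts.function_id) modes.push("function_id");
  // A slug identifies a function only together with a project name or ID.
  if (opts.slug && (opts.projectName || opts.projectId)) modes.push("project_slug");
  if (opts.globalFunction) modes.push("global_function");
  if (modes.length !== 1) {
    throw new Error(`Expected exactly one identification mode, found ${modes.length}`);
  }
  return modes[0];
}
```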
initFunction()
Creates a callable wrapper around a Braintrust function for use as an eval task or scorer.
Returns: a function of the form (input) => Promise<output>
Options:
- projectName (string, required): project containing the function.
- slug (string, required): function slug.
- version (string): function version.
- state (BraintrustState): SDK state override.
Attachments
Attachments let you log files or large payloads without storing the full bytes inline in the span, which helps keep requests under Braintrust’s 6 MB request limit. When you trace AI calls, Braintrust automatically converts base64 attachments in provider messages into uploaded attachments, so you rarely need the APIs below for instrumented calls. Reach for them when you’re attaching binary content to a span yourself.
Attachment
Uploads local or in-memory file data and replaces it with an AttachmentReference during logging.
Constructor options: data (string, Blob, or ArrayBuffer), filename, contentType, and optional state.
ExternalAttachment
References a file that already exists in an external object store.
ExternalAttachment.upload() is a no-op because the file already exists externally.
JSONAttachment
Serializes JSON and uploads it as an application/json attachment. Use it for large JSON objects that should be viewable but do not need to be indexed for search.
Constructor arguments: the JSON value to serialize, plus the options filename, pretty, and optional state.
ReadonlyAttachment
Reads an already-uploaded attachment from a dataset, experiment, or raw AttachmentReference.
Methods:
- data() → Promise<Blob>: downloads the attachment as a Blob.
- asBase64Url() → Promise<string>: returns a data:<content-type>;base64,... URL for prompts.
- metadata() → Promise<AttachmentMetadata>: fetches the download URL and upload status.
- status() → Promise<AttachmentStatus>: fetches the current upload status.
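Some provider APIs take the media type and base64 payload as separate fields rather than as a data URL. A minimal sketch of splitting the string returned by asBase64Url(); parseBase64DataUrl is a hypothetical helper, and it does not handle extra media-type parameters such as ;charset=utf-8.

```typescript
// Split a data URL of the form data:<content-type>;base64,<payload>.
// Throws if the string does not match that exact shape.
function parseBase64DataUrl(url: string): { contentType: string; base64: string } {
  const match = /^data:([^;,]+);base64,([A-Za-z0-9+/]*={0,2})$/.exec(url);
  if (!match) {
    throw new Error("Not a base64 data URL");
  }
  return { contentType: match[1], base64: match[2] };
}
```

Usage would look like parseBase64DataUrl(await attachment.asBase64Url()), where attachment is a ReadonlyAttachment.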
Wrappers
Each wrapper returns the same client or module you pass in, with its methods traced, so the return type always matches the argument type. If Braintrust logging is not configured, wrapped calls still run but nothing is traced, so initialize a logger with initLogger() before making wrapped calls.
OpenAI
- wrapOpenAI(openai): wraps OpenAI SDK clients and supports v4, v5, and v6 style clients.
- wrapOpenAIv4(openai): lower-level OpenAI wrapper used by wrapOpenAI(). Use wrapOpenAI() unless you know you need this compatibility entry point.
Anthropic
wrapAnthropic() traces messages.create, beta messages.create, and beta tool runner calls when available.
Vercel AI SDK
- wrapAISDK(ai, options?): wraps AI SDK namespace functions such as generateText, streamText, generateObject, streamObject, embed, embedMany, rerank, and agent constructors.
- wrapAgentClass(AgentClass, options?): wraps an AI SDK agent class directly.
wrapAISDK() accepts denyOutputPaths?: string[] to omit selected output paths from logged output.
Agent SDKs
- wrapClaudeAgentSDK(sdk): wraps the Claude Agent SDK module, including the query, tool, and createSdkMcpServer integration points.
- wrapOpenRouterAgent(agent): wraps @openrouter/agent clients.
- wrapGoogleADK(adkModule): wraps Google ADK modules, including runner, agent, and tool execution methods.
Model provider SDKs
- wrapGoogleGenAI(): wraps the entire @google/genai package export (similar to the AI SDK wrapper). Traces models.generateContent, models.generateContentStream, and models.embedContent.
- wrapOpenRouter(): pass an OpenRouter client. Traces chat, responses, embeddings, rerank, and callModel() where available.
- wrapMistral(): pass a Mistral client. Traces chat, FIM (fill-in-the-middle), agents, and embeddings.
- wrapCohere(): pass a Cohere client. Traces chat, chat streaming, embed, and rerank.
- wrapGroq(): pass a Groq client. Traces chat completions and embeddings.
- wrapHuggingFace(): pass a Hugging Face Inference module or client. Traces chat completion, text generation, streaming variants, and feature extraction.
Testing
Run Braintrust evaluations inside your test runner, so eval cases live alongside your unit tests and your suite fails when scores regress.
wrapVitest()
Wraps Vitest test APIs so tests can create Braintrust experiment rows with inputs, expected values, scorer results, and pass/fail status.
Returns: wrapped Vitest APIs (test, it, expect, describe, lifecycle hooks, and Braintrust helpers)
wrapVitest() returns wrapped test, it, expect, describe, lifecycle hooks, logOutputs, logFeedback, getCurrentSpan, and flushExperiment().
initNodeTestSuite()
Creates a Braintrust-backed suite for Node’s built-in node:test runner.
Returns: a suite object with eval() and flush()
The suite exposes eval(config, fn) and flush(). If you pass Node’s after hook, flush() is registered automatically.
Configuration
Most configuration is passed per call, such as apiKey and projectName on initLogger(). You can also authenticate once with login(), or set credentials and deployment options through environment variables.
login()
Authenticates the SDK to Braintrust and stores the credentials in its global state, where every later SDK call reads them. Most SDK APIs log in automatically when they first need credentials.
Returns: Promise<BraintrustState>
Options (all optional):
- apiKey (string): API key. Defaults to BRAINTRUST_API_KEY.
- appUrl (string): Braintrust app URL. Defaults to https://www.braintrust.dev.
- orgName (string): organization name, useful when credentials can access multiple orgs.
- fetch (typeof fetch): custom fetch implementation for SDK requests.
- forceLogin (boolean): log in again even if the SDK already has credentials.
- noExitFlush (boolean): disables the process exit handler that flushes pending writes.
- onFlushError ((error) => void): handles errors from the background flusher.
- disableSpanCache (boolean): disables the local span cache used by eval scorers.
- debugLogLevel ("error" | "warn" | "info" | "debug" | false): enables SDK troubleshooting output.
You usually don’t need to call login() yourself. When BRAINTRUST_API_KEY is set, APIs like initLogger(), initExperiment(), initDataset(), loadPrompt(), and loadParameters() authenticate automatically using it. Call login() yourself when you want to authenticate with something other than that environment variable, such as a specific API key, organization, or custom fetch. Logging in stores those credentials in the SDK’s global state, so every later SDK call uses them, including APIs like getPromptVersions() that take no auth options of their own.
Environment variables
- BRAINTRUST_API_KEY: API key used to authenticate. Same as the apiKey option.
- BRAINTRUST_API_URL: Braintrust API URL. Set this for self-hosted or data plane deployments.
- BRAINTRUST_APP_URL: Braintrust app URL, used for permalinks. Defaults to https://www.braintrust.dev. Same as the appUrl option.
- BRAINTRUST_ORG_NAME: organization name, useful when credentials can access multiple orgs. Same as the orgName option.
- BRAINTRUST_DISABLE_INSTRUMENTATION: comma-separated list of integrations to skip during auto-instrumentation, for example openai,anthropic.
- BRAINTRUST_DEBUG_LOG_LEVEL: SDK troubleshooting output: error, warn, info, or debug. When unset, the SDK stays silent. Same as the debugLogLevel option.