# LangSmith

> Integrate LangSmith tracing and evaluations with Braintrust

If you are a coding agent, prefer the Braintrust [`bt` CLI](/reference/cli/quickstart) for repeatable, scriptable work: running evals, instrumenting code, querying logs, syncing data, managing functions, and configuring coding agents. Use the MCP server for reasoning over Braintrust data in conversation, such as ad-hoc lookups and exploration from your IDE.

<Warning>
  This is an experimental feature. The API may change based on user feedback.
</Warning>

[LangSmith](https://smith.langchain.com/) is LangChain's platform for tracing, evaluation, and monitoring of LLM applications. Braintrust provides an experimental wrapper to integrate LangSmith with Braintrust. The wrapper can either send tracing and evaluation calls to both LangSmith and Braintrust in parallel, or route them solely to Braintrust, with minimal code changes.

The wrapper supports two modes:

* **Parallel** (default): Send traces and evaluations to both LangSmith and Braintrust simultaneously. Use this to compare services, maintain existing workflows, or run both long-term.
* **Standalone**: Send traces and evaluations only to Braintrust. Use this when you want to use Braintrust exclusively.

## Setup

Install LangSmith alongside the Braintrust SDK (requires Braintrust Python SDK v0.4.3 or later):

<CodeGroup>
  ```bash Python theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
  # uv
  uv add braintrust langsmith
  # pip
  pip install braintrust langsmith
  ```
</CodeGroup>
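The wrapper needs Braintrust Python SDK v0.4.3 or later. To fail fast on an older install, a minimal stdlib-only check could look like this. `parse_release` and `braintrust_is_recent_enough` are hypothetical helpers, not part of either SDK. The parser ignores pre-release suffixes, so treat it as a rough guard rather than a full version comparison.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

# Minimum Braintrust SDK version stated in this guide
MIN_BRAINTRUST = (0, 4, 3)


def parse_release(raw: str) -> Tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. "0.4.3" -> (0, 4, 3).

    Pre-release and local suffixes are dropped, so "0.4.3rc1" is treated as (0, 4, 3).
    """
    parts = []
    for piece in raw.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def braintrust_is_recent_enough() -> bool:
    """Return True if the installed braintrust package is at least MIN_BRAINTRUST."""
    try:
        installed = version("braintrust")
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= MIN_BRAINTRUST


if __name__ == "__main__":
    print("braintrust >= 0.4.3:", braintrust_is_recent_enough())
```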

Set your Braintrust API key as an environment variable:

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
export BRAINTRUST_API_KEY=your-braintrust-api-key
```

Also set the LangSmith environment variables:

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_PROJECT=your-project-name
export LANGSMITH_API_KEY=your-langsmith-api-key
```
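To catch a missing variable before anything is traced, you could check the variables above at startup. The sketch below uses a hypothetical `missing_env_vars` helper. It assumes that standalone mode (described below) only needs `BRAINTRUST_API_KEY`; that is an inference from standalone mode sending data only to Braintrust, not documented behavior of the wrapper.

```python
from typing import List, Mapping

BRAINTRUST_VARS = ["BRAINTRUST_API_KEY"]
# LangSmith variables listed in the setup steps above
LANGSMITH_VARS = ["LANGCHAIN_TRACING_V2", "LANGCHAIN_PROJECT", "LANGSMITH_API_KEY"]


def missing_env_vars(env: Mapping[str, str], standalone: bool = False) -> List[str]:
    """Return the names of required variables that are unset or empty.

    Assumption: standalone mode only sends data to Braintrust, so the
    LangSmith variables are not checked in that case.
    """
    required = BRAINTRUST_VARS if standalone else BRAINTRUST_VARS + LANGSMITH_VARS
    return [name for name in required if not env.get(name)]
```

For example, call `missing_env_vars(os.environ)` before `setup_langsmith()` and exit with a clear message if it returns any names.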

## Tracing

The wrapper automatically redirects functions decorated with LangSmith's `@traceable` to Braintrust's [`@traced`](/instrument/trace-application-logic#trace-function-calls). The resulting Braintrust traces keep:

* Nested span hierarchies, including each span's inputs and outputs
* Complete execution traces with metadata

### Parallel tracing

By default, traces are sent to both LangSmith and Braintrust simultaneously.

To use the wrapper, call `setup_langsmith()` **before** importing from LangSmith modules:

```python trace_parallel.py {3-9} theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import os

# Setup Braintrust wrapper BEFORE importing from langsmith
from braintrust.wrappers.langsmith_wrapper import setup_langsmith

setup_langsmith(
    project_name="langsmith-integration",
    api_key=os.environ.get("BRAINTRUST_API_KEY"),
)

# Now import and use LangSmith as usual - these are patched to use Braintrust
from langsmith import traceable
from openai import OpenAI

client = OpenAI()

@traceable(name="chat_completion")
def chat_completion() -> str:
    """Single traced call."""
    result = client.responses.create(
        model="gpt-5-mini",
        input=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is machine learning?"},
        ],
    )
    return result.output_text


if __name__ == "__main__":
    print(chat_completion())
```

<Note>
  The wrapper automatically reads the project name from the `LANGCHAIN_PROJECT` environment variable. You can override this by passing `project_name` to `setup_langsmith()`.
</Note>

### Standalone tracing

In standalone mode, traces are sent only to Braintrust.

To enable standalone tracing, set `standalone=True`:

```python trace_standalone.py {9} theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import os

# Setup Braintrust wrapper BEFORE importing from langsmith
from braintrust.wrappers.langsmith_wrapper import setup_langsmith

setup_langsmith(
    project_name="langsmith-integration",
    api_key=os.environ.get("BRAINTRUST_API_KEY"),
    standalone=True,  # Only Braintrust will receive traces
)

# Now import and use LangSmith as usual - these are patched to use Braintrust
from langsmith import traceable
from openai import OpenAI

client = OpenAI()

@traceable(name="chat_completion")
def chat_completion() -> str:
    """Single traced call."""
    result = client.responses.create(
        model="gpt-5-mini",
        input=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is machine learning?"},
        ],
    )
    return result.output_text


if __name__ == "__main__":
    print(chat_completion())
```

You can also enable standalone mode with an environment variable:

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
export BRAINTRUST_STANDALONE=1
```
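If one script runs in both modes (for example, parallel on a laptop and standalone in CI), resolving the settings yourself makes the active mode easy to log. The hypothetical helper below mirrors the behavior described on this page: an explicit `project_name` overrides `LANGCHAIN_PROJECT`, and `BRAINTRUST_STANDALONE=1` selects standalone mode. As an assumption, it treats only the exact value `1` as enabling standalone mode; the wrapper may accept other values and resolves these settings on its own regardless.

```python
from typing import Any, Dict, Mapping, Optional


def setup_kwargs_from_env(
    env: Mapping[str, str], project_name: Optional[str] = None
) -> Dict[str, Any]:
    """Build keyword arguments for setup_langsmith() from environment variables.

    Hypothetical helper: it mirrors the documented defaults so the chosen
    mode can be logged, but the wrapper resolves these values on its own.
    """
    kwargs: Dict[str, Any] = {
        "api_key": env.get("BRAINTRUST_API_KEY"),
        # Assumption: only "1" enables standalone mode in this sketch
        "standalone": env.get("BRAINTRUST_STANDALONE") == "1",
    }
    # An explicit project name overrides LANGCHAIN_PROJECT
    resolved_project = project_name or env.get("LANGCHAIN_PROJECT")
    if resolved_project:
        kwargs["project_name"] = resolved_project
    return kwargs
```

You would then call `setup_langsmith(**setup_kwargs_from_env(os.environ))` before importing from `langsmith`, and log the returned dict to record which mode the run used.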

## Evaluations

The wrapper automatically redirects:

* `evaluate()` calls to Braintrust's [`Eval()`](/evaluate/run-evaluations) function
* `aevaluate()` calls to Braintrust's `EvalAsync()` function

As with tracing, you can send evaluation calls to both LangSmith and Braintrust in parallel, or only to Braintrust.
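LangSmith's `aevaluate()` accepts an async target function, and the wrapper routes those calls to `EvalAsync()`. As a minimal sketch, here is an async counterpart to the `multiply` target used in the examples below. `asyncio.sleep(0)` stands in for an awaited model call, and `run_locally` is a hypothetical helper for exercising the target without contacting either service.

```python
import asyncio
from typing import Any, Dict, List


async def multiply_async(inputs: Dict[str, Any], **kwargs: Any) -> int:
    """Async target with the same contract as the synchronous multiply()."""
    await asyncio.sleep(0)  # stand-in for an awaited call, e.g. an async LLM client
    return inputs["x"] * inputs["y"]


async def run_locally(examples: List[Dict[str, Any]]) -> List[int]:
    """Hypothetical helper: run the target on each example concurrently."""
    return list(await asyncio.gather(*(multiply_async(e["inputs"]) for e in examples)))


if __name__ == "__main__":
    rows = [{"inputs": {"x": 2, "y": 3}}, {"inputs": {"x": 7, "y": 8}}]
    print(asyncio.run(run_locally(rows)))  # [6, 56]
```

With the wrapper set up, you would pass `multiply_async` to `aevaluate()` (awaited inside your own async entry point) where the synchronous examples pass `multiply` to `evaluate()`.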

### Parallel evals

Evaluators follow the LangSmith signature: `(inputs, outputs, reference_outputs) -> bool | dict`. The wrapper automatically converts these to Braintrust scorers.
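The conversion happens inside the wrapper, but a rough mental model helps when writing evaluators. The sketch below is a hypothetical adapter, not the wrapper's actual code. It assumes Braintrust scorers receive `input`, `output`, and `expected` keyword arguments and return a dict with `name` and `score`, and it maps a `bool` result to `1.0`/`0.0`. It also shows a `bool`-returning evaluator, the other return type the LangSmith signature allows.

```python
from typing import Any, Callable, Dict, Union

# LangSmith evaluator shape: (inputs, outputs, reference_outputs) -> bool | dict
LangSmithEvaluator = Callable[[dict, dict, dict], Union[bool, dict]]


def non_negative(inputs: dict, outputs: dict, reference_outputs: dict) -> bool:
    """LangSmith-style evaluator that returns a bool instead of a dict."""
    return outputs["output"] >= 0


def to_scorer(evaluator: LangSmithEvaluator) -> Callable[..., Dict[str, Any]]:
    """Hypothetical adapter from a LangSmith evaluator to a Braintrust-style scorer.

    This is NOT the wrapper's implementation; it only illustrates the mapping.
    """

    def scorer(input: dict, output: Any, expected: Any, **kwargs: Any) -> Dict[str, Any]:
        # Mirror LangSmith: a plain target return value is wrapped as {"output": value};
        # wrapping a non-dict expected value the same way is an assumption of this sketch
        outputs = output if isinstance(output, dict) else {"output": output}
        reference = expected if isinstance(expected, dict) else {"output": expected}
        result = evaluator(input, outputs, reference)
        if isinstance(result, bool):
            # Assumption: a bool becomes a 1.0 / 0.0 score named after the function
            return {"name": evaluator.__name__, "score": 1.0 if result else 0.0}
        # Dict results use LangSmith's "key" as the score name
        return {"name": result.get("key", evaluator.__name__), "score": result["score"]}

    return scorer


if __name__ == "__main__":
    score = to_scorer(non_negative)
    print(score(input={"x": 2, "y": 3}, output=6, expected={"output": 6}))
    # {'name': 'non_negative', 'score': 1.0}
```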

<Note>
  When LangSmith evals are instrumented with `@traceable`, scores appear in both experiments and logs.
</Note>

```python eval_langsmith_parallel.py {3-9} theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import os

# Setup Braintrust wrapper BEFORE importing from langsmith
from braintrust.wrappers.langsmith_wrapper import setup_langsmith

setup_langsmith(
    project_name="langsmith-integration",
    api_key=os.environ.get("BRAINTRUST_API_KEY"),
)

# Now import and use LangSmith as usual - these are patched to use Braintrust
from langsmith import Client, traceable

# Define a target function (the function being evaluated)
# LangSmith requires the parameter to be named 'inputs' (or 'attachments'/'metadata')
def multiply(inputs: dict, **kwargs) -> int:
    """Multiply two numbers.

    Args:
        inputs: Dictionary with 'x' and 'y' keys
        **kwargs: Additional arguments (e.g., langsmith_extra from LangSmith)
    """
    return inputs["x"] * inputs["y"]

# Define LangSmith-style evaluators
# LangSmith evaluators use signature: (inputs, outputs, reference_outputs) -> bool | dict
# When target returns a plain value, LangSmith wraps it as {"output": value}
def exact_match_evaluator(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """
    LangSmith-style evaluator that checks for exact match.
    """
    expected = reference_outputs["output"]
    actual = outputs["output"]
    return {
        "key": "exact_match",
        "score": 1.0 if actual == expected else 0.0,
    }

def range_evaluator(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """
    LangSmith-style evaluator that checks if result is in expected range.
    """
    actual = outputs["output"]
    expected = reference_outputs["output"]
    # Check if within 10% of expected
    if expected == 0:
        score = 1.0 if actual == 0 else 0.0
    else:
        diff = abs(actual - expected) / abs(expected)
        score = 1.0 if diff <= 0.1 else 0.0
    return {
        "key": "within_range",
        "score": score,
        "metadata": {"actual": actual, "expected": expected},
    }

def main():
    print("LangSmith to Braintrust Evaluation Example")
    print("=" * 50)
    print()

    # Create a LangSmith client (patched to use Braintrust)
    client = Client()

    # Create a dataset in LangSmith (proper LangSmith API usage)
    dataset_name = "multiply-dataset-example"

    # Try to get or create the dataset
    try:
        dataset = client.read_dataset(dataset_name=dataset_name)
        print(f"Using existing dataset: {dataset_name}")
    except Exception:
        # Create new dataset if it doesn't exist
        dataset = client.create_dataset(dataset_name=dataset_name, description="Multiplication test dataset")
        print(f"Created new dataset: {dataset_name}")

        # Create examples in the dataset (proper LangSmith API)
        examples = [
            {"inputs": {"x": 2, "y": 3}, "outputs": {"output": 6}},
            {"inputs": {"x": 5, "y": 5}, "outputs": {"output": 25}},
            {"inputs": {"x": 10, "y": 0}, "outputs": {"output": 0}},
            {"inputs": {"x": 7, "y": 8}, "outputs": {"output": 56}},
        ]
        client.create_examples(dataset_id=dataset.id, examples=examples)
        print(f"Created {len(examples)} examples in dataset")

    print()
    print("Running evaluation...")
    print()

    # Run evaluation using LangSmith's API (redirects to Braintrust)
    # Pass the dataset name - this is valid LangSmith API usage
    client.evaluate(
        multiply,  # Target function
        data=dataset_name,  # Dataset name (valid LangSmith API)
        evaluators=[exact_match_evaluator, range_evaluator],
        experiment_prefix="multiply-test",
        description="Testing multiplication function",
        metadata={"version": "1.0", "migrated_from": "langsmith"},
    )
    print()
    print("=" * 50)
    print("✓ Evaluation completed!")
    print("Check Braintrust to see the experiment results.")


if __name__ == "__main__":
    main()
```
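Because evaluators are plain functions, you can check their logic locally before sending anything to LangSmith or Braintrust. The sketch below replays the same four examples through `multiply` and `exact_match_evaluator` (copied from the script above) and averages scores per key. `dry_run` and `average_scores` are hypothetical helpers; they mimic LangSmith's `{"output": value}` wrapping but none of the wrapper's other behavior.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def multiply(inputs: dict, **kwargs) -> int:
    return inputs["x"] * inputs["y"]


def exact_match_evaluator(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    expected = reference_outputs["output"]
    return {"key": "exact_match", "score": 1.0 if outputs["output"] == expected else 0.0}


EXAMPLES = [
    {"inputs": {"x": 2, "y": 3}, "outputs": {"output": 6}},
    {"inputs": {"x": 5, "y": 5}, "outputs": {"output": 25}},
    {"inputs": {"x": 10, "y": 0}, "outputs": {"output": 0}},
    {"inputs": {"x": 7, "y": 8}, "outputs": {"output": 56}},
]


def dry_run(target: Callable, examples: List[dict], evaluators: List[Callable]) -> List[dict]:
    """Run each evaluator on each example locally and collect the result dicts."""
    results = []
    for example in examples:
        # Wrap the plain return value the same way LangSmith does
        outputs = {"output": target(example["inputs"])}
        for evaluator in evaluators:
            results.append(evaluator(example["inputs"], outputs, example["outputs"]))
    return results


def average_scores(results: List[dict]) -> Dict[str, float]:
    """Average the numeric scores for each evaluator key."""
    by_key: Dict[str, List[float]] = defaultdict(list)
    for result in results:
        by_key[result["key"]].append(result["score"])
    return {key: sum(scores) / len(scores) for key, scores in by_key.items()}


if __name__ == "__main__":
    print(average_scores(dry_run(multiply, EXAMPLES, [exact_match_evaluator])))
    # {'exact_match': 1.0}
```

A target that returns the wrong value (for example, adding instead of multiplying) drops `exact_match` to `0.0` here, without creating an experiment in either service.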

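### Standalone evals

To send evaluations only to Braintrust, pass `standalone=True` to `setup_langsmith()`. The rest of the script is the same as the parallel example: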

```python eval_langsmith_standalone.py {9} theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
import os

# Setup Braintrust wrapper BEFORE importing from langsmith
from braintrust.wrappers.langsmith_wrapper import setup_langsmith

setup_langsmith(
    project_name="langsmith-integration",
    api_key=os.environ.get("BRAINTRUST_API_KEY"),
    standalone=True,  # Only Braintrust will receive evals
)

# Now import and use LangSmith as usual - these are patched to use Braintrust
from langsmith import Client, traceable

# Define a target function (the function being evaluated)
# LangSmith requires the parameter to be named 'inputs' (or 'attachments'/'metadata')
def multiply(inputs: dict, **kwargs) -> int:
    """Multiply two numbers.

    Args:
        inputs: Dictionary with 'x' and 'y' keys
        **kwargs: Additional arguments (e.g., langsmith_extra from LangSmith)
    """
    return inputs["x"] * inputs["y"]

# Define LangSmith-style evaluators
# LangSmith evaluators use signature: (inputs, outputs, reference_outputs) -> bool | dict
# When target returns a plain value, LangSmith wraps it as {"output": value}
def exact_match_evaluator(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """
    LangSmith-style evaluator that checks for exact match.
    """
    expected = reference_outputs["output"]
    actual = outputs["output"]
    return {
        "key": "exact_match",
        "score": 1.0 if actual == expected else 0.0,
    }

def range_evaluator(inputs: dict, outputs: dict, reference_outputs: dict) -> dict:
    """
    LangSmith-style evaluator that checks if result is in expected range.
    """
    actual = outputs["output"]
    expected = reference_outputs["output"]
    # Check if within 10% of expected
    if expected == 0:
        score = 1.0 if actual == 0 else 0.0
    else:
        diff = abs(actual - expected) / abs(expected)
        score = 1.0 if diff <= 0.1 else 0.0
    return {
        "key": "within_range",
        "score": score,
        "metadata": {"actual": actual, "expected": expected},
    }

def main():
    print("LangSmith to Braintrust Evaluation Example")
    print("=" * 50)
    print()

    # Create a LangSmith client (patched to use Braintrust)
    client = Client()

    # Create a dataset in LangSmith (proper LangSmith API usage)
    dataset_name = "multiply-dataset-example"

    # Try to get or create the dataset
    try:
        dataset = client.read_dataset(dataset_name=dataset_name)
        print(f"Using existing dataset: {dataset_name}")
    except Exception:
        # Create new dataset if it doesn't exist
        dataset = client.create_dataset(dataset_name=dataset_name, description="Multiplication test dataset")
        print(f"Created new dataset: {dataset_name}")

        # Create examples in the dataset (proper LangSmith API)
        examples = [
            {"inputs": {"x": 2, "y": 3}, "outputs": {"output": 6}},
            {"inputs": {"x": 5, "y": 5}, "outputs": {"output": 25}},
            {"inputs": {"x": 10, "y": 0}, "outputs": {"output": 0}},
            {"inputs": {"x": 7, "y": 8}, "outputs": {"output": 56}},
        ]
        client.create_examples(dataset_id=dataset.id, examples=examples)
        print(f"Created {len(examples)} examples in dataset")

    print()
    print("Running evaluation...")
    print()

    # Run evaluation using LangSmith's API (redirects to Braintrust)
    # Pass the dataset name - this is valid LangSmith API usage
    client.evaluate(
        multiply,  # Target function
        data=dataset_name,  # Dataset name (valid LangSmith API)
        evaluators=[exact_match_evaluator, range_evaluator],
        experiment_prefix="multiply-test",
        description="Testing multiplication function",
        metadata={"version": "1.0", "migrated_from": "langsmith"},
    )
    print()
    print("=" * 50)
    print("✓ Evaluation completed!")
    print("Check Braintrust to see the experiment results.")


if __name__ == "__main__":
    main()
```

## Resources

* [LangSmith documentation](https://docs.smith.langchain.com/)
* [Braintrust evaluation guide](/evaluate)
