# Interpret evaluation results

> Diagnose where your AI system is underperforming and understand why. Drill into traces, score distributions, and individual test cases to find actionable improvement opportunities.

Each [offline evaluation](/evaluate/run-evaluations) creates an experiment, a permanent record of how the evaluated task performed on a dataset.

## View results

To view the results of an experiment, go to [**<Icon icon="beaker" /> Experiments**](https://www.braintrust.dev/app/~/experiments) in your project and select the experiment from the list.

* **Traces vs. spans** - By default, experiments display as a table of traces where each row represents a complete trace with its root span. To view the individual spans in traces instead, select <Icon icon="settings-2" /> **Display** > <Icon icon="rows-3" /> **Row type** > <Icon icon="diamond" /> **Spans**.

  View individual spans when you want to:

  * Analyze specific operations within traces
  * Find particular function calls or API requests
  * Examine timing and token usage for individual operations

  <Note>
    Spans view is optimized for analyzing individual operations. Experiment comparisons and diff mode are only available when viewing traces.
  </Note>

* **Metrics** - In addition to your scores, Braintrust records metrics about your LLM calls, such as duration, token counts, and estimated cost, to help you assess performance. For example, when you switch models, compare duration, token metrics, and estimated cost together to understand the tradeoffs.

  To compute LLM metrics like token counts, make sure you [trace your LLM calls](/instrument/trace-llm-calls).

* **Experiment summary** - Select <Icon icon="arrow-right-to-line" /> **Details** to view:

  * Comparisons to other experiments
  * Scorers used in the evaluation
  * Datasets tested
  * Saved parameters linked to the evaluation
  * Metadata like model and parameters

  Copy the experiment ID from the bottom of the summary pane for referencing in code or sharing with teammates.

### Filter results

Each project provides default table views with common filters for experiments, including:

* **Default view**: Shows all traces in the experiment
* **Non-errors**: Shows only traces without errors
* **Errors**: Shows only traces with errors
* **Scorer errors**: Shows only traces with scorer errors
* **Unreviewed**: Hides traces that have been human-reviewed
* **Assigned to me**: Shows only traces assigned to the current user for human review

Use the <Icon icon="layers-2" /> menu to switch the table view.

<Tip>
  Built-in views (such as "All experiments view") cannot be modified, but you can create [custom table views](#create-custom-table-views) based on custom filters and display settings.
</Tip>

You can also use the <Icon icon="list-filter" /> **Filter** menu to add custom filtering. Use the **Basic** tab for point-and-click filtering, or switch to **SQL** to write precise [SQL queries](/reference/sql). To filter experiments by metadata programmatically, use the `metadata` query parameter on `GET /v1/experiment`. See [Filter experiments by metadata](/api-reference#filter-experiments-by-metadata) for details.

### Group results

Select <Icon icon="settings-2" /> **Display** > <Icon icon="corner-down-right" /> **Group by** to group the table by metadata fields to see patterns.

By default, group rows show one experiment’s summary data. To view summary data for all experiments, select **Include comparisons in group**.
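Conceptually, grouping buckets rows by a metadata value and summarizes each bucket. The sketch below is a hypothetical offline equivalent for rows you have already exported. It assumes each row is a dict with a `metadata` mapping and a `scores` mapping, and the `model` and `accuracy` field names are illustrative only. It is not how Braintrust computes group summaries.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_group(rows, group_key, score_name):
    """Average one score per metadata value (hypothetical row shape)."""
    buckets = defaultdict(list)
    for row in rows:
        value = (row.get("scores") or {}).get(score_name)
        if value is None:
            continue  # skip rows where this score was not computed
        group = (row.get("metadata") or {}).get(group_key, "(missing)")
        buckets[group].append(value)
    return {group: mean(values) for group, values in buckets.items()}


rows = [
    {"metadata": {"model": "a"}, "scores": {"accuracy": 1.0}},
    {"metadata": {"model": "a"}, "scores": {"accuracy": 0.5}},
    {"metadata": {"model": "b"}, "scores": {"accuracy": 0.0}},
]
print(mean_score_by_group(rows, "model", "accuracy"))  # {'a': 0.75, 'b': 0.0}
```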

### Order by regressions

Score and metric columns show summary statistics in their headers. To order columns by regressions, select <Icon icon="settings-2" /> **Display** > <Icon icon="columns-2" /> **Columns** > **Order by regressions**.

Within grouped tables, this sorts rows by regressions of a specific score relative to a comparison experiment.
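A regression here means a row whose score dropped relative to the comparison experiment. The sketch below illustrates that idea offline. It assumes you have each experiment's scores keyed by a shared row identifier (the `case-*` keys are hypothetical), and it is not Braintrust's sorting implementation.

```python
def order_by_regression(current, baseline):
    """Return (row_id, delta) pairs, largest regression first.

    `current` and `baseline` map a shared row identifier to a score.
    Rows missing from either experiment are skipped.
    """
    deltas = [
        (row_id, current[row_id] - baseline[row_id])
        for row_id in current
        if row_id in baseline
    ]
    # The most negative delta (biggest drop versus the baseline) sorts first.
    return sorted(deltas, key=lambda pair: pair[1])


current = {"case-1": 0.9, "case-2": 0.4, "case-3": 1.0}
baseline = {"case-1": 0.8, "case-2": 0.9, "case-3": 1.0}
for row_id, delta in order_by_regression(current, baseline):
    print(f"{row_id}: {delta:+.2f}")
```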

## Examine a trace

Select any row to open the trace view and see complete details:

* Input, output, and expected values
* Metadata and parameters
* All spans in the trace hierarchy
* Scores and their explanations
* Timing and token usage

Ask yourself: Do good scores correspond to good outputs? If not, update your scorers or test cases.

Use the <Icon icon="fullscreen" /> button to expand the trace to fullscreen or the <Icon icon="arrow-up-right" /> button to open it in a separate page. For details on trace views, layouts, and actions, see [Examine traces](/observe/examine-traces).

<Note>
  When [comparing experiments](/evaluate/compare-experiments) with diff mode enabled, only the default trace view is available. [Timeline](/observe/examine-traces#view-as-a-timeline), [Thread](/observe/examine-traces#view-as-a-conversation), and [custom views](/annotate/custom-views) are disabled during comparison.
</Note>

## Assign for review

You can assign experiment rows to team members for review, analysis, or follow-up action. Assignments are particularly useful for human review workflows, where you can assign specific rows that need human evaluation and distribute review work across multiple team members.

See [Assign rows for review](/annotate/human-review/manage-review-work#assign-rows-for-review) for details.

## Score retrospectively

Apply scorers and classifiers to existing experiments:

* **Multiple cases**: Select rows and use <Icon icon="percent" /> **Score** to apply chosen scorers and classifiers
* **Single case**: Open a trace and use <Icon icon="percent" /> **Score** in the trace view

Scores and classifications appear as additional spans within the trace.

## Analyze with Loop

Use [**<Icon icon="blend" /> Loop**](/loop) to analyze experiment results, identify patterns, and get improvement suggestions. Loop can help you understand why certain test cases succeeded or failed and generate actionable recommendations.

Select one or more experiments and open Loop to:

* **Summarize results**: Get high-level insights about experiment performance, score trends, and key differences between experiments.
* **Drill into specific rows**: Ask Loop to analyze test cases that performed poorly or identify patterns across failures.
* **Generate improvements**: Loop can suggest changes to prompts, scorers, or datasets based on experiment results.
* **Create datasets**: Extract problematic or interesting test cases into new datasets for targeted evaluation.
* **Generate code**: Get sample code for implementing improvements to test in your next experiment.

Example queries:

* "What improved from the last experiment?"
* "Categorize the errors in this experiment"
* "Pick the best scorers for this task"
* "Why did the factuality score drop?"
* "Create a dataset from the rows where the model failed"
* "What patterns do you see in the low-scoring cases?"

## Use aggregate scores

Aggregate scores are formulas that combine multiple scores into a single metric. They are useful when you track many scores but need a single metric to represent overall experiment quality.

See [Create aggregate scores](/admin/projects#create-aggregate-scores) for more details.
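As an illustration of the kind of formula an aggregate score expresses, the sketch below computes a weighted average of one row's scores. The scorer names and weights are hypothetical; the formula types Braintrust actually supports are described on the linked page.

```python
def weighted_aggregate(scores, weights):
    """Combine several named scores into one weighted average.

    Missing scores (absent or None) are left out and the remaining weights are
    renormalized, so with non-negative weights and scores in [0, 1] the result
    also stays in [0, 1].
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for name, weight in weights.items():
        value = scores.get(name)
        if value is None:
            continue
        weighted_sum += weight * value
        total_weight += weight
    if total_weight == 0:
        return None  # no usable scores for this row
    return weighted_sum / total_weight


weights = {"factuality": 2.0, "relevance": 1.0}
print(weighted_aggregate({"factuality": 0.5, "relevance": 1.0}, weights))
```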

## Download results

To download an experiment's results, select <Icon icon="download" /> and then **Download as CSV** or **Download as JSON**.

## Customize the experiments table

### Adjust table layout

To switch between different layouts, select <Icon icon="settings-2" /> **Display** > <Icon icon="layout-grid" /> **Layout** and one of the following:

* **List**: Default table view
* **Grid**: Compare outputs side-by-side
* **Summary**: Large-type summary of scores and metrics across all experiments
* **Summary table**: Scores and metrics as rows with experiments as columns, with a PDF download option

Layouts respect view filters and are automatically saved when you save a view.

### Show and hide columns

Select <Icon icon="settings-2" /> **Display** > <Icon icon="columns-2" /> **Columns** and then:

* Show or hide columns to focus on relevant data
* Reorder columns by dragging them
* Pin important columns to the left

All column settings are automatically saved when you save a view.

When [topics](/observe/topics) are enabled, facet outputs appear as columns in the experiments table, similar to scores. You can filter and sort by facet columns to analyze patterns in your evaluation results. This helps identify which types of inputs (e.g., specific user tasks or sentiment categories) perform better or worse in your experiments.

### Create custom columns

Extract specific values from traces using custom columns:

1. Select <Icon icon="settings-2" /> **Display** > <Icon icon="columns-2" /> **Columns** > **+ Add custom column**.
2. Name your column.
3. Choose from inferred fields or write a SQL expression.

Once created, filter and sort using your custom columns.

### Create custom table views

To create or update a custom table view:

1. Apply the filters and display settings you want.
2. Open the <Icon icon="layers-2" /> menu and select **Save view...** or **Save view as...**.

<Note>
  Custom table views are visible to all project members. Creating or editing a table view requires the **Update** project permission.
</Note>

### Duplicate table views across projects

If you've built a useful custom table view in one project, you can duplicate it to another project via the API rather than recreating it from scratch. Experiments have two customizable views:

* Experiments list: The project's [**<Icon icon="beaker" /> Experiments**](https://www.braintrust.dev/app/~/experiments) tab, where each row is an experiment.
* Single experiment table: The rows of data inside one experiment.

The following steps work for either. Choose the corresponding `view_type` in the API call.

1. Use the [list views](/api-reference/views/list-views) API endpoint to fetch the experiment views in your source project. Pass the following query parameters:

   * `object_type=project`
   * `object_id=<source-project-id>`
   * `view_type=experiment` for a single experiment table view, or `view_type=experiments` for the experiments list

   <CodeGroup>
     ```bash Single experiment theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
     curl --request GET \
       --url 'https://api.braintrust.dev/v1/view?object_type=project&object_id=<source-project-id>&view_type=experiment' \
       --header 'Authorization: Bearer <your-api-key>'
     ```

     ```bash Experiments list theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
     curl --request GET \
       --url 'https://api.braintrust.dev/v1/view?object_type=project&object_id=<source-project-id>&view_type=experiments' \
       --header 'Authorization: Bearer <your-api-key>'
     ```
   </CodeGroup>
2. In the response, find the view you want to duplicate and copy its `view_data` and `options` payloads.
3. Use the [create view](/api-reference/views/create-view) API endpoint to create the view in the destination project. Set `object_id` to the destination project ID.

   <CodeGroup>
     ```bash Single experiment theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
     curl --request POST \
       --url https://api.braintrust.dev/v1/view \
       --header 'Authorization: Bearer <your-api-key>' \
       --header 'Content-Type: application/json' \
       --data '
       {
         "object_type": "project",
         "object_id": "<destination-project-id>",
         "view_type": "experiment",
         "name": "<new-view-name>",
         "view_data": <view-data-payload>,
         "options": <options-payload>
       }'
     ```

     ```bash Experiments list theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
     curl --request POST \
       --url https://api.braintrust.dev/v1/view \
       --header 'Authorization: Bearer <your-api-key>' \
       --header 'Content-Type: application/json' \
       --data '
       {
         "object_type": "project",
         "object_id": "<destination-project-id>",
         "view_type": "experiments",
         "name": "<new-view-name>",
         "view_data": <view-data-payload>,
         "options": <options-payload>
       }'
     ```
   </CodeGroup>
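If you copy views often, the same two calls can be scripted. The sketch below uses only the Python standard library and mirrors the curl requests above. It assumes the list response returns views in an `objects` array, each with a `name`, `view_data`, and `options`, so check the list views reference before relying on it. The helper names are hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request

API_URL = "https://api.braintrust.dev"


def _request(method, path, params=None, body=None):
    """Send an authenticated JSON request and decode the JSON response."""
    url = f"{API_URL}{path}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {os.environ['BRAINTRUST_API_KEY']}")
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:  # raises HTTPError on 4xx/5xx
        return json.load(resp)


def build_create_payload(view, destination_project_id, view_type, new_name=None):
    """Build a create-view body from a listed view, pointing at a new project."""
    payload = {
        "object_type": "project",
        "object_id": destination_project_id,
        "view_type": view_type,
        "name": new_name or view["name"],
    }
    # Copy the saved filters and layout, skipping fields the source view left unset.
    for key in ("view_data", "options"):
        if view.get(key) is not None:
            payload[key] = view[key]
    return payload


def copy_view(source_project_id, destination_project_id, view_type, view_name, new_name=None):
    """Copy one named view (view_type "experiment" or "experiments") between projects."""
    listed = _request(
        "GET",
        "/v1/view",
        params={"object_type": "project", "object_id": source_project_id, "view_type": view_type},
    )
    # Assumption: the list response wraps results in an "objects" array.
    matches = [v for v in listed.get("objects", []) if v.get("name") == view_name]
    if not matches:
        raise ValueError(f"No {view_type!r} view named {view_name!r} in the source project")
    payload = build_create_payload(matches[0], destination_project_id, view_type, new_name)
    return _request("POST", "/v1/view", body=payload)
```

For example, `copy_view("<source-project-id>", "<destination-project-id>", "experiment", "Failing rows")` would copy a single experiment table view named "Failing rows" (a hypothetical name). Keeping `build_create_payload` separate from the network calls makes it easy to inspect the body before sending it.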

### Set default table views

You can set default views at three levels:

* **Organization default**: Visible to all members when they open the page. This applies per page. For example, you can set separate organization defaults for Logs, Experiments, and Review. To set an organization default, you need the **Manage settings** organization permission (included by default in the **Owner** role). See [Access control](/admin/access-control) for details.
* **Project default**: Overrides the organization default for everyone viewing this project. To set a project default, you need the project-level **Update** permission. Project admins can set project defaults even without organization-level permissions. See [Access control](/admin/access-control) for details.
* **Personal default**: Overrides the project and organization defaults for you only. Personal defaults are stored in your browser, so they do not carry over across devices or browsers.

To set a default view:

1. Switch to the view you want by selecting it from the <Icon icon="layers-2" /> menu.
2. Open the menu again and hover over the currently selected view to reveal its submenu.
3. Choose <Icon icon="flag-triangle-right" /> **Set as personal default view**, <Icon icon="bookmark" /> **Set as project default view**, or <Icon icon="pin" /> **Set as organization default view**.

To clear a default view:

1. Open the <Icon icon="layers-2" /> menu and hover over the currently selected view to reveal its submenu.
2. Choose <Icon icon="flag-triangle-right" /> **Clear personal default view**, <Icon icon="bookmark" /> **Clear project default view**, or <Icon icon="pin" /> **Clear organization default view**.

Default view settings are mutually exclusive on a given view. Setting one type of default on a view automatically clears any other default that was previously set on the same view.

When a user opens a page, Braintrust loads the first match in this order: personal default, project default, organization default, then the standard "All ..." view (for example, "All logs view").
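The precedence rule amounts to a first-match lookup. The sketch below is only a model of the documented behavior with hypothetical view names, not Braintrust's code.

```python
from typing import Optional


def resolve_default_view(
    personal: Optional[str],
    project: Optional[str],
    organization: Optional[str],
    fallback: str,
) -> str:
    """Return the first default that is set, in the documented precedence order."""
    for candidate in (personal, project, organization):
        if candidate is not None:
            return candidate
    return fallback


# No personal default, so the project default wins over the organization default.
print(resolve_default_view(None, "Regressions", "Errors", "All experiments view"))
```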

### Change the table density

To change the table density to see more or less detail per row, select <Icon icon="settings-2" /> **Display** > <Icon icon="list-chevrons-up-down" /> **Row height** > **Compact** or **Tall**.

## Export experiments

<Tabs>
  <Tab title="UI" icon="mouse-pointer-2">
    To export an experiment's results, open the menu next to the experiment name. You can export as CSV or JSON, and choose whether to download all fields.

    <img src="https://mintcdn.com/braintrust/286-LRz_qGMfyggP/images/core/experiments/exporting-experiments.png?fit=max&auto=format&n=286-LRz_qGMfyggP&q=85&s=c9d392fe5158555fa7547107c478ee4e" alt="Export experiments" width="2198" height="1496" data-path="images/core/experiments/exporting-experiments.png" />
  </Tab>

  <Tab title="SDK" icon="code">
    Access data from previous experiments by passing the `open` flag to `init()`:

    <CodeGroup dropdown>
      ```typescript theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
      import { init } from "braintrust";

      async function openExperiment() {
        const experiment = init("My Project", {
          experiment: "my-experiment",
          open: true,
        });

        for await (const testCase of experiment) {
          console.log(testCase);
        }
      }
      ```

      ```python theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
      import braintrust

      def open_experiment():
          experiment = braintrust.init(
              project="My Project",
              experiment="my-experiment",
              open=True,
          )
          for test_case in experiment:
              print(test_case)
      ```
    </CodeGroup>

    Convert experiments to dataset format using `asDataset()`/`as_dataset()`:

    <CodeGroup dropdown>
      ```typescript theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
      import { init } from "braintrust";

      async function openExperiment() {
        const experiment = init("My Project", {
          experiment: "my-experiment",
          open: true,
        });

        for await (const testCase of experiment.asDataset()) {
          console.log(testCase);
        }
      }
      ```

      ```python theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
      import braintrust

      def open_experiment():
          experiment = braintrust.init(
              project="My Project",
              experiment="my-experiment",
              open=True,
          )
          for test_case in experiment.as_dataset():
              print(test_case)
      ```
    </CodeGroup>
  </Tab>

  <Tab title="API" icon="code">
    Fetch experiment events via the API using [Fetch experiment (POST form)](https://www.braintrust.dev/docs/api-reference#fetch-experiment-post-form) or [Fetch experiment (GET form)](https://www.braintrust.dev/docs/api-reference#fetch-experiment-get-form).

    You can also query experiments with SQL for custom analysis. For example, to check review status:

    <CodeGroup dropdown>
      ```python theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
      import os
      import requests

      API_URL = "https://api.braintrust.dev"
      headers = {"Authorization": "Bearer " + os.environ["BRAINTRUST_API_KEY"]}

      def fetch_experiment_review_status(experiment_id: str) -> dict:
          # Replace "response quality" with your review score column name
          query = f"""
          SELECT
            sum(CASE WHEN scores."response quality" IS NOT NULL THEN 1 ELSE 0 END) AS reviewed,
            sum(CASE WHEN is_root THEN 1 ELSE 0 END) AS total
          FROM experiment('{experiment_id}')
          """

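          response = requests.post(
              f"{API_URL}/btql",
              headers=headers,
              json={"query": query, "fmt": "json"},
          )
          response.raise_for_status()  # surface HTTP errors instead of parsing an error body
          return response.json()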

      # Usage
      result = fetch_experiment_review_status("your-experiment-id")
      print(f"Reviewed: {result['data'][0]['reviewed']}/{result['data'][0]['total']}")
      ```

      ```typescript theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
      const API_URL = "https://api.braintrust.dev";
      const headers = {
        Authorization: `Bearer ${process.env.BRAINTRUST_API_KEY}`,
        "Content-Type": "application/json",
      };

      async function fetchExperimentReviewStatus(experimentId: string) {
        // Replace "response quality" with your review score column name
        const query = `
          SELECT
            sum(CASE WHEN scores."response quality" IS NOT NULL THEN 1 ELSE 0 END) AS reviewed,
            sum(CASE WHEN is_root THEN 1 ELSE 0 END) AS total
          FROM experiment('${experimentId}')
        `;

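        const response = await fetch(`${API_URL}/btql`, {
          method: "POST",
          headers,
          body: JSON.stringify({ query, fmt: "json" }),
        });

        // Surface HTTP errors instead of parsing an error body
        if (!response.ok) {
          throw new Error(`SQL query failed: ${response.status} ${await response.text()}`);
        }

        return await response.json();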
      }

      // Usage
      const result = await fetchExperimentReviewStatus("your-experiment-id");
      console.log(`Reviewed: ${result.data[0].reviewed}/${result.data[0].total}`);
      ```
    </CodeGroup>
  </Tab>

  <Tab title="CLI" icon="terminal">
    Download experiment data to a local NDJSON file with [`bt sync pull`](/reference/cli/sync):

    ```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
    bt sync pull experiment:my-experiment
    ```
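    Because the file is newline-delimited JSON, you can summarize it locally with the standard library. The sketch below assumes each line is one JSON object with an optional `scores` mapping; check the actual field layout of your pulled file, and filter out non-root spans first if the file contains them so scores are not double counted. The function name is hypothetical.

    ```python
    import json
    from collections import defaultdict


    def average_scores(lines):
        """Average each score across NDJSON records (one JSON object per line)."""
        totals = defaultdict(float)
        counts = defaultdict(int)
        for line in lines:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            for name, value in (record.get("scores") or {}).items():
                if value is None:
                    continue  # score not computed for this record
                totals[name] += value
                counts[name] += 1
        return {name: totals[name] / counts[name] for name in totals}
    ```

    Pass it an open file, for example `with open("my-experiment.ndjson") as f: print(average_scores(f))`, where the file name is a placeholder for whatever `bt sync pull` wrote.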

    Query experiment data with SQL using [`bt sql`](/reference/cli/sql):

    ```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
    bt sql "SELECT id, input, output, scores FROM experiment('my-experiment')"
    ```
  </Tab>
</Tabs>

## Share an experiment

Experiment URLs are name-based, so a shared link breaks when the experiment is renamed. A permalink uses the experiment's object ID instead, so it keeps working after a rename. Use permalinks to share results, bookmark experiments, or include stable links in reports.

To copy a permalink, use the permalink button in the experiment view. You can also construct one by hand from the experiment's ID:

```
https://www.braintrust.dev/app/object?object_type=experiment&object_id=<experiment_id>
```

Visiting this URL redirects to the experiment's canonical page, regardless of organization or project.
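If you generate links in a script or report, a small helper that assembles the URL pattern above avoids hand-editing. The function name is illustrative; `urlencode` escapes any unusual characters in the ID.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.braintrust.dev/app/object"


def experiment_permalink(experiment_id: str) -> str:
    """Build a permalink that survives experiment renames."""
    query = urlencode({"object_type": "experiment", "object_id": experiment_id})
    return f"{BASE_URL}?{query}"


print(experiment_permalink("your-experiment-id"))
# https://www.braintrust.dev/app/object?object_type=experiment&object_id=your-experiment-id
```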

## Next steps

* [Compare experiments](/evaluate/compare-experiments) systematically
* [Write scorers](/evaluate/write-scorers) to measure what matters
* [Use playgrounds](/evaluate/playgrounds) for rapid iteration
* [Run evaluations](/evaluate/run-evaluations) in CI/CD
