# bt datasets

> Create, list, view, update, and delete remote Braintrust datasets from the CLI

export const feature_0 = "Snapshots"

export const verb_0 = "are"

`bt datasets` manages remote [Braintrust datasets](/annotate/datasets) directly from the CLI.

## Subcommands

| Subcommand                           | Description                                                                             |
| ------------------------------------ | --------------------------------------------------------------------------------------- |
| `bt datasets list`                   | List all datasets in the current project                                                |
| `bt datasets create <name>`          | Create a dataset, optionally seeding it with records                                    |
| `bt datasets update <name>`          | Upsert records into a dataset (aliases: `add`, `refresh`)                               |
| `bt datasets view <name>`            | Display dataset metadata and preview records                                            |
| `bt datasets delete <name>`          | Delete a dataset and all its records                                                    |
| `bt datasets pipeline <stage>`       | Transform project logs into dataset rows (`run`, or staged `pull`, `transform`, `push`) |
| `bt datasets snapshots <subcommand>` | Create, list, restore, and delete dataset snapshots (aliases: `versions`, `version`)    |

## bt datasets list

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
bt datasets list
```

Lists all datasets in the current project.

## bt datasets create

Create a dataset, optionally seeding it with records from a file, stdin, or inline JSON.

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
# Create an empty dataset
bt datasets create my-dataset

# Create with a description
bt datasets create my-dataset --description "Dataset for smoke tests"

# Seed from a JSONL file
bt datasets create my-dataset --file records.jsonl

# Seed from stdin
cat records.jsonl | bt datasets create my-dataset

# Seed with inline JSON rows
bt datasets create my-dataset --rows '[{"id":"case-1","input":{"text":"hi"},"expected":"hello"}]'

# Rows without id fields get auto-generated stable IDs
bt datasets create my-dataset --rows '[{"input":{"text":"hi"},"expected":"hello"}]'
```

### Flags

| Flag                   | Description                            |
| ---------------------- | -------------------------------------- |
| `--description <TEXT>` | Dataset description                    |
| `--file <PATH>`        | Seed from a JSONL file                 |
| `--rows <JSON>`        | Seed with an inline JSON array of rows |

<Note>
  When rows omit an `id` field, `bt datasets create` auto-generates stable record IDs. `bt datasets update` instead rejects rows that have no ID.
</Note>
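
As an illustration of the `--file` input, the sketch below writes a two-row JSONL seed file, one JSON object per line. The file name `records.jsonl` and the row contents are placeholders borrowed from the examples above. The second row deliberately omits `id`, which `create` accepts according to the note.

```bash
# Write a small JSONL seed file: one JSON object per line.
# The rows are illustrative placeholders, not real data.
cat > records.jsonl <<'EOF'
{"id":"case-1","input":{"text":"hi"},"expected":"hello"}
{"input":{"text":"bye"},"expected":"goodbye"}
EOF

# Sanity check: count the rows that were written.
row_count=$(wc -l < records.jsonl | tr -d ' ')
echo "rows written: ${row_count}"
```

The resulting file can then be passed to `bt datasets create my-dataset --file records.jsonl`.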

## bt datasets update

Upsert records into an existing dataset. Also available as `bt datasets add` and `bt datasets refresh`.

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
bt datasets update my-dataset --file records.jsonl
bt datasets add my-dataset --rows '[{"id":"case-2","input":{"text":"bye"},"expected":"goodbye"}]'
bt datasets refresh my-dataset --file records.jsonl --id-field metadata.case_id
```

### Flags

| Flag                | Description                                                          |
| ------------------- | -------------------------------------------------------------------- |
| `--file <PATH>`     | Input JSONL file                                                     |
| `--rows <JSON>`     | Inline JSON array of rows                                            |
| `--id-field <PATH>` | Dot-separated path to use as the record ID instead of the `id` field |

<Note>
  Each row must have a stable ID via the `id` field or `--id-field`. Rows without IDs are rejected.

  `update`, `add`, and `refresh` upsert rows directly: existing rows that are missing from the input are kept, not deleted. `refresh` fails if the dataset does not exist.

  `--id-field` uses dot-separated paths (e.g., `metadata.case_id`). Escape literal dots as `\.` and literal backslashes as `\\`.

  Input may also be a JSON object with a top-level `rows` array (matching `bt datasets view --json` output). Each row in `rows` is validated against the accepted fields: `id`, `input`, `expected`, `metadata`, `tags`, and `origin`.
</Note>
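
The escaping rule for `--id-field` can be sketched as a small shell helper. `escape_id_segment` is a hypothetical name, not part of `bt`. It escapes a single key so that the key can be joined into a dot-separated path, assuming the rule is exactly the one stated in the note above.

```bash
# Hypothetical helper (not part of bt): escape one key for use as a
# segment of an --id-field path. Backslashes are doubled first so the
# backslashes added for dots are not escaped a second time.
escape_id_segment() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/\./\\./g'
}

# Build a path to the key "case.id" nested under "metadata".
id_path="metadata.$(escape_id_segment 'case.id')"
printf '%s\n' "$id_path"   # prints: metadata.case\.id
```

The result can be passed as `--id-field "$id_path"` to `bt datasets update`, `add`, or `refresh`.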

## bt datasets view

Display dataset metadata and preview records in the terminal.

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
bt datasets view my-dataset            # up to 200 rows by default
bt datasets view my-dataset --limit 50
bt datasets view my-dataset --all-rows
bt datasets view my-dataset --full     # exact values, no truncation
bt datasets view my-dataset --json     # machine-readable JSON output
```

### Flags

| Flag          | Description                                                  |
| ------------- | ------------------------------------------------------------ |
| `--limit <N>` | Maximum rows to show (default: 200)                          |
| `--all-rows`  | Show all rows regardless of dataset size                     |
| `--full`      | Show exact values without truncation                         |
| `--json`      | Output as JSON (compatible as input to `bt datasets update`) |

## bt datasets delete

Permanently delete a dataset and all its records.

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
bt datasets delete my-dataset
```

<Warning>
  This operation is irreversible. All records in the dataset are permanently deleted.
</Warning>

## bt datasets pipeline

Transform project logs into dataset rows using a pipeline declared with `DatasetPipeline(...)` in a TypeScript or Python file. See [Dataset pipelines](/annotate/datasets/pipelines) for how to write a pipeline definition.

<Warning>
  **Beta** — This feature is subject to change.
</Warning>

<Note>
  Dataset pipelines require `bt` CLI v0.10.0 or later, plus the `braintrust` SDK for the language you write the pipeline in: TypeScript SDK v3.16.0 or later, or Python SDK v0.23.0 or later.
</Note>

Run the full pipeline in one shot, or split it into staged `pull`, `transform`, and `push` steps:

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
# One-shot: discover source refs, transform, and insert new rows
bt datasets pipeline run ./pipeline.ts --limit 100

# Staged: inspect or edit artifacts between steps
bt datasets pipeline pull ./pipeline.ts --limit 500
bt datasets pipeline transform ./pipeline.ts
bt datasets pipeline push ./pipeline.ts

# Python pipelines are supported too
bt datasets pipeline run ./pipeline.py --project "My Project" --limit 100
```

Staged runs write `pulled.jsonl` and `transformed.jsonl` to the `bt-sync/` directory by default. Inspect or edit `transformed.jsonl` before running `push`.
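
Before running `push`, a quick look at the staged artifact can catch empty or unexpectedly large output. The sketch below is an assumption about workflow, not a `bt` feature: `preview_artifact` is a hypothetical helper built from standard tools, and it assumes the default `bt-sync/transformed.jsonl` path with one row per line.

```bash
# Hypothetical helper (not part of bt): report the row count of a staged
# JSONL artifact and print its first few rows for review.
preview_artifact() {
  artifact="${1:-bt-sync/transformed.jsonl}"  # default staged path
  if [ ! -s "$artifact" ]; then
    echo "no rows found in ${artifact}" >&2
    return 1
  fi
  rows=$(wc -l < "$artifact" | tr -d ' ')
  echo "${artifact}: ${rows} rows"
  head -n 3 "$artifact"
}

# Review the default artifact if a staged run has produced one.
preview_artifact || echo "nothing to review yet"
```

If `--root` or `--out` was used to move the artifact, pass the actual path as the first argument.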

### Flags

| Flag                       | Description                                                   |
| -------------------------- | ------------------------------------------------------------- |
| `--limit <N>`              | Number of source spans or traces to discover                  |
| `--window <DURATION>`      | Constrain source discovery by `created` time (default: `1d`)  |
| `--root-span-id <ID>`      | Restrict pulling to one or more specific root spans           |
| `--root <PATH>`            | Directory for staged artifacts (default: `bt-sync`)           |
| `--out <PATH>`             | Override the managed output path for `pull` and `transform`   |
| `--in <PATH>`              | Override the input artifact for `transform` or `push`         |
| `--fresh`                  | Restart an already completed push spec (`push`)               |
| `--project <NAME>`         | Source project when the pipeline source omits a project       |
| `--source-project <NAME>`  | Override the source project (`pull`, `transform`, `run`)      |
| `--source-project-id <ID>` | Override the source project ID (`pull`, `transform`, `run`)   |
| `--source-org <NAME>`      | Override the source organization (`pull`, `transform`, `run`) |
| `--source-filter <FILTER>` | Override the source SQL filter (`pull`, `transform`, `run`)   |
| `--target-project <NAME>`  | Override the target project (`run`, `push`)                   |
| `--target-project-id <ID>` | Override the target project ID (`run`, `push`)                |
| `--target-org <NAME>`      | Override the target organization (`run`, `push`)              |
| `--target-dataset <NAME>`  | Override the target dataset (`run`, `push`)                   |
| `--max-concurrency <N>`    | Maximum concurrent transforms                                 |
| `--name <NAME>`            | Select a pipeline when the file defines more than one         |

## bt datasets snapshots

Manage [snapshots](/annotate/datasets/manage#save-snapshots) of a dataset directly from the CLI. Snapshots are named checkpoints that pin a dataset's state at a specific transaction, so you can preserve a version before making changes or roll back later. Also available as `bt datasets versions` and `bt datasets version`.

<Note>
  Snapshots are available only on [Pro and Enterprise plans](/plans-and-limits#plans).
</Note>

Use the `create`, `list`, `restore`, and `delete` subcommands to manage a dataset's snapshots:

```bash theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
# Create a snapshot of the dataset's current head transaction
bt datasets snapshots create my-dataset baseline

# Snapshot a specific transaction; the snapshot name is auto-generated when omitted
bt datasets snapshots create my-dataset --xact-id 1000192656880881099

# List saved snapshots (name, description, transaction ID, and creation time)
bt datasets snapshots list my-dataset

# Restore the dataset to a snapshot (previews affected rows, then prompts)
bt datasets snapshots restore my-dataset --name baseline

# Restore by transaction ID without prompting
bt datasets snapshots restore my-dataset --snapshot 1000192656880881099 --force

# Delete a snapshot by name or transaction ID
bt datasets snapshots delete my-dataset baseline
bt datasets snapshots delete my-dataset --snapshot 1000192656880881099 --force
```

`create` captures the dataset's current head transaction by default. `restore` previews how many rows will be restored and deleted, then prompts for confirmation before applying the change. `restore` and `delete` are irreversible, so pass `--force` (`-f`) only when you want to skip the confirmation prompt.

### Flags

| Flag                   | Description                                                                                                                                               |
| ---------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--name <NAME>`, `-n`  | Snapshot name (`create`, `restore`, `delete`). For `create` and `delete`, also accepted as a positional argument; auto-generated on `create` when omitted |
| `--xact-id <XACT_ID>`  | Transaction ID to snapshot (`create`, default: the dataset's current head transaction)                                                                    |
| `--description <TEXT>` | Optional snapshot description (`create`)                                                                                                                  |
| `--snapshot <XACT_ID>` | Transaction ID to restore or delete (`restore`, `delete`; alias: `--version`)                                                                             |
| `--force`, `-f`        | Skip the confirmation prompt (`restore`, `delete`)                                                                                                        |

<Note>
  For `restore` and `delete`, the snapshot name and `--snapshot` transaction ID are mutually exclusive.
</Note>
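
One way to combine snapshots with `update` is to checkpoint a dataset before upserting into it. The wrapper below is a sketch: `snapshot_then_update` and the `BT` variable are hypothetical conveniences, and only the two `bt` invocations come from this page. Setting `BT=echo` prints the commands instead of running them.

```bash
# Hypothetical wrapper (not part of bt): snapshot a dataset, then upsert.
# BT defaults to the real CLI; set BT=echo for a dry run.
snapshot_then_update() {
  dataset="$1"; snapshot="$2"; file="$3"
  # Stop if the snapshot cannot be created, so the update never runs
  # without a checkpoint to restore from.
  "${BT:-bt}" datasets snapshots create "$dataset" "$snapshot" || return 1
  "${BT:-bt}" datasets update "$dataset" --file "$file"
}

# Dry run: print the two commands without contacting Braintrust.
BT=echo snapshot_then_update my-dataset baseline records.jsonl
```

If the update turns out to be wrong, `bt datasets snapshots restore my-dataset --name baseline` rolls the dataset back to the checkpoint.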
