# LiteLLM

> Trace LiteLLM calls in Braintrust to debug routing and outputs across 100+ model providers

[LiteLLM](https://www.litellm.ai/) is a unified interface for calling 100+ LLM APIs using the OpenAI format. Braintrust traces LiteLLM calls across any provider it supports.

<View title="Python" icon="https://img.logo.dev/python.org?token=pk_BdcHD9e5SCW3j1rnJkNyMQ">
  <h2 id="tracing-python">
    Tracing
  </h2>

  To trace LiteLLM with Braintrust's Python SDK, use auto-instrumentation or manual instrumentation. Auto-instrumentation is the recommended path for most users.

  <Tabs>
    <Tab title="Auto-instrumentation">
      Auto-instrumentation patches LiteLLM at startup, so calls are traced without per-call wiring in your code.

      <h3 id="setup-python-auto">
        Setup
      </h3>

      Install the Braintrust SDK and LiteLLM, then set your API keys for the providers you use. The examples below use OpenAI.

      <Steps>
        <Step title="Install packages">
          <CodeGroup>
            ```bash uv theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
            uv add braintrust litellm
            ```

            ```bash pip theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
            pip install braintrust litellm
            ```
          </CodeGroup>
        </Step>

        <Step title="Set environment variables">
          ```bash title=".env" theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
          BRAINTRUST_API_KEY=<your-braintrust-api-key>
          OPENAI_API_KEY=<your-openai-api-key>
          ```
        </Step>
      </Steps>
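
A missing key otherwise surfaces only as a `KeyError` or an authentication error at call time. The following is a minimal preflight sketch, assuming the two variable names from the step above are already exported to the process environment. `missing_keys` is a hypothetical helper, not part of the Braintrust SDK or LiteLLM:

```python
import os

# Variable names taken from the .env example above; adjust for other providers.
REQUIRED_KEYS = ("BRAINTRUST_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling `missing_keys()` at startup and stopping on a non-empty result is one way to fail fast before any traced call is made.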

      <h3 id="trace-python-auto">
        Trace your application
      </h3>

      To trace LiteLLM without changing your existing call sites, call `braintrust.auto_instrument()` once at startup, before importing LiteLLM.

      <CodeGroup>
        ```python Python theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
        import os

        import braintrust

        braintrust.auto_instrument()
        braintrust.init_logger(
            api_key=os.environ["BRAINTRUST_API_KEY"],
            project="litellm-example",  # Replace with your project name
        )

        import litellm

        response = litellm.completion(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "What is the capital of France?"}],
        )
        print(response.choices[0].message.content)
        ```
      </CodeGroup>

      <Accordion title="Trace only LiteLLM">
        To trace LiteLLM without auto-instrumenting other libraries, use `patch_litellm()` instead of `braintrust.auto_instrument()`.

        <CodeGroup>
          ```python Python theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
          from braintrust.integrations.litellm import patch_litellm

          patch_litellm()

          import litellm
          from braintrust import init_logger

          # Initialize Braintrust
          logger = init_logger(project="litellm-example")

          # Use LiteLLM as normal - all calls are automatically traced
          response = litellm.completion(
              model="gpt-4o-mini",
              messages=[{"role": "user", "content": "What is the capital of France?"}],
          )
          print(response.choices[0].message.content)
          ```
        </CodeGroup>
      </Accordion>

      <h3 id="what-traced-python-auto">
        What Braintrust traces
      </h3>

      Braintrust patches LiteLLM's top-level call entry points and creates one LLM-typed span per call. It captures:

      * Completion spans (`Completion`) for `litellm.completion`, `litellm.acompletion`, `litellm.text_completion`, and `litellm.atext_completion`. Request: messages or prompt, model, and request parameters. Response: choices, token usage, and time-to-first-token for streaming calls.
      * Responses API spans (`Response`) for `litellm.responses` and `litellm.aresponses`. Request: input and request parameters. Response: output, token usage, and time-to-first-token for streaming calls.
      * Image generation spans (`Image Generation`) for `litellm.image_generation` and `litellm.aimage_generation`. Request: prompt and request parameters. Response: per-image data (an attachment for base64 responses, a URL reference for URL responses); metadata such as output format, size, quality, and image count; and timing and token usage when reported.
      * Embedding spans (`Embedding`) for `litellm.embedding` and `litellm.aembedding`. Request: input text and request parameters. Response: summarized as the embedding vector dimension (the length of the first embedding), plus token usage.
      * Moderation spans (`Moderation`) for `litellm.moderation` and `litellm.amoderation`. Request: input and request parameters. Response: classification results, and token usage when reported.
      * Speech spans (`Speech`) for `litellm.speech` and `litellm.aspeech`. Request: text input and request parameters. Response: generated audio captured as an attachment, plus timing.
      * Transcription spans (`Transcription`) for `litellm.transcription` and `litellm.atranscription`. Request: input audio captured as an attachment, plus model and request parameters. Response: transcribed text and token usage.
      * Rerank spans (`Rerank`) for `litellm.rerank` and `litellm.arerank`. Request: query, documents, and request parameters, plus an auto-derived `document_count`. Response: a list of `{index, relevance_score}` items (capped at 100, with documents intentionally dropped); token metrics (prompt, completion, total); and Cohere-style billed-unit metrics (search units, classifications) when the response includes them.
      * Token usage metrics (prompt, completion, total, plus cached and reasoning tokens when the provider reports them).
      * Errors captured on every call.
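
Time-to-first-token is recorded only for streaming calls. The following is a minimal sketch of a streamed completion, assuming `braintrust.auto_instrument()` and `init_logger()` have already run as in the earlier example. `collect_stream` and `stream_answer` are hypothetical helper names, and the chunk shape (`choices[0].delta.content`) is assumed to follow LiteLLM's OpenAI-compatible streaming format:

```python
def collect_stream(chunks):
    """Join the text deltas of a streamed completion into one string."""
    parts = []
    for chunk in chunks:
        # Some chunks (for example, usage-only chunks) carry no choices.
        if not chunk.choices:
            continue
        # Role-only and final chunks carry no text, so content may be None.
        text = getattr(chunk.choices[0].delta, "content", None)
        if text:
            parts.append(text)
    return "".join(parts)


def stream_answer(question, model="gpt-4o-mini"):
    """Stream one completion; assumes tracing was set up before this call."""
    import litellm  # imported lazily so auto-instrumentation can run first

    chunks = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": question}],
        stream=True,  # yields chunks instead of returning a single response
    )
    return collect_stream(chunks)
```

A call such as `stream_answer("What is the capital of France?")` needs the API keys from the setup step and would be expected to appear as a single `Completion` span.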

      <h3 id="tracing-resources-python-auto">
        Tracing resources
      </h3>

      * [LiteLLM documentation](https://docs.litellm.ai/)
      * [LiteLLM supported providers](https://docs.litellm.ai/docs/providers)
      * [DSPy integration](/integrations/sdk-integrations/dspy), which combines LiteLLM tracing with DSPy-specific callbacks
    </Tab>

    <Tab title="Manual instrumentation">
      With manual instrumentation, you wrap a LiteLLM module reference yourself, so you control which instance is traced.

      <h3 id="setup-python-manual">
        Setup
      </h3>

      Install the Braintrust SDK and LiteLLM, then set your API keys for the providers you use. The examples below use OpenAI.

      <Steps>
        <Step title="Install packages">
          <CodeGroup>
            ```bash uv theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
            uv add braintrust litellm
            ```

            ```bash pip theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
            pip install braintrust litellm
            ```
          </CodeGroup>
        </Step>

        <Step title="Set environment variables">
          ```bash title=".env" theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
          BRAINTRUST_API_KEY=<your-braintrust-api-key>
          OPENAI_API_KEY=<your-openai-api-key>
          ```
        </Step>
      </Steps>

      <h3 id="trace-python-manual">
        Trace your application
      </h3>

      To trace a specific LiteLLM module reference, pass it to `wrap_litellm()`. Use this when you want to instrument one particular module reference rather than patching the globally imported `litellm`.

      <CodeGroup>
        ```python Python theme={"theme":{"light":"github-light","dark":"github-dark-dimmed"}}
        import os

        import litellm
        from braintrust import init_logger
        from braintrust.integrations.litellm import wrap_litellm

        init_logger(
            api_key=os.environ["BRAINTRUST_API_KEY"],
            project="litellm-example",  # Replace with your project name
        )

        wrap_litellm(litellm)

        response = litellm.completion(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "What is the capital of France?"}],
        )
        print(response.choices[0].message.content)
        ```
      </CodeGroup>

      <h3 id="what-traced-python-manual">
        What Braintrust traces
      </h3>

      Braintrust wraps the top-level call entry points on the module you pass to `wrap_litellm()` and creates one LLM-typed span per call. It captures:

      * Completion spans (`Completion`) for `litellm.completion`, `litellm.acompletion`, `litellm.text_completion`, and `litellm.atext_completion`. Request: messages or prompt, model, and request parameters. Response: choices, token usage, and time-to-first-token for streaming calls.
      * Responses API spans (`Response`) for `litellm.responses` and `litellm.aresponses`. Request: input and request parameters. Response: output, token usage, and time-to-first-token for streaming calls.
      * Image generation spans (`Image Generation`) for `litellm.image_generation` and `litellm.aimage_generation`. Request: prompt and request parameters. Response: per-image data (an attachment for base64 responses, a URL reference for URL responses); metadata such as output format, size, quality, and image count; and timing and token usage when reported.
      * Embedding spans (`Embedding`) for `litellm.embedding` and `litellm.aembedding`. Request: input text and request parameters. Response: summarized as the embedding vector dimension (the length of the first embedding), plus token usage.
      * Moderation spans (`Moderation`) for `litellm.moderation` and `litellm.amoderation`. Request: input and request parameters. Response: classification results, and token usage when reported.
      * Speech spans (`Speech`) for `litellm.speech` and `litellm.aspeech`. Request: text input and request parameters. Response: generated audio captured as an attachment, plus timing.
      * Transcription spans (`Transcription`) for `litellm.transcription` and `litellm.atranscription`. Request: input audio captured as an attachment, plus model and request parameters. Response: transcribed text and token usage.
      * Rerank spans (`Rerank`) for `litellm.rerank` and `litellm.arerank`. Request: query, documents, and request parameters, plus an auto-derived `document_count`. Response: a list of `{index, relevance_score}` items (capped at 100, with documents intentionally dropped); token metrics (prompt, completion, total); and Cohere-style billed-unit metrics (search units, classifications) when the response includes them.
      * Token usage metrics (prompt, completion, total, plus cached and reasoning tokens when the provider reports them).
      * Errors captured on every call.
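
The embedding span summarizes its output as the length of the first vector. The following is a minimal sketch of an embedding call with the same summary computed by hand, assuming `wrap_litellm(litellm)` and `init_logger()` have already run as in the earlier example. `embedding_dimension` and `embed` are hypothetical helpers, the model name is an assumption, and this is not Braintrust's own implementation:

```python
def embedding_dimension(response):
    """Return the length of the first embedding vector in a response."""
    first = response.data[0]
    # Items may be dict-like or attribute-style depending on the client version.
    vector = first["embedding"] if isinstance(first, dict) else first.embedding
    return len(vector)


def embed(texts, model="text-embedding-3-small"):
    """Embed a list of strings; assumes tracing was set up before this call."""
    import litellm  # the same module object that was passed to wrap_litellm()

    return litellm.embedding(model=model, input=texts)
```

With API keys set, `embedding_dimension(embed(["hello world"]))` would be expected to match the dimension shown on the `Embedding` span.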

      <h3 id="tracing-resources-python-manual">
        Tracing resources
      </h3>

      * [LiteLLM documentation](https://docs.litellm.ai/)
      * [LiteLLM supported providers](https://docs.litellm.ai/docs/providers)
      * [DSPy integration](/integrations/sdk-integrations/dspy), which combines LiteLLM tracing with DSPy-specific callbacks
    </Tab>
  </Tabs>
</View>