litellm

$ npx mdskill add TerminalSkills/skills/litellm

Switch between 100+ LLM providers using a unified API

  • Manages multiple LLM providers through a single interface
  • Supports OpenAI, Anthropic, Google, Mistral, Cohere, and self-hosted models
  • Routes requests with fallbacks, load balancing, and rate limiting
  • Delivers results through proxy or SDK integration with spend tracking

SKILL.md

---
name: litellm
description: >-
  Call 100+ LLM APIs with one interface using LiteLLM — unified API proxy for
  OpenAI, Anthropic, Google, Mistral, Cohere, and self-hosted models. Use when
  someone asks to "switch between LLM providers", "LiteLLM", "unified LLM API",
  "LLM proxy", "call Claude and GPT with the same code", "LLM load balancing",
  or "multi-model AI gateway". Covers provider routing, fallbacks, rate limiting,
  spend tracking, and self-hosted proxy.
license: Apache-2.0
compatibility: "Python. Node.js via OpenAI SDK (proxy mode). Self-hostable."
metadata:
  author: terminal-skills
  version: "1.0.0"
  category: data-ai
  tags: ["llm", "proxy", "litellm", "gateway", "multi-model"]
---

# LiteLLM

## Overview

LiteLLM provides a single API to call 100+ LLM providers — OpenAI, Anthropic, Google Gemini, Mistral, Cohere, Azure, Bedrock, Ollama, and more. Write your code once using the OpenAI SDK format, then switch providers by changing a model string. As a proxy server, it adds load balancing, fallbacks, rate limiting, spend tracking, and API key management for teams.

## When to Use

- You call multiple LLM providers and want a unified interface
- You need automatic fallbacks (if Claude is down, use GPT)
- You want cost tracking across multiple providers and teams
- You want to load balance requests across multiple API keys or models
- You need a self-hosted proxy to manage LLM access for a team

## Instructions

### Setup

```bash
pip install litellm

# Or run as proxy server
pip install 'litellm[proxy]'
```

### SDK Usage (Python)

```python
# llm.py — Call any LLM with the same interface
from litellm import completion

# OpenAI
response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Anthropic — same interface, just change the model string
response = completion(
    model="anthropic/claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Google Gemini
response = completion(
    model="gemini/gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Local Ollama
response = completion(
    model="ollama/llama3",
    messages=[{"role": "user", "content": "Hello!"}],
    api_base="http://localhost:11434",
)

# All return the same response format (OpenAI-compatible)
print(response.choices[0].message.content)
```
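
Because every provider is addressed by a model string, a hand-rolled fallback reduces to looping over a list of strings. The sketch below is illustrative only: `complete_with_fallback` is a hypothetical helper, not part of LiteLLM, and it takes the completion function as a parameter so it can be exercised without network access.

```python
from typing import Any, Callable, Optional, Sequence

Message = dict[str, str]


def complete_with_fallback(
    models: Sequence[str],
    messages: list[Message],
    complete_fn: Optional[Callable[..., Any]] = None,
) -> tuple[str, Any]:
    """Try each model string in order and return (model_used, response).

    Hypothetical helper for illustration; not part of LiteLLM.
    """
    if not models:
        raise ValueError("models must contain at least one model string")
    if complete_fn is None:
        # Imported lazily so the helper can be tested without litellm installed.
        from litellm import completion as complete_fn

    last_error: Optional[Exception] = None
    for model in models:
        try:
            return model, complete_fn(model=model, messages=messages)
        except Exception as exc:  # broad on purpose: any failure moves to the next model
            last_error = exc
    raise RuntimeError(f"all models failed: {list(models)}") from last_error
```

Calling `complete_with_fallback(["anthropic/claude-sonnet-4-20250514", "gpt-4o"], messages)` would try Claude first and GPT-4o second. For production use, the proxy's `fallbacks` setting is likely the better choice, since it also handles retries and routing.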

### Proxy Server

```yaml
# litellm_config.yaml — Proxy configuration
model_list:
  - model_name: "fast"
    litellm_params:
      model: gpt-4o-mini
      api_key: sk-...

  - model_name: "smart"
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      api_key: sk-ant-...

  - model_name: "smart"  # Second "smart" model = load balancing
    litellm_params:
      model: gpt-4o
      api_key: sk-...

  - model_name: "cheap"
    litellm_params:
      model: gemini/gemini-2.0-flash
      api_key: AIza...

router_settings:
  routing_strategy: "latency-based-routing"
  num_retries: 3
  timeout: 30
  fallbacks: [{"smart": ["fast"]}]  # If smart fails, use fast

general_settings:
  master_key: "sk-master-key-xxx"  # Admin key
```

```bash
# Start proxy
litellm --config litellm_config.yaml --port 4000

# In another terminal, call the OpenAI-compatible endpoint (works from any language or SDK)
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-master-key-xxx" \
  -H "Content-Type: application/json" \
  -d '{"model": "smart", "messages": [{"role": "user", "content": "Hello"}]}'
```
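
The same routing setup can likely be expressed in-process with the SDK's `Router` class, which is useful when running a separate proxy is not an option. This is a minimal sketch under assumptions: the helper names `build_model_list` and `build_router` are hypothetical, model strings use the `provider/model` form, and API keys are read from environment variables instead of being written into the config.

```python
import os
from typing import Any


def build_model_list() -> list[dict[str, Any]]:
    """Mirror the YAML model_list; API keys come from environment variables."""
    openai_key = os.environ.get("OPENAI_API_KEY")
    return [
        {"model_name": "fast",
         "litellm_params": {"model": "gpt-4o-mini", "api_key": openai_key}},
        {"model_name": "smart",
         "litellm_params": {"model": "anthropic/claude-sonnet-4-20250514",
                            "api_key": os.environ.get("ANTHROPIC_API_KEY")}},
        # Same model_name again: a second deployment in the "smart" group.
        {"model_name": "smart",
         "litellm_params": {"model": "gpt-4o", "api_key": openai_key}},
        {"model_name": "cheap",
         "litellm_params": {"model": "gemini/gemini-2.0-flash",
                            "api_key": os.environ.get("GEMINI_API_KEY")}},
    ]


def build_router():
    """Create a litellm Router with the same settings as router_settings."""
    from litellm import Router  # lazy import: only needed when actually routing

    return Router(
        model_list=build_model_list(),
        routing_strategy="latency-based-routing",
        num_retries=3,
        timeout=30,
        fallbacks=[{"smart": ["fast"]}],  # if "smart" fails, use "fast"
    )
```

With valid keys set, `build_router().completion(model="smart", messages=[{"role": "user", "content": "Hello"}])` would send the request to one of the two "smart" deployments.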

### Node.js via Proxy

```typescript
// app.ts — Use any OpenAI SDK client with LiteLLM proxy
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:4000/v1",
  apiKey: "sk-master-key-xxx",
});

// Calls route to Claude or GPT based on load balancing config
const response = await client.chat.completions.create({
  model: "smart",
  messages: [{ role: "user", content: "Explain monads simply." }],
});
```
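
Since the proxy accepts OpenAI-format chat-completion requests, a client does not strictly need an SDK. As a rough sketch, the request from the curl example can be built with Python's standard library alone. The function names `build_chat_request` and `send` are hypothetical, and the URL and key are the placeholder values used above.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, api_key: str, model: str, prompt: str
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-format chat completion request."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request to a running proxy and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

With the proxy running, `send(build_chat_request("http://localhost:4000/v1", "sk-master-key-xxx", "smart", "Hello"))` should return a dict in the OpenAI response shape.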

### Spend Tracking

```python
# Track costs per team/user/project
from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    metadata={
        "user": "user-123",
        "team": "engineering",
        "project": "chatbot",
    },
)

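# Direct SDK calls are not recorded by the proxy. When requests go through
# the LiteLLM proxy (with a database configured), it stores per-request costs.
# Query via API: GET /spend/logs?user_id=user-123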
```
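
Once spend logs have been fetched from the proxy, totals per user are a simple aggregation. The sketch below assumes each log entry is a JSON object with `user` and `spend` fields. That shape is an assumption for illustration, so check the actual response of your proxy version. `spend_by_user` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping


def spend_by_user(logs: Iterable[Mapping[str, Any]]) -> dict[str, float]:
    """Sum the `spend` field per `user`.

    Assumes each row has `user` and `spend` keys (hypothetical shape);
    rows without a user are grouped under "unknown".
    """
    totals = defaultdict(float)
    for row in logs:
        user = str(row.get("user") or "unknown")
        totals[user] += float(row.get("spend") or 0.0)
    return dict(totals)
```

The same pattern works for grouping by team or project if those identifiers are present in each row.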

## Examples

### Example 1: Multi-provider AI application

**User prompt:** "My app uses Claude for reasoning and GPT-4o for function calling. Set up a unified interface."

The agent will configure LiteLLM with named model groups, route by capability, and add fallbacks between providers.
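
One way to picture "route by capability" is a small lookup table that maps a task type to a primary model and a backup. This is a hypothetical sketch: the task names, the pairings, and the `models_for` helper are assumptions for illustration, not LiteLLM features.

```python
# Hypothetical capability map; task names and pairings are assumptions.
CLAUDE = "anthropic/claude-sonnet-4-20250514"
GPT = "gpt-4o"

PRIMARY_FOR_TASK = {"reasoning": CLAUDE, "function_calling": GPT}
BACKUP_FOR_MODEL = {CLAUDE: GPT, GPT: CLAUDE}


def models_for(task: str) -> list[str]:
    """Return [primary, backup] model strings for a task type."""
    if task not in PRIMARY_FOR_TASK:
        raise ValueError(f"unknown task: {task!r}")
    primary = PRIMARY_FOR_TASK[task]
    return [primary, BACKUP_FOR_MODEL[primary]]
```

The returned list can be tried in order, or translated into named model groups and a `fallbacks` entry in the proxy configuration.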

### Example 2: Team LLM gateway with cost controls

**User prompt:** "Set up an LLM proxy for our team with per-user rate limits and spend tracking."

The agent will deploy the LiteLLM proxy, configure API keys per team member, set rate limits and budget caps, and enable spend logging.
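
Per-user keys with limits are typically created through the proxy's key management endpoint. The sketch below builds such a request with the standard library. It assumes a `/key/generate` endpoint that accepts `user_id`, `models`, `max_budget`, and `rpm_limit` fields and a proxy connected to a database. Verify the field names against your LiteLLM version. `build_key_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Sequence


def build_key_request(
    proxy_url: str,
    master_key: str,
    user_id: str,
    models: Sequence[str],
    max_budget_usd: float,
    rpm_limit: int,
) -> urllib.request.Request:
    """Build (but do not send) a request that creates a limited API key."""
    payload = {
        "user_id": user_id,
        "models": list(models),        # model groups this key may call
        "max_budget": max_budget_usd,  # spend cap for the key
        "rpm_limit": rpm_limit,        # requests per minute
    }
    return urllib.request.Request(
        url=f"{proxy_url.rstrip('/')}/key/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` against a running proxy should return a JSON body containing the new key, which the team member then uses in place of the master key.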

## Guidelines

- **Model format**: `provider/model`, for example `anthropic/claude-sonnet-4-20250514` or `gemini/gemini-2.0-flash`
- **Proxy for teams**: centralize API keys, track spend, enforce rate limits
- **Fallbacks for reliability**: if the primary model fails, requests route to a backup
- **Load balancing**: multiple entries with the same `model_name` share traffic
- **Latency-based routing**: with `routing_strategy: "latency-based-routing"`, the router prefers the deployment with the lowest recently observed latency
- **Spend tracking**: costs are calculated per request and queryable via the proxy API
- **OpenAI SDK compatible**: any OpenAI client library works with the proxy
- **Streaming**: `stream=True` uses the same interface across providers that support streaming
- **Environment variables**: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc. are auto-detected
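
With `stream=True`, the response arrives as a sequence of OpenAI-style chunks whose text lives in `choices[0].delta.content`. As a minimal sketch, the hypothetical `collect_stream` helper below joins those deltas into one string. `fake_chunk` exists only to imitate the chunk shape for offline testing.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def collect_stream(chunks: Iterable[Any]) -> str:
    """Concatenate the text deltas of OpenAI-style streaming chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices (e.g. usage-only chunks)
        delta = chunk.choices[0].delta.content
        if delta:  # content can be None, for example on the final chunk
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Imitate the shape of a streaming chunk (test stand-in, not a LiteLLM type)."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )
```

In real use, the argument would be the iterator returned by `completion(model="gpt-4o", messages=[...], stream=True)`.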
