together

$npx mdskill add vm0-ai/vm0-skills/together

Run open-source LLMs and FLUX models via OpenAI-compatible API

  • Executes inference on Llama, Qwen, Mixtral, and FLUX models
  • Depends on Together AI API and OpenAI-compatible SDKs
  • Selects models based on user intent and capability requirements
  • Returns structured text, images, or embeddings as JSON
SKILL.md
.github/skills/togetherView on GitHub ↗
---
name: together
description: Together AI API for open-source model inference and fine-tuning. Use
  when the user mentions "Together AI", "Together", or wants to run open-source models
  (Llama, Mixtral, Qwen, FLUX) via an OpenAI-compatible API.
---

# Together AI

Together AI is a cloud platform for running open-source foundation models. Its API
is OpenAI-compatible, so any SDK or workflow built for OpenAI's `/v1/` endpoints
works with Together AI by changing the base URL and API key.

> Official docs: `https://docs.together.ai/reference`

---

## When to Use

Use this skill when you need to:

- Run open-source LLMs (Llama 3, Qwen, Mixtral, DeepSeek, etc.) via API
- Generate images with FLUX.1-schnell or FLUX.1-dev
- Create text embeddings with open-source embedding models
- Fine-tune a model on custom data
- List all available models on the Together AI platform

---

## Prerequisites

Connect the **Together AI** connector at [app.vm0.ai/connectors](https://app.vm0.ai/connectors).

> **Troubleshooting:** If requests fail, run `zero doctor check-connector --env-name TOGETHER_TOKEN` or `zero doctor check-connector --url https://api.together.ai/v1/models --method GET`

---

## How to Use

### 1. Chat Completion (OpenAI-compatible)

Write to `/tmp/together_chat.json`:

```json
{
  "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  "messages": [{"role": "user", "content": "Explain quantum entanglement in one paragraph."}],
  "max_tokens": 512
}
```

Then run:

```bash
curl -s "https://api.together.ai/v1/chat/completions" --header "Content-Type: application/json" --header "Authorization: Bearer $TOGETHER_TOKEN" -d @/tmp/together_chat.json | jq '.choices[0].message.content'
```

**Popular chat models:**

- `meta-llama/Llama-3.3-70B-Instruct-Turbo` — Fast Llama 3.3 70B
- `meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo` — Llama 3.1 405B, most capable
- `Qwen/Qwen2.5-72B-Instruct-Turbo` — Qwen 2.5 72B
- `mistralai/Mixtral-8x22B-Instruct-v0.1` — Mixtral 8x22B
- `deepseek-ai/DeepSeek-V3` — DeepSeek V3

### 2. Chat with System Prompt

Write to `/tmp/together_chat.json`:

```json
{
  "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  "messages": [
    {"role": "system", "content": "You are a concise technical assistant. Respond in JSON."},
    {"role": "user", "content": "List three uses of embeddings in NLP."}
  ],
  "max_tokens": 256
}
```

Then run:

```bash
curl -s "https://api.together.ai/v1/chat/completions" --header "Content-Type: application/json" --header "Authorization: Bearer $TOGETHER_TOKEN" -d @/tmp/together_chat.json | jq '.choices[0].message.content'
```

### 3. Text Completion

Write to `/tmp/together_completion.json`:

```json
{
  "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  "prompt": "The capital of France is",
  "max_tokens": 64,
  "stop": ["\n"]
}
```

Then run:

```bash
curl -s "https://api.together.ai/v1/completions" --header "Content-Type: application/json" --header "Authorization: Bearer $TOGETHER_TOKEN" -d @/tmp/together_completion.json | jq '.choices[0].text'
```

### 4. Image Generation (FLUX)

Write to `/tmp/together_image.json`:

```json
{
  "model": "black-forest-labs/FLUX.1-schnell",
  "prompt": "A photorealistic mountain lake at sunset, golden light reflecting on water",
  "width": 1024,
  "height": 768,
  "steps": 4,
  "n": 1
}
```

Then run:

```bash
curl -s "https://api.together.ai/v1/images/generations" --header "Content-Type: application/json" --header "Authorization: Bearer $TOGETHER_TOKEN" -d @/tmp/together_image.json | jq '.data[0].url'
```

**Image models:**

- `black-forest-labs/FLUX.1-schnell` — Fast, 4 steps, free tier
- `black-forest-labs/FLUX.1-dev` — Higher quality, 20–50 steps

### 5. Embeddings

Write to `/tmp/together_embed.json`:

```json
{
  "model": "togethercomputer/m2-bert-80M-8k-retrieval",
  "input": "The quick brown fox jumps over the lazy dog"
}
```

Then run:

```bash
curl -s "https://api.together.ai/v1/embeddings" --header "Content-Type: application/json" --header "Authorization: Bearer $TOGETHER_TOKEN" -d @/tmp/together_embed.json | jq '.data[0].embedding[:5]'
```

**Embedding models:**

- `togethercomputer/m2-bert-80M-8k-retrieval` — 8K context, retrieval-optimized
- `BAAI/bge-large-en-v1.5` — BGE large English embeddings
- `WhereIsAI/UAE-Large-V1` — UAE-Large, general-purpose

### 6. List Available Models

```bash
curl -s "https://api.together.ai/v1/models" --header "Authorization: Bearer $TOGETHER_TOKEN" | jq '[.[] | {id: .id, type: .type}] | .[:20]'
```

Filter by type (chat, language, image, embedding, code):

```bash
curl -s "https://api.together.ai/v1/models" --header "Authorization: Bearer $TOGETHER_TOKEN" | jq '[.[] | select(.type == "chat") | .id]'
```

### 7. Start a Fine-Tuning Job

Upload a JSONL training file first. Replace `<file-id>` with the file ID returned by the upload step.

Write to `/tmp/together_finetune.json`:

```json
{
  "training_file": "<file-id>",
  "model": "meta-llama/Llama-3.2-3B-Instruct-Reference",
  "n_epochs": 3,
  "learning_rate": 0.00005,
  "suffix": "my-custom-model"
}
```

Then run:

```bash
curl -s -X POST "https://api.together.ai/v1/fine-tunes" --header "Content-Type: application/json" --header "Authorization: Bearer $TOGETHER_TOKEN" -d @/tmp/together_finetune.json | jq '{id: .id, status: .status}'
```

Check fine-tune job status (replace `<fine-tune-id>` with the ID from the response above):

```bash
curl -s "https://api.together.ai/v1/fine-tunes/<fine-tune-id>" --header "Authorization: Bearer $TOGETHER_TOKEN" | jq '{id: .id, status: .status, model_output_name: .model_output_name}'
```

### 8. Streaming Response

Write to `/tmp/together_stream.json`:

```json
{
  "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  "messages": [{"role": "user", "content": "Write a haiku about open-source AI."}],
  "stream": true,
  "max_tokens": 128
}
```

Then run:

```bash
curl -s "https://api.together.ai/v1/chat/completions" --header "Content-Type: application/json" --header "Authorization: Bearer $TOGETHER_TOKEN" -d @/tmp/together_stream.json
```

Streaming returns Server-Sent Events with delta chunks.

---

## Guidelines

1. **OpenAI-compatible**: Together AI follows the OpenAI `/v1/` schema — `model`, `messages`, `max_tokens`, `temperature`, `stream`, and `tools` all work as expected
2. **Model IDs are `org/model-name` format**: always include the organization prefix (e.g., `meta-llama/Llama-3.3-70B-Instruct-Turbo`), not just the model name
3. **FLUX image steps**: `FLUX.1-schnell` needs only 4 steps; `FLUX.1-dev` needs 20–50 for best quality
4. **Rate limits**: free-tier accounts have lower rate limits; check `x-ratelimit-*` response headers if you hit 429 errors
5. **Fine-tuning base models**: use `-Reference` or `-Free` variants (e.g., `meta-llama/Llama-3.2-3B-Instruct-Reference`) which are designated for fine-tuning
More from vm0-ai/vm0-skills