groq

$ npx mdskill add vm0-ai/vm0-skills/groq

Execute ultra-low latency LLM inference with OpenAI-compatible APIs.

  • Delivers chat completions and audio transcription at record speeds.
  • Integrates with Groq's LPU hardware and OpenAI-compatible endpoints.
  • Selects models based on user requests for Llama, Mixtral, or Gemma.
  • Returns results through standard chat completion interfaces.

---
name: groq
description: >
  Groq ultra-fast LLM inference (LPU engine). Use when the user mentions
  "Groq", "llama on Groq", or asks for fast inference with Llama, Mixtral,
  or Gemma models.
---

# Groq

Groq provides ultra-fast LLM inference using its custom LPU (Language Processing Unit) hardware. The API is largely OpenAI-compatible, so most workflows that work against `api.openai.com` can be pointed at `api.groq.com/openai/v1` with minimal changes (see the unsupported parameters under Guidelines).

> Official docs: `https://console.groq.com/docs/overview`

---

## When to Use

Use this skill when you need to:

- Run chat completions at very low latency (Groq positions its LPU as significantly faster than GPU-based inference)
- Use open-weight models such as Llama 3.3 70B, Llama 3.1 8B, Mixtral 8x7B, or Gemma 2 9B
- Transcribe audio using Whisper via an OpenAI-compatible endpoint
- List available models on Groq's platform
- Drop in a fast, cost-effective inference backend where OpenAI compatibility is assumed

---

## Prerequisites

Connect the **Groq** connector at [app.vm0.ai/connectors](https://app.vm0.ai/connectors).

> **Troubleshooting:** If requests fail, run `zero doctor check-connector --env-name GROQ_TOKEN` or `zero doctor check-connector --url https://api.groq.com/openai/v1/models --method GET`

---

## How to Use

All examples below assume you have `GROQ_TOKEN` set via the Groq connector.

Base URL: `https://api.groq.com/openai/v1`

### 1. Basic Chat Completion

Write to `/tmp/groq_request.json`:

```json
{
  "model": "llama-3.3-70b-versatile",
  "messages": [{"role": "user", "content": "Explain LPU inference in one paragraph."}]
}
```

Then run:

```bash
curl -s "https://api.groq.com/openai/v1/chat/completions" --header "Content-Type: application/json" --header "Authorization: Bearer $GROQ_TOKEN" -d @/tmp/groq_request.json | jq '.choices[0].message.content'
```

**Key models:**

- `llama-3.3-70b-versatile` — Llama 3.3 70B; best quality, still very fast
- `llama-3.1-8b-instant` — Llama 3.1 8B; lowest latency
- `mixtral-8x7b-32768` — Mixtral 8x7B with 32K context
- `gemma2-9b-it` — Google Gemma 2 9B instruct
- `whisper-large-v3` — Audio transcription (see section 4)
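
The `jq '.choices[0].message.content'` filter above prints `null` when the API returns an error instead of a completion. A minimal sketch of a wrapper that surfaces the error follows. The function name `groq_extract_content` is hypothetical, and the `{"error": {"message": ...}}` shape is an assumption based on the OpenAI-style error format, so confirm it against a real failing response.

```bash
# Read a chat completion response on stdin.
# Print the assistant message, or print the error to stderr and return 1.
# ASSUMPTION: errors arrive as {"error": {"message": "..."}} (OpenAI-style).
groq_extract_content() {
  local body
  body=$(cat)
  if printf '%s' "$body" | jq -e '.error' >/dev/null 2>&1; then
    printf '%s' "$body" \
      | jq -r '"Groq error: " + (.error.message // "unknown" | tostring)' >&2
    return 1
  fi
  printf '%s' "$body" | jq -r '.choices[0].message.content'
}
```

To use it, pipe the `curl` command from this section into `groq_extract_content` in place of the final `jq` call.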

### 2. Chat Completion with System Prompt

Write to `/tmp/groq_request.json`:

```json
{
  "model": "llama-3.3-70b-versatile",
  "messages": [
    {"role": "system", "content": "You are a concise technical assistant. Reply in JSON only."},
    {"role": "user", "content": "List the top 3 use cases for LPU-based inference."}
  ]
}
```

Then run:

```bash
curl -s "https://api.groq.com/openai/v1/chat/completions" --header "Content-Type: application/json" --header "Authorization: Bearer $GROQ_TOKEN" -d @/tmp/groq_request.json | jq '.choices[0].message.content'
```
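
Hand-written request files break easily when a prompt contains quotes or newlines. As a sketch, the request bodies from sections 1 and 2 can be generated with `jq -n`, which handles the JSON escaping. The helper name `groq_build_request` and its argument order are hypothetical, not part of the Groq API or this skill.

```bash
# Build a chat completion request body.
# Usage: groq_build_request MODEL USER_PROMPT [SYSTEM_PROMPT]
# (hypothetical helper; jq takes care of JSON escaping)
groq_build_request() {
  local model="$1" prompt="$2" system="${3:-}"
  jq -n --arg model "$model" --arg prompt "$prompt" --arg system "$system" '
    {
      model: $model,
      messages: (
        (if $system != "" then [{role: "system", content: $system}] else [] end)
        + [{role: "user", content: $prompt}]
      )
    }'
}
```

For example, `groq_build_request llama-3.3-70b-versatile "Explain LPU inference in one paragraph." > /tmp/groq_request.json` writes a file equivalent to the one in section 1.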

### 3. Streaming Chat Completion

For real-time token output, set `"stream": true` in the request body. The response is delivered as Server-Sent Events (SSE).

Write to `/tmp/groq_request.json`:

```json
{
  "model": "llama-3.1-8b-instant",
  "messages": [{"role": "user", "content": "Write a haiku about fast inference."}],
  "stream": true
}
```

Then run:

```bash
curl -s --no-buffer "https://api.groq.com/openai/v1/chat/completions" --header "Content-Type: application/json" --header "Authorization: Bearer $GROQ_TOKEN" -d @/tmp/groq_request.json
```

Each SSE chunk is a `data: {...}` line whose JSON payload carries the new text in `choices[0].delta.content`. The stream ends with `data: [DONE]`.
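
The raw stream is awkward to read as is. A minimal sketch of a filter that keeps only the generated text: it strips the `data: ` prefix, stops at `[DONE]`, and reads `choices[0].delta.content` from each chunk. The function name `groq_sse_text` is hypothetical, and the sketch assumes each event fits on a single `data:` line.

```bash
# Read an SSE stream on stdin and print only the generated text.
# ASSUMPTION: every event is a single "data: {...}" line.
groq_sse_text() {
  local line
  while IFS= read -r line; do
    line=${line%$'\r'}                      # tolerate CRLF line endings
    case "$line" in
      "data: [DONE]") break ;;              # end-of-stream marker
      "data: "*)
        # -j prints the string raw, with no trailing newline,
        # so the chunks join into continuous text
        printf '%s\n' "${line#data: }" \
          | jq -j '.choices[0].delta.content // empty'
        ;;
    esac
  done
  printf '\n'
}
```

To use it, pipe the streaming `curl` command above into `groq_sse_text`. Running one `jq` process per chunk is slow but keeps the output appearing as tokens arrive.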

### 4. Audio Transcription (Whisper)

Transcribe audio files using Groq's hosted `whisper-large-v3` model.

```bash
curl -s "https://api.groq.com/openai/v1/audio/transcriptions" --header "Authorization: Bearer $GROQ_TOKEN" -F "file=@audio.mp3" -F "model=whisper-large-v3" | jq '.text'
```

Supported formats: `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, `webm` (max 25 MB per file).

> **Note:** The `vtt` and `srt` response formats are not supported by Groq — use the default `json` format.
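
The format and size limits above can be checked locally before uploading. This is a sketch only: the function name `groq_audio_ok` is hypothetical, the check goes by file extension rather than actual content, and it assumes "25 MB" means 25 × 1024 × 1024 bytes.

```bash
# Return 0 if the file looks acceptable for upload, 1 otherwise.
# ASSUMPTION: the 25 MB limit is 25 * 1024 * 1024 bytes.
groq_audio_ok() {
  local file="$1" max_bytes=$((25 * 1024 * 1024)) ext size
  [ -f "$file" ] || { echo "not a file: $file" >&2; return 1; }

  # Lower-case the extension and compare it with the supported list
  ext=$(printf '%s' "${file##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|mp4|mpeg|mpga|m4a|wav|webm) ;;
    *) echo "unsupported extension: .$ext" >&2; return 1 ;;
  esac

  # The arithmetic expansion strips any padding that wc adds
  size=$(( $(wc -c < "$file") ))
  [ "$size" -le "$max_bytes" ] || { echo "file too large: $size bytes" >&2; return 1; }
}
```

A typical call is `groq_audio_ok audio.mp3 && curl ...`, so the upload is skipped when the check fails.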

### 5. List Available Models

Retrieve all models currently available on the Groq platform:

```bash
curl -s "https://api.groq.com/openai/v1/models" --header "Authorization: Bearer $GROQ_TOKEN" | jq -r '.data[].id' | sort
```
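
The same response can confirm that a specific model ID exists before a request is sent. A small sketch, with `groq_model_listed` as a hypothetical helper name; it reads the `/models` response on stdin and relies on the `.data[].id` layout used in the command above.

```bash
# Succeed (exit 0) if the model ID in $1 appears in a /models response on stdin.
groq_model_listed() {
  jq -e --arg id "$1" 'any(.data[]; .id == $id)' >/dev/null
}
```

For example, pipe the `curl` call above (without the `jq` and `sort` stages) into `groq_model_listed "llama-3.3-70b-versatile" || echo "model not available" >&2`.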

### 6. Check Token Usage

Token usage is returned in every non-streaming chat completion response.

Write to `/tmp/groq_request.json`:

```json
{
  "model": "llama-3.1-8b-instant",
  "messages": [{"role": "user", "content": "Hi!"}]
}
```

Then run:

```bash
curl -s "https://api.groq.com/openai/v1/chat/completions" --header "Content-Type: application/json" --header "Authorization: Bearer $GROQ_TOKEN" -d @/tmp/groq_request.json | jq '.usage'
```

The `usage` object includes `prompt_tokens`, `completion_tokens`, and `total_tokens`.

---

## Guidelines

1. **Pick the right model for the job**: Use `llama-3.3-70b-versatile` for quality-first tasks; use `llama-3.1-8b-instant` when latency or cost is the priority
2. **Groq is OpenAI-compatible**: Most skills or code targeting `api.openai.com/v1` can be redirected to `api.groq.com/openai/v1` by swapping the base URL and API key, subject to the unsupported parameters in guideline 4
3. **Streaming is common on Groq**: Because inference is so fast, responses are often streamed; streaming is opt-in via `"stream": true`, and the SSE output can be handled with a plain curl call by parsing the `data:` lines
4. **Watch unsupported parameters**: `logprobs`, `logit_bias`, `top_logprobs`, and `messages[].name` are not supported; omit them to avoid errors
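5. **Do not rely on temperature 0**: Groq's OpenAI-compatibility notes state that a `temperature` of `0` is converted to `1e-8` rather than applied literally; if you need near-deterministic output and a request misbehaves, use a very small positive value such as `0.01`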
6. **Audio format restrictions**: Whisper on Groq does not support `vtt` or `srt` output formats; use the default `json` response format
7. **Check the model list regularly**: Groq adds and retires models; always confirm the model ID with `GET /openai/v1/models` before coding against a specific model
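
When an existing OpenAI-style request body is reused, guideline 4 can be applied mechanically. The following sketch removes the parameters listed there with `jq`. The helper name `groq_strip_unsupported` is hypothetical, and the list of fields comes from this document, not from a live check against the API.

```bash
# Read a chat completion request on stdin and drop the fields that
# guideline 4 lists as unsupported. Other fields pass through unchanged.
groq_strip_unsupported() {
  jq 'del(.logprobs, .logit_bias, .top_logprobs)
      | if has("messages")
        then .messages |= map(del(.name))
        else .
        end'
}
```

For example, `groq_strip_unsupported < openai_request.json > /tmp/groq_request.json`, where `openai_request.json` stands for whatever request file you already have.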
