nemotron-voice-agent-deploy

Name: nemotron-voice-agent-deploy
Author: NVIDIA/skills

$ npx mdskill add NVIDIA/skills/nemotron-voice-agent-deploy

Deploys real-time voice agents on x86, Jetson, or Cloud NIMs.

Enables speech-to-speech conversations on edge or cloud hardware.
Integrates NVIDIA ASR, TTS, LLM, and WebRTC/WebSocket services.
Selects deployment target by detecting available GPU hardware.
Delivers audio streams via WebRTC or WebSocket interfaces.

SKILL.md

.github/skills/nemotron-voice-agent-deploy

---
name: nemotron-voice-agent-deploy
description: Deploy Nemotron Voice Agent on Workstation (x86), Jetson Thor, or Cloud NIMs. Real-time speech-to-speech using NVIDIA ASR, TTS, LLM with WebRTC/WebSocket transport.
---

# Nemotron Voice Agent Deployment

Real-time conversational AI voice agent using NVIDIA NIMs (ASR, TTS, LLM) with WebRTC (default) or WebSocket transport.

## Deployment Flow

**Always verify hardware first, even if the user mentions a specific platform.**

### STEP 1: Hardware Detection

```bash
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null
```

| Result | Action |
|--------|--------|
| Command fails / No output | → **Cloud NIMs** |
| GPU detected | → **STEP 2: Platform Detection** |
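
The decision table above can be sketched as a small shell helper. This is a minimal sketch, not part of the skill itself: the function name `choose_target` and its labels (`cloud-nims`, `platform-detection`) are hypothetical and exist only for illustration.

```bash
#!/usr/bin/env bash

# Hypothetical helper: map the nvidia-smi query output to the next step.
# Empty output (command missing, failed, or no GPU listed) means Cloud NIMs.
choose_target() {
  local gpu_info="$1"
  if [ -z "$gpu_info" ]; then
    echo "cloud-nims"
  else
    echo "platform-detection"
  fi
}

# Same query as STEP 1; "|| true" keeps a missing nvidia-smi from aborting.
gpu_info="$(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader 2>/dev/null || true)"
echo "Next step: $(choose_target "$gpu_info")"
```

Keeping the decision in a function that takes the query output as an argument makes it possible to check both branches without a GPU present.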

---

### Cloud NIMs (No GPU)

```bash
cd nemotron-voice-agent
git submodule update --init
cp config/env.example .env
```

Export your NVIDIA API key:
```bash
export NVIDIA_API_KEY=your-api-key  # Get from https://build.nvidia.com
```
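
Because the first run can be long, it may help to fail fast when the key is missing. The following is a minimal sketch under that assumption; `check_api_key` is a hypothetical helper, not something the repository provides.

```bash
#!/usr/bin/env bash

# Hypothetical pre-flight check: succeed only if NVIDIA_API_KEY is non-empty.
check_api_key() {
  if [ -z "${NVIDIA_API_KEY:-}" ]; then
    echo "NVIDIA_API_KEY is not set; export it before starting the containers" >&2
    return 1
  fi
}

if check_api_key; then
  echo "NVIDIA_API_KEY is present"
fi
```

The check only confirms that the variable is set in the current shell; it does not validate the key against any NVIDIA service.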

Then edit `.env`:
```bash
NVIDIA_LLM_MODEL=nvidia/nemotron-3-nano-30b-a3b  # Cloud model name
```

**If the user requests WebSocket transport**, also add to `.env`:
```bash
TRANSPORT=WEBSOCKET
```

Then build and start the `python-app` and `ui-app` services:

```bash
docker compose up --build --no-deps -d python-app ui-app
# WebRTC: http://localhost:9000
# WebSocket: http://localhost:7860/static/index.html
```

> **Note:** Deployment may take 30-60 minutes on first run.
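
Given the long first-run time noted above, a polling loop can report when the UI becomes reachable. This is a sketch only: `wait_for_url` is a hypothetical helper, and it assumes the UI answers a plain HTTP GET with a success status once it is up, which the source does not state.

```bash
#!/usr/bin/env bash

# Hypothetical helper: poll a URL until it responds or the timeout elapses.
# Arguments: URL, timeout in seconds, optional polling interval in seconds.
wait_for_url() {
  local url="$1" timeout_s="$2" interval_s="${3:-5}"
  local deadline=$(( $(date +%s) + timeout_s ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # --fail makes curl return non-zero on HTTP error statuses.
    if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
      return 0
    fi
    sleep "$interval_s"
  done
  return 1
}
```

For the WebRTC UI this could be called as `wait_for_url http://localhost:9000 3600 30`, where the one-hour timeout mirrors the upper end of the 30-60 minute estimate.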

**If the user requests multilingual mode**, also add the following to `.env` before starting the containers:
```bash
ENABLE_MULTILINGUAL=true
ASR_CLOUD_FUNCTION_ID=71203149-d3b7-4460-8231-1be2543a1fca
ASR_MODEL_NAME=parakeet-rnnt-1.1b-unified-ml-cs-universal-multi-asr-streaming
```
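
The `.env` edits in this section (model name, transport, multilingual settings) can be applied without creating duplicate keys. The helper below is a minimal sketch; `set_env_var` is a hypothetical name, and it replaces the whole line for a key, including any trailing comment.

```bash
#!/usr/bin/env bash

# Hypothetical helper: set KEY=VALUE in an env file, replacing any existing
# assignment for KEY so repeated runs do not accumulate duplicates.
set_env_var() {
  local key="$1" value="$2" file="$3"
  local tmp
  touch "$file"
  tmp="$(mktemp)"
  # Keep every line except an existing assignment of this key.
  # grep exits non-zero when nothing is left, hence "|| true".
  grep -v -- "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the file keeps its original permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For example, `set_env_var TRANSPORT WEBSOCKET .env` or `set_env_var ENABLE_MULTILINGUAL true .env`, run from the `nemotron-voice-agent` directory.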

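**Remote Access:** forward the UI port with `ssh -L 9000:localhost:9000 user@host` and open `http://localhost:9000` on your local machine, or browse directly to `http://<HOST_IP>:9000`.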

---

### STEP 2: Platform Detection (if GPU detected)

```bash
uname -m  # x86_64 → Workstation, aarch64 → Jetson
cat /etc/nv_tegra_release 2>/dev/null && echo "Jetson"
```

| Platform | Reference | Requirements |
|----------|-----------|--------------|
| Workstation (x86_64) | [workstation-deployment.md](references/workstation-deployment.md) | 2x GPU (24GB+ VRAM), NIM containers |
| Jetson Thor (aarch64) | [jetson-deployment.md](references/jetson-deployment.md) | JetPack 7.0, Nemotron Speech ASR and TTS, vLLM |

> **Note:** Multilingual mode is available on Workstation with WebRTC transport only.
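
The platform check in STEP 2 can be sketched as a small helper that maps the architecture to one of the two references above. This is an illustrative sketch; `classify_platform` and its labels are hypothetical names, and the extra warning reflects the fact that `aarch64` alone does not prove the host is a Jetson.

```bash
#!/usr/bin/env bash

# Hypothetical helper: map `uname -m` output to a deployment platform label.
classify_platform() {
  local arch="$1"
  case "$arch" in
    x86_64)  echo "workstation" ;;
    aarch64) echo "jetson" ;;
    *)       echo "unknown" ;;
  esac
}

platform="$(classify_platform "$(uname -m)")"

# Cross-check with the Tegra release file used in STEP 2.
if [ "$platform" = "jetson" ] && [ ! -f /etc/nv_tegra_release ]; then
  echo "aarch64 host without /etc/nv_tegra_release; confirm this is a Jetson" >&2
fi

echo "Detected platform: $platform"
```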
