vss-frag
$
npx mdskill add NVIDIA-AI-Blueprints/video-search-and-summarization/vss-fragGenerate video analysis reports with enterprise knowledge retrieval.
- Creates detailed summaries from long videos using advanced retrieval.
- Integrates Milvus vector database for enterprise knowledge access.
- Collects parameters through human-in-the-loop interaction.
- Delivers structured reports directly to the user interface.
SKILL.md
.github/skills/vss-fragView on GitHub ↗
---
name: vss-frag
description: "Generate video summary reports using the VSS video_search_frag extension with Long Video Summarization (LVS), Enterprise RAG knowledge retrieval, and human-in-the-loop parameter collection. Use when: user wants to generate a video summary, report, or analysis using the frag pipeline."
license: Apache-2.0
metadata:
version: "3.1.0"
github-url: "https://github.com/NVIDIA-AI-Blueprints/video-search-and-summarization"
tags: "nvidia blueprint operational"
---
# VSS Frag — Video Analysis with Enterprise RAG
Generate video summary reports using the VSS `video_search_frag` extension.
This skill adds Enterprise RAG (Milvus) knowledge retrieval and guided
human-in-the-loop (HITL) parameter collection on top of the base VSS agent.
Always run `curl` commands yourself; never instruct the user to run them.
## Deploying the Frag Extension
The frag extension layers Enterprise RAG and HITL LVS tools on top of the base
VSS agent image. Deployment is a two-step Docker build followed by compose up.
> **Environment variables:** All commands use values from the `.env` file at
> `deployments/developer-workflow/dev-profile-lvs/.env`. Edit it before deploying.
> Key variables: `HOST_IP`, `VSS_AGENT_PORT` (default `8000`), `NGC_CLI_API_KEY`,
> `NVIDIA_API_KEY`, `ENTERPRISE_RAG_*`.
### Step 1: Configure the .env file
```bash
nano deployments/developer-workflow/dev-profile-lvs/.env
```
Set at minimum:
- `HOST_IP` — your machine's IP (`hostname -I | awk '{print $1}'`)
- `NGC_CLI_API_KEY` — from https://ngc.nvidia.com/
- `NVIDIA_API_KEY` — from https://build.nvidia.com/
- `VSS_AGENT_CONFIG_FILE=./configs/video_search_frag/config.yml`
- `ENTERPRISE_RAG_VDB_ENDPOINT` — your Milvus endpoint (e.g., `tcp://127.0.0.1:19530`)
- `ENTERPRISE_RAG_COLLECTION_NAMES` — your Milvus collection name
### Step 2: Log in to NGC registry
```bash
echo "$NGC_CLI_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
```
### Step 3: Build the base agent image
```bash
cd agent
docker build -f docker/Dockerfile -t vss-agent-base .
```
### Step 4: Build the frag extension image
```bash
docker compose \
-f app/video_search_frag/docker-compose.yml \
--env-file ../deployments/developer-workflow/dev-profile-lvs/.env \
build
```
This produces `vss-agent-frag:latest` — the base agent extended with
`video_search_frag` (Enterprise RAG, HITL LVS, PDF report generation).
### Step 5: Deploy with docker compose
```bash
docker compose \
-f app/video_search_frag/docker-compose.yml \
-f ../deployments/agents/agent_ui/compose.yml \
--env-file ../deployments/developer-workflow/dev-profile-lvs/.env \
--profile bp_developer_lvs_2d \
up -d
```
Two `-f` flags: the frag compose defines `vss-agent`, the UI compose defines
`metropolis-vss-ui`. They merge into a single deployment.
### Step 6: Verify deployment
```bash
# Check containers are running
docker ps --format "table {{.Names}}\t{{.Status}}"
# Health check
curl -sf --max-time 5 "http://${HOST_IP}:${VSS_AGENT_PORT:-8000}/health" >/dev/null \
&& echo "VSS frag agent is running" \
|| echo "VSS frag agent is NOT reachable"
```
### Tear down
```bash
docker compose \
-f app/video_search_frag/docker-compose.yml \
-f ../deployments/agents/agent_ui/compose.yml \
--env-file ../deployments/developer-workflow/dev-profile-lvs/.env \
--profile bp_developer_lvs_2d \
down
```
### Rebuild after code changes
Always `down` then rebuild and `up` — never just `up -d` alone after changes.
```bash
docker compose \
-f app/video_search_frag/docker-compose.yml \
--env-file ../deployments/developer-workflow/dev-profile-lvs/.env \
build
docker compose \
-f app/video_search_frag/docker-compose.yml \
-f ../deployments/agents/agent_ui/compose.yml \
--env-file ../deployments/developer-workflow/dev-profile-lvs/.env \
--profile bp_developer_lvs_2d \
down
docker compose \
-f app/video_search_frag/docker-compose.yml \
-f ../deployments/agents/agent_ui/compose.yml \
--env-file ../deployments/developer-workflow/dev-profile-lvs/.env \
--profile bp_developer_lvs_2d \
up -d
```
## When to Use
- User wants to generate a video summary or report using the frag pipeline
- User asks to analyze a video with Enterprise RAG knowledge context
- User mentions "frag", "enterprise RAG", or "knowledge-enhanced report"
## When NOT to Use
- Simple video understanding queries (use `video-understanding` skill)
- Direct LVS summarization without HITL (use `video-summarization` skill)
- Deployment tasks (use `deploy` skill)
- Real-time alerts (use `alerts` skill)
## Workflow: Generate an LVS Report with Enterprise RAG
### Step 1: List available videos
```bash
curl -sS -X POST "http://${HOST_IP}:${VSS_AGENT_PORT:-8000}/v1/chat" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "What videos are available?"}]}' | \
python3 -c "import json,sys; d=json.load(sys.stdin); print(d['choices'][0]['message']['content'])"
```
Show the user the video list and ask which one they want to analyze.
### Step 2: Collect parameters from the user
Ask the user for these four inputs one at a time:
1. **Scenario** — What type of scenario is the video about?
Example: "warehouse monitoring", "traffic monitoring", "retail store activity"
2. **Events** — What events should be detected? Comma-separated.
Example: "accident, forklift stuck, workers not wearing PPE, person entering restricted area"
3. **Objects of Interest** — What objects should the analysis focus on? Or "skip" to skip.
Example: "forklifts, pallets, workers"
4. **Enterprise RAG Query** — An optional question to search the enterprise knowledge base
for additional context to include in the report. Or "skip" to skip.
Example: "What are the principles of STCC?"
### Step 3: Start the report (HTTP HITL)
Send a POST to `/v1/chat`. This returns HTTP 202 with an execution_id and the first
HITL prompt. Replace VIDEO_NAME with the chosen video:
```bash
curl -sS -X POST "http://${HOST_IP}:${VSS_AGENT_PORT:-8000}/v1/chat" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Generate a report for VIDEO_NAME using long video summarization"}]}'
```
The response contains:
- `execution_id` — save this, used in all subsequent requests
- `interaction_id` — identifies the current prompt
- `prompt.text` — the HITL prompt text
- `response_url` — the URL to POST the response to
### Step 4: Respond to HITL prompts
For each prompt, POST the user's parameter to the response_url.
Replace EXECUTION_ID, INTERACTION_ID, and the text value:
```bash
curl -sS -X POST \
"http://${HOST_IP}:${VSS_AGENT_PORT:-8000}/executions/EXECUTION_ID/interactions/INTERACTION_ID/response" \
-H "Content-Type: application/json" \
-d '{"response": {"type": "text", "text": "USER_VALUE_HERE"}}'
```
Then poll for the next prompt:
```bash
curl -sS "http://${HOST_IP}:${VSS_AGENT_PORT:-8000}/executions/EXECUTION_ID" | python3 -m json.tool
```
The HITL prompts come in this order:
1. **Scenario** — respond with the scenario from Step 2
2. **Events** — respond with the events from Step 2
3. **Objects of Interest** — respond with the objects from Step 2, or "skip"
4. **Enterprise RAG Query** — respond with the query from Step 2, or "skip"
5. **Confirmation** — respond with empty string "" to confirm and start processing
Repeat the POST-then-poll cycle for each prompt.
### Step 5: Wait for completion
After the confirmation prompt, the system processes the video. This takes 3-5 minutes.
Keep polling until the status changes from "running" to "completed":
```bash
curl -sS "http://${HOST_IP}:${VSS_AGENT_PORT:-8000}/executions/EXECUTION_ID" | python3 -m json.tool
```
Tell the user to wait — this takes 3-5 minutes. Poll every 30 seconds.
### Step 6: Present the results
When status is "completed", the response contains the full report with:
- Detected events with timestamps
- Narrative analysis summary
- Enterprise RAG context (if queried)
- PDF report download link (if available)
Present the report content to the user in a readable format.
## Quick Commands
### Health check
```bash
curl -sS "http://${HOST_IP}:${VSS_AGENT_PORT:-8000}/health"
```
### Simple chat query (non-report)
For simple questions that do NOT involve report generation:
```bash
curl -sS -X POST "http://${HOST_IP}:${VSS_AGENT_PORT:-8000}/v1/chat" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "YOUR_QUESTION_HERE"}]}' | \
python3 -c "import json,sys; d=json.load(sys.stdin); print(d['choices'][0]['message']['content'])"
```
## Notes
- LVS reports take 3-5 minutes for a ~3.5 minute video — always tell the user to wait
- Enterprise RAG requires a Milvus vector database with data ingested
- If objects or rag_query are not needed, respond with "skip"
- The HITL response format is always: `{"response": {"type": "text", "text": "value"}}`
- `enable_interactive_extensions: true` must be set in the frag config for HTTP HITL to work
- See also: `video-summarization`, `video-understanding`, `report`, `vios`, `deploy`
More from NVIDIA-AI-Blueprints/video-search-and-summarization
- alertsManage and monitor VSS alerts after the alerts profile is deployed. The deployment's mode (CV vs VLM real-time) is fixed at deploy time and determines the workflow — start/stop real-time alerts via the VSS Agent on a VLM deployment, onboard CV alerts by adding RTSP streams to VIOS on a CV deployment, query incidents, customize verifier prompts. Use when asked to start/stop a real-time alert, check or list alerts, add a camera, use a sample video for alerts, customize alert prompts, or view verdicts.
- deployDeploy, debug, or tear down any VSS profile using a compose-centric workflow — config (dry-run) with env overrides, review resolved compose, then compose up. Use this skill when the user says "deploy vss", "deploy `profile`", "debug deploy", "verify deployment", or "why is my vss deploy broken".
- rt-vlm>
- video-analyticsQuery video analytics data and metrics from Elastic search via the VA-MCP server (port 9901). This includes incidents, alerts, sensor data, and metrics. Use for any question about violations, alerts, incidents, object counts, speeds, occupancy, or anything that requires looking up recorded events. This is the primary way to answer a question that requires incidents, alerts and other metrics such as people counts and violations.
- video-searchSearch video archives using natural language — find events, objects, actions, and people across recorded video using fusion search (Cosmos Embed1 semantic search + CV attribute search). Use when asked to search for something in video, find actions and events, locate objects and people, or query video archives. For these types of questions, default to this top-level fusion search unless user specifies otherwise. Requires the search profile to be deployed.
- video-summarizationSummarize a video by calling the VLM NIM or the Long Video Summarization (LVS) microservice directly. For short videos (under 60s) call the VLM's OpenAI-compatible chat completions endpoint; for long videos (60s or longer) call the LVS microservice. Use when asked to summarize a video, describe what happens in a video, analyze a recording, call or debug LVS summarize/model/health/recommended-config/metrics endpoints, or configure and troubleshoot the LVS service that backs long-video summarization.
- video-understandingCall the vss agent to run video understanding on video to answer a text question. Use when the user asks about video content, or about visual details that cannot be answered from conversation history, search hits, or metadata alone.
- viosQuery VIOS REST APIs: sensor list, recording timelines, video clip extraction, snapshot capture, add/delete sensors and streams