vercel-ai-sdk
$
npx mdskill add vercel-labs/py-ai/vercel-ai-sdkBuild advanced LLM applications by managing streaming, tool calling, and structured output patterns.
- Develop complex conversational flows requiring multi-step reasoning and external function calls.
- Integrates with various LLM providers via a unified gateway model abstraction.
- Manages execution flow by wrapping the core logic within a controlled runtime context.
- Streams output incrementally, yielding message deltas for real-time user feedback.
SKILL.md
.github/skills/vercel-ai-sdkView on GitHub ↗
---
name: vercel-ai-sdk
description: Vercel AI SDK (Python) - patterns for building LLM-powered apps with streaming, tools, hooks, and structured output
---
# Vercel AI SDK (Python)
```bash
uv add vercel-ai-sdk
```
```python
import vercel_ai_sdk as ai
```
## Core workflow
`ai.run(root, *args, checkpoint=None)` is the entry point. It creates a `Runtime` (stored in a context var), starts `root` as a background task, processes an internal step queue, and yields `Message` objects. All SDK functions (`stream_step`, `execute_tool`, hooks) require this Runtime context -- they must be called within `ai.run()`.
The root function is any async function. If it declares a param typed `ai.Runtime`, it's auto-injected.
```python
@ai.tool
async def talk_to_mothership(question: str) -> str:
"""Contact the mothership for important decisions."""
return "Soon."
async def agent(llm: ai.LanguageModel, query: str) -> ai.StreamResult:
return await ai.stream_loop(
llm,
messages=ai.make_messages(system="You are a robot assistant.", user=query),
tools=[talk_to_mothership],
)
llm = ai.ai_gateway.GatewayModel(model="anthropic/claude-opus-4.6")
async for msg in ai.run(agent, llm, "When will the robots take over?"):
print(msg.text_delta, end="")
```
**`@ai.tool`** turns an async function into a `Tool`. Schema is extracted from type hints + docstring. If a tool declares `runtime: ai.Runtime`, it's auto-injected (excluded from LLM schema). Tools are registered globally by name.
**`ai.stream_step(llm, messages, tools=None, label=None, output_type=None)`** -- single LLM call. Returns `StreamResult` with `.text`, `.tool_calls`, `.output`, `.usage`, `.last_message`.
**`ai.stream_loop(llm, messages, tools, label=None, output_type=None)`** -- agent loop: calls LLM → executes tools → repeats until no tool calls. Returns final `StreamResult`.
Both are thin convenience wrappers (not magical -- they could be reimplemented by the user). `stream_step` is a `@ai.stream`-decorated function that calls `llm.stream()`. `stream_loop` calls `stream_step` in a while loop with `ai.execute_tool()` between iterations.
**`ai.execute_tool(tool_call, message=None)`** runs a tool call by name from the global registry. Handles malformed JSON / invalid args gracefully -- reports as a tool error so the LLM can retry rather than crashing.
### Multi-agent
Use `asyncio.gather` with labels to run agents in parallel:
```python
async def multi(llm: ai.LanguageModel, query: str) -> ai.StreamResult:
r1, r2 = await asyncio.gather(
ai.stream_loop(llm, msgs1, tools=[t1], label="researcher"),
ai.stream_loop(llm, msgs2, tools=[t2], label="analyst"),
)
return await ai.stream_loop(
llm,
ai.make_messages(user=f"{r1.text}\n{r2.text}"),
tools=[],
label="summary",
)
```
The `label` field on messages lets the consumer distinguish which agent produced output (e.g. `msg.label == "researcher"`).
### Messages
`ai.make_messages(system=None, user=str)` builds a message list.
`Message` is a Pydantic model with `role`, `parts` (list of `TextPart | ToolPart | ReasoningPart | HookPart | StructuredOutputPart`), `label`, and `usage`. Serialize with `msg.model_dump()`, restore with `ai.Message.model_validate(data)`.
Key properties for consuming streamed output:
- `msg.text_delta` -- current text chunk (use for live streaming display)
- `msg.text` -- full accumulated text
- `msg.tool_calls` -- list of `ToolPart` objects
- `msg.output` -- validated Pydantic instance (when using `output_type`)
- `msg.is_done` -- true when all parts finished streaming
- `msg.get_hook_part()` -- find a hook suspension part (for human-in-the-loop)
## Customization
### Custom loop
When `stream_loop` doesn't fit (conditional tool execution, approval gates, custom routing), use `stream_step` in a manual loop:
```python
async def agent(llm: ai.LanguageModel, query: str) -> ai.StreamResult:
messages = ai.make_messages(system="...", user=query)
tools = [get_weather, get_population]
while True:
result = await ai.stream_step(llm, messages, tools)
if not result.tool_calls:
return result
messages.append(result.last_message)
await asyncio.gather(*(ai.execute_tool(tc, message=result.last_message) for tc in result.tool_calls))
```
### Custom stream
`@ai.stream` wires an async generator (yielding `Message`) into the Runtime's step queue. This is what makes streaming visible to `ai.run()` and enables checkpoint replay -- calling `llm.stream()` directly would bypass both.
```python
@ai.stream
async def custom_step(llm: ai.LanguageModel, messages: list[ai.Message]) -> AsyncGenerator[ai.Message]:
async for msg in llm.stream(messages=messages, tools=[...]):
msg.label = "custom"
yield msg
result = await custom_step(llm, messages) # returns StreamResult
```
Tools can also stream intermediate progress via `runtime.put_message()`:
```python
@ai.tool
async def long_task(input: str, runtime: ai.Runtime) -> str:
"""Streams progress back to the caller."""
for step in ["Connecting...", "Processing..."]:
await runtime.put_message(
ai.Message(role="assistant", parts=[ai.TextPart(text=step, state="streaming")], label="progress")
)
return "final result"
```
## Hooks
Hooks are typed suspension points for human-in-the-loop. Decorate a Pydantic model to define the resolution schema:
```python
@ai.hook
class Approval(pydantic.BaseModel):
cancels_future: ClassVar[bool] = True # cancel on suspend (serverless)
granted: bool
reason: str
```
Inside agent code -- blocks until resolved:
```python
approval = await Approval.create("approve_send_email", metadata={"tool": "send_email"})
if approval.granted:
await ai.execute_tool(tc, message=result.last_message)
else:
tc.set_error(f"Rejected: {approval.reason}")
```
From outside (API handler, iterator loop):
```python
Approval.resolve("approve_send_email", {"granted": True, "reason": "User approved"})
Approval.cancel("approve_send_email")
```
**Long-running mode** (`cancels_future=False`, default): `create()` blocks until `resolve()` or `cancel()` is called externally. Use for websocket/interactive UIs.
**Serverless mode** (`cancels_future=True`): unresolved hooks are cancelled, the run ends. Inspect `result.pending_hooks` and `result.checkpoint` to resume later.
Consuming hooks in the iterator:
```python
async for msg in ai.run(agent, llm, query):
if (hook := msg.get_hook_part()) and hook.status == "pending":
answer = input(f"Approve {hook.hook_id}? [y/n] ")
Approval.resolve(hook.hook_id, {"granted": answer == "y", "reason": "operator"})
continue
print(msg.text_delta, end="")
```
### Checkpoints
`Checkpoint` records completed steps (LLM calls), tool executions, and hook resolutions. On replay, cached results are returned without re-executing.
```python
data = result.checkpoint.model_dump() # serialize (JSON-safe dict)
checkpoint = ai.Checkpoint.model_validate(data) # restore
result = ai.run(agent, llm, query, checkpoint=checkpoint) # replay completed work
```
Primary use case is serverless hook re-entry.
## Adapters
### Providers
```python
# Vercel AI Gateway (recommended)
# Uses AI_GATEWAY_API_KEY env var
llm = ai.ai_gateway.GatewayModel(model="anthropic/claude-opus-4.6", thinking=True, budget_tokens=10000)
# Direct
llm = ai.openai.OpenAIModel(model="gpt-5")
llm = ai.anthropic.AnthropicModel(model="claude-opus-4-6", thinking=True, budget_tokens=10000)
```
All implement `LanguageModel` with `stream()` (async generator of `Message`) and `buffer()` (returns final `Message`). Gateway routes Anthropic models through the native Anthropic API for full feature support, others through OpenAI-compatible endpoint.
### AI SDK UI
For streaming to AI SDK frontend (`useChat`, etc.):
```python
from vercel_ai_sdk.ai_sdk_ui import to_sse_stream, to_messages, UI_MESSAGE_STREAM_HEADERS
messages = to_messages(request.messages)
return StreamingResponse(to_sse_stream(ai.run(agent, llm, query)), headers=UI_MESSAGE_STREAM_HEADERS)
```
## Other features
### Structured output
Pass a Pydantic model as `output_type`:
```python
class Forecast(pydantic.BaseModel):
city: str
temperature: float
result = await ai.stream_step(llm, messages, output_type=Forecast)
result.output.city # validated Pydantic instance
# Also works directly on the model:
msg = await llm.buffer(messages, output_type=Forecast)
```
### MCP
```python
tools = await ai.mcp.get_http_tools("https://mcp.example.com/mcp", headers={...}, tool_prefix="docs")
tools = await ai.mcp.get_stdio_tools("npx", "-y", "@anthropic/mcp-server-filesystem", "/tmp", tool_prefix="fs")
```
Returns `Tool` objects usable in `stream_step`/`stream_loop`. Connections are pooled per `ai.run()` and cleaned up automatically.
### Telemetry
```python
ai.telemetry.enable() # OTel-based, emits gen_ai.* spans for runs/steps/tools
```
More from vercel-labs/py-ai