azure-aigateway
$
npx mdskill add microsoft/GitHub-Copilot-for-Azure/azure-aigatewayGovern AI models and agents through Azure API Management.
- Enforces semantic caching, token limits, and content safety rules.
- Integrates Azure OpenAI, AI Foundry, and MCP tools for governance.
- Applies LLM policies and jailbreak detection to control behavior.
- Delivers token metrics and cost control via Azure CLI configuration.
SKILL.md
.github/skills/azure-aigatewayView on GitHub ↗
---
name: azure-aigateway
description: "Configure Azure API Management as an AI Gateway for AI models, MCP tools, and agents. WHEN: semantic caching, token limit, content safety, load balancing, AI model governance, MCP rate limiting, jailbreak detection, add Azure OpenAI backend, add AI Foundry model, test AI gateway, LLM policies, configure AI backend, token metrics, AI cost control, convert API to MCP, import OpenAPI to gateway."
license: MIT
metadata:
author: Microsoft
version: "0.0.0-placeholder"
compatibility: Requires Azure CLI (az) for configuration and testing
---
# Azure AI Gateway
Configure Azure API Management (APIM) as an AI Gateway for governing AI models, MCP tools, and agents.
> **To deploy APIM**, use the **azure-prepare** skill. See [APIM deployment guide](https://learn.microsoft.com/azure/api-management/get-started-create-service-instance).
## When to Use This Skill
| Category | Triggers |
|----------|----------|
| **Model Governance** | "semantic caching", "token limits", "load balance AI", "track token usage" |
| **Tool Governance** | "rate limit MCP", "protect my tools", "configure my tool", "convert API to MCP" |
| **Agent Governance** | "content safety", "jailbreak detection", "filter harmful content" |
| **Configuration** | "add Azure OpenAI backend", "configure my model", "add AI Foundry model" |
| **Testing** | "test AI gateway", "call OpenAI through gateway" |
---
## Quick Reference
| Policy | Purpose | Details |
|--------|---------|---------|
| `azure-openai-token-limit` | Cost control | [Model Policies](references/policies.md#token-rate-limiting) |
| `azure-openai-semantic-cache-lookup/store` | 60-80% cost savings | [Model Policies](references/policies.md#semantic-caching) |
| `azure-openai-emit-token-metric` | Observability | [Model Policies](references/policies.md#token-metrics) |
| `llm-content-safety` | Safety & compliance | [Agent Policies](references/policies.md#content-safety) |
| `rate-limit-by-key` | MCP/tool protection | [Tool Policies](references/policies.md#request-rate-limiting) |
---
## Get Gateway Details
```bash
# Get gateway URL
az apim show --name <apim-name> --resource-group <rg> --query "gatewayUrl" -o tsv
# List backends (AI models)
az apim backend list --service-name <apim-name> --resource-group <rg> \
--query "[].{id:name, url:url}" -o table
# Get subscription key
az apim subscription keys list \
--service-name <apim-name> --resource-group <rg> --subscription-id <sub-id>
```
---
## Test AI Endpoint
```bash
GATEWAY_URL=$(az apim show --name <apim-name> --resource-group <rg> --query "gatewayUrl" -o tsv)
curl -X POST "${GATEWAY_URL}/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01" \
-H "Content-Type: application/json" \
-H "Ocp-Apim-Subscription-Key: <key>" \
-d '{"messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100}'
```
---
## Common Tasks
### Add AI Backend
See [references/patterns.md](references/patterns.md#pattern-1-add-ai-model-backend) for full steps.
```bash
# Discover AI resources
az cognitiveservices account list --query "[?kind=='OpenAI']" -o table
# Create backend
az apim backend create --service-name <apim> --resource-group <rg> \
--backend-id openai-backend --protocol http --url "https://<aoai>.openai.azure.com/openai"
# Grant access (managed identity)
az role assignment create --assignee <apim-principal-id> \
--role "Cognitive Services User" --scope <aoai-resource-id>
```
### Apply AI Governance Policy
Recommended policy order in `<inbound>`:
1. **Authentication** - Managed identity to backend
2. **Semantic Cache Lookup** - Check cache before calling AI
3. **Token Limits** - Cost control
4. **Content Safety** - Filter harmful content
5. **Backend Selection** - Load balancing
6. **Metrics** - Token usage tracking
See [references/policies.md](references/policies.md#combining-policies) for complete example.
---
## Troubleshooting
| Issue | Solution |
|-------|----------|
| Token limit 429 | Increase `tokens-per-minute` or add load balancing |
| No cache hits | Lower `score-threshold` to 0.7 |
| Content false positives | Increase category thresholds (5-6) |
| Backend auth 401 | Grant APIM "Cognitive Services User" role |
See [references/troubleshooting.md](references/troubleshooting.md) for details.
---
## References
- [**Detailed Policies**](references/policies.md) - Full policy examples
- [**Configuration Patterns**](references/patterns.md) - Step-by-step patterns
- [**Troubleshooting**](references/troubleshooting.md) - Common issues
- [AI-Gateway Samples](https://github.com/Azure-Samples/AI-Gateway)
- [GenAI Gateway Docs](https://learn.microsoft.com/azure/api-management/genai-gateway-capabilities)
## SDK Quick References
- **Content Safety**: [Python](references/sdk/azure-ai-contentsafety-py.md) | [TypeScript](references/sdk/azure-ai-contentsafety-ts.md)
- **API Management**: [Python](references/sdk/azure-mgmt-apimanagement-py.md) | [.NET](references/sdk/azure-mgmt-apimanagement-dotnet.md)
More from microsoft/GitHub-Copilot-for-Azure
- airunway-aks-setupSet up AI Runway on AKS — from bare cluster to running model. Covers cluster verification, controller install, GPU assessment, provider setup, and first deployment. WHEN: \"setup AI Runway\", \"onboard AKS cluster\", \"install AI Runway\", \"airunway setup\", \"deploy model to AKS\", \"GPU inference on AKS\", \"KAITO setup on AKS\", \"run LLM on AKS\", \"vLLM on AKS\", \"set up model serving on AKS\", \"AI Runway controller\".
- analyze-skill-issuesQuery the integration-test storage account to find why a specific skill's tests are failing. Reads blob-stored test result files and surfaces error details. TRIGGERS: why is skill failing, skill test failures, debug skill tests, skill failing tests, analyze skill failures, why are tests failing for skill, skill test errors, investigate skill issues. DO NOT USE FOR: analyzing a GitHub Actions run report or comparing test runs across runs (use analyze-test-run).
- analyze-test-runAnalyze a GitHub Actions integration test run and produce a skill invocation report with failure root-cause issues. TRIGGERS: analyze test run, skill invocation rate, test run report, compare test runs, skill invocation summary, test failure analysis, run report, test results, action run report
- appinsights-instrumentationGuidance for instrumenting webapps with Azure Application Insights. Provides telemetry patterns, SDK setup, and configuration references. WHEN: how to instrument app, App Insights SDK, telemetry patterns, what is App Insights, Application Insights guidance, instrumentation examples, APM best practices.
- azure-aiUse for Azure AI: Search, Speech, OpenAI, Document Intelligence. Helps with search, vector/hybrid search, speech-to-text, text-to-speech, transcription, OCR. WHEN: AI Search, query search, vector search, hybrid search, semantic search, speech-to-text, text-to-speech, transcribe, OCR, convert text to speech.
- azure-cloud-migrateAssess and migrate cross-cloud workloads to Azure with reports and code conversion. Supports Lambda→Functions, Beanstalk/Heroku/App Engine→App Service, Fargate/Kubernetes/Cloud Run→Container Apps. WHEN: migrate Lambda to Functions, AWS to Azure, migrate Beanstalk, migrate Heroku, migrate App Engine, Cloud Run migration, Fargate to ACA, ECS/Kubernetes/GKE/EKS to Container Apps, cross-cloud migration.
- azure-complianceRun Azure compliance and security audits with azqr plus Key Vault expiration checks. Covers best-practice assessment, resource review, policy/compliance validation, and security posture checks. WHEN: compliance scan, security audit, BEFORE running azqr (compliance cli tool), Azure best practices, Key Vault expiration check, expired certificates, expiring secrets, orphaned resources, compliance assessment.
- azure-computeAzure VM and VMSS router for recommendations, pricing, autoscale, orchestration, connectivity troubleshooting, and capacity reservations. WHEN: Azure VM, VMSS, scale set, recommend, compare, server, website, burstable, lightweight, VM family, workload, GPU, learning, simulation, dev/test, backend, autoscale, load balancer, Flexible orchestration, Uniform orchestration, cost estimate, connect, refused, Linux, black screen, reset password, reach VM, port 3389, NSG, troubleshoot, capacity reservation, CRG, reserve VMs, guarantee capacity, pre-provision capacity, CRG association, CRG disassociation.
- azure-costUnified Azure cost management: query historical costs, forecast future spending, and optimize to reduce waste. WHEN: \"Azure costs\", \"Azure spending\", \"Azure bill\", \"cost breakdown\", \"cost by service\", \"cost by resource\", \"how much am I spending\", \"show my bill\", \"monthly cost summary\", \"cost trends\", \"top cost drivers\", \"actual cost\", \"amortized cost\", \"forecast spending\", \"projected costs\", \"estimate bill\", \"future costs\", \"budget forecast\", \"end of month costs\", \"how much will I spend\", \"optimize costs\", \"reduce spending\", \"find cost savings\", \"orphaned resources\", \"rightsize VMs\", \"cost analysis\", \"reduce waste\", \"unused resources\", \"optimize Redis costs\", \"cost by tag\", \"cost by resource group\", \"AKS cost analysis add-on\", \"namespace cost\", \"cost spike\", \"anomaly\", \"budget alert\", \"AKS cost visibility\". DO NOT USE FOR: deploying resources, provisioning infrastructure, diagnostics, security audits, or estimating costs for new resources not yet deployed.
- azure-deployExecute Azure deployments for ALREADY-PREPARED applications that have existing .azure/deployment-plan.md and infrastructure files. DO NOT use this skill when the user asks to CREATE a new application — use azure-prepare instead. This skill runs azd up, azd deploy, terraform apply, and az deployment commands with built-in error recovery. Requires .azure/deployment-plan.md from azure-prepare and validated status from azure-validate. WHEN: \"run azd up\", \"run azd deploy\", \"execute deployment\", \"push to production\", \"push to cloud\", \"go live\", \"ship it\", \"bicep deploy\", \"terraform apply\", \"publish to Azure\", \"launch on Azure\". DO NOT USE WHEN: \"create and deploy\", \"build and deploy\", \"create a new app\", \"set up infrastructure\", \"create and deploy to Azure using Terraform\" — use azure-prepare for these.