gcp-cloud-monitoring

$npx mdskill add automateyournetwork/netclaw/gcp-cloud-monitoring

Monitors GCP resources using time series metrics and alert policies

  • Checks network performance, VM metrics, and load balancer latency
  • Uses Google Cloud Monitoring API and MCP server for data retrieval
  • Queries time series data, alert policies, and active violations
  • Returns structured metrics, alert details, and resource insights

SKILL.md

.github/skills/gcp-cloud-monitoringView on GitHub ↗
---
name: gcp-cloud-monitoring
description: "Google Cloud Monitoring — time series metrics, alert policies, active alerts, metric discovery. Use when checking GCP network performance, investigating firing alerts, querying VM CPU or memory metrics, reviewing Cloud VPN tunnel status, or assessing load balancer latency."
version: 1.0.0
license: Apache-2.0
tags: [gcp, cloud-monitoring, metrics, alerts, observability]
---

# GCP Cloud Monitoring

## MCP Server

- **Endpoint**: `https://monitoring.googleapis.com/mcp` (Streamable HTTP)
- **Auth**: OAuth 2.0 via Google IAM — service account key (`GOOGLE_APPLICATION_CREDENTIALS`) or `gcloud auth application-default login`
- **Requires**: `GCP_PROJECT_ID` environment variable

## Available Tools (6)

| Tool | What It Does |
|------|-------------|
| `list_timeseries` | Query time series data — CPU, memory, network, disk metrics for any GCP resource |
| `list_metric_descriptors` | Discover available metric types in a project — find what you can monitor |
| `list_alert_policies` | List all alerting policies — conditions, notification channels, thresholds |
| `get_alert_policy` | Get details of a specific alerting policy |
| `list_alerts` | List current and past alert violations — what's firing right now |
| `get_alert` | Get details of a specific alert violation |

## Workflow: GCP Network Monitoring

When a user asks "how is our GCP network performing?":

1. **Check alerts**: `list_alerts` to find any active alert violations
2. **VM network metrics**: `list_timeseries` for `compute.googleapis.com/instance/network/received_bytes_count` and `sent_bytes_count`
3. **Packet drops**: `list_timeseries` for `compute.googleapis.com/instance/network/received_packets_dropped_count`
4. **Firewall metrics**: `list_timeseries` for `compute.googleapis.com/firewall/dropped_packets_count`
5. **Load balancer metrics**: `list_timeseries` for `loadbalancing.googleapis.com/https/request_count` and `total_latencies`
6. **Report**: Network health dashboard with any issues flagged

## Workflow: Alert Investigation

When investigating GCP alerts:

1. **List active alerts**: `list_alerts` — find what's currently firing
2. **Get alert details**: `get_alert` — condition, threshold, resource affected
3. **Get policy**: `get_alert_policy` — what triggers this alert, notification channels
4. **Pull metrics**: `list_timeseries` for the affected metric — see the spike/anomaly
5. **Cross-reference**: Use `gcp-cloud-logging` for correlated log entries
6. **Report**: Alert investigation with root cause and timeline

## Workflow: Resource Health Check

When checking GCP infrastructure health:

1. **Discover metrics**: `list_metric_descriptors` filtered by service (compute, networking, loadbalancing)
2. **VM CPU/Memory**: `list_timeseries` for `compute.googleapis.com/instance/cpu/utilization` and memory metrics
3. **Disk I/O**: `list_timeseries` for `compute.googleapis.com/instance/disk/read_bytes_count` and write metrics
4. **Network throughput**: `list_timeseries` for network sent/received bytes
5. **Alert status**: `list_alert_policies` + `list_alerts` — any policies in violation?
6. **Report**: Infrastructure health dashboard with severity ratings

## Common GCP Network Metrics

| Metric | What It Tells You |
|--------|-------------------|
| `compute.googleapis.com/instance/network/received_bytes_count` | Inbound network throughput per VM |
| `compute.googleapis.com/instance/network/sent_bytes_count` | Outbound network throughput per VM |
| `compute.googleapis.com/instance/network/received_packets_dropped_count` | Dropped inbound packets (congestion) |
| `compute.googleapis.com/instance/network/sent_packets_dropped_count` | Dropped outbound packets (congestion) |
| `compute.googleapis.com/firewall/dropped_packets_count` | Packets dropped by VPC firewall rules |
| `loadbalancing.googleapis.com/https/request_count` | HTTP(S) LB request rate |
| `loadbalancing.googleapis.com/https/total_latencies` | HTTP(S) LB end-to-end latency |
| `loadbalancing.googleapis.com/https/backend_latencies` | Backend response time behind LB |
| `vpn.googleapis.com/tunnel_established` | Cloud VPN tunnel state (1=up, 0=down) |
| `vpn.googleapis.com/sent_bytes_count` | Bytes sent through VPN tunnel |
| `router.googleapis.com/bgp/received_routes_count` | BGP routes received by Cloud Router |
| `interconnect.googleapis.com/link/received_bytes_count` | Cloud Interconnect link throughput |

## Important Rules

- **Remote MCP server** — hosted by Google, no local install needed
- **OAuth 2.0 authentication** — uses IAM for access control
- **Project-scoped** — metrics are scoped to the configured GCP project
- **Read-only** — monitoring queries don't modify anything
- **Record in GAIT** — log monitoring investigations for audit trail

## Environment Variables

- `GCP_PROJECT_ID` — Google Cloud project ID
- `GOOGLE_APPLICATION_CREDENTIALS` — Path to service account key JSON file

More from automateyournetwork/netclaw

SkillDescription
aap-automationRed Hat Ansible Automation Platform — inventory management, job template execution, project SCM sync, ad-hoc commands, host management, Galaxy content discovery. Use when automating infrastructure with Ansible, running playbooks, managing inventories, or searching for Ansible collections and roles.
aap-edaEvent-Driven Ansible (EDA) — activation lifecycle, rulebook management, decision environments, event stream monitoring. Use when managing event-driven automation triggers, enabling/disabling activations, or reviewing EDA rulebooks.
aap-lintansible-lint playbook and role validation — syntax checking, best practice enforcement, project-wide analysis, rule filtering. Use when validating Ansible playbooks, checking code quality, or enforcing automation best practices before deployment.
aci-change-deploySafe ACI policy change deployment - ServiceNow CR lifecycle, pre/post-change fault baselines, APIC policy application, automatic rollback on fault delta, and GAIT audit trail. Use when deploying ACI policy changes, creating tenants or EPGs, pushing config to APIC, or running a change window with rollback protection.
aci-fabric-auditComprehensive Cisco ACI fabric health audit - node status, tenant/VRF/BD/EPG policy review, contract analysis, fault triage, and endpoint learning verification. Use when auditing ACI fabric health, checking for faults, reviewing tenant policies, or running pre/post-change baselines on APIC.
arista-cvpArista CloudVision Portal (CVP) automation via REST API — device inventory, events, connectivity monitoring, tag management (4 tools). Use when managing Arista devices, checking CloudVision events, monitoring network connectivity probes, or tagging devices in CVP.
aruba-cx-configView and manage Aruba CX switch configurations, perform ISSU upgrades, and firmware operations
aruba-cx-interfacesMonitor Aruba CX switch interface status, LLDP neighbors, and optical transceiver health
aruba-cx-switchingView and manage Aruba CX switch VLANs and MAC address tables for Layer 2 operations
aruba-cx-systemDiscover Aruba CX switch system information, firmware versions, and VSF topology