gatus
$
npx mdskill add TerminalSkills/skills/gatusProvides expert guidance for setting up Gatus, a lightweight health check and status page tool.
- Simplifies endpoint monitoring with conditions and alerting.
- Uses Gatus, a self-hosted Go tool configured via a single YAML file.
- Leverages YAML configuration for defining endpoints, storage, and alerts.
- Delivers setup instructions and best practices for uptime tracking and status page creation.
SKILL.md
.github/skills/gatusView on GitHub ↗
---
name: gatus
description: Expert guidance for Gatus, the lightweight, self-hosted health check and status page tool written in Go. Helps developers set up endpoint monitoring with conditions, alerting, and a beautiful status page — all configured via a single YAML file with no database required.
license: Apache-2.0
compatibility: No special requirements
metadata:
author: terminal-skills
version: 1.0.0
category: devops
tags:
- uptime
- health-check
- status-page
- self-hosted
- lightweight
---
# Gatus — Lightweight Health Check Dashboard
## Overview
Gatus, the lightweight, self-hosted health check and status page tool written in Go. Helps developers set up endpoint monitoring with conditions, alerting, and a beautiful status page — all configured via a single YAML file with no database required.
## Instructions
### Configuration
```yaml
# config.yaml — Complete Gatus configuration
# Single YAML file defines all endpoints, conditions, and alerts.
# Storage (optional — for persistence across restarts)
storage:
type: sqlite
path: /data/gatus.db
# Web UI settings
web:
port: 8080
# Alert providers
alerting:
slack:
webhook-url: "${SLACK_WEBHOOK_URL}"
default-alert:
enabled: true
failure-threshold: 3 # Alert after 3 consecutive failures
success-threshold: 2 # Recover after 2 consecutive successes
send-on-resolved: true
pagerduty:
integration-key: "${PAGERDUTY_KEY}"
default-alert:
enabled: true
failure-threshold: 5
send-on-resolved: true
email:
from: "gatus@example.com"
host: "smtp.example.com"
port: 587
username: "${SMTP_USER}"
password: "${SMTP_PASS}"
default-alert:
enabled: false # Only enable on critical endpoints
# Endpoints to monitor
endpoints:
# --- API Health ---
- name: API Gateway
group: backend
url: "https://api.example.com/health"
interval: 30s
conditions:
- "[STATUS] == 200"
- "[RESPONSE_TIME] < 2000" # Under 2 seconds
- "[BODY].status == healthy" # JSON body check
alerts:
- type: slack
- type: pagerduty
- name: Auth Service
group: backend
url: "https://api.example.com/auth/health"
interval: 30s
conditions:
- "[STATUS] == 200"
- "[RESPONSE_TIME] < 1000"
# --- Frontend ---
- name: Website
group: frontend
url: "https://example.com"
interval: 60s
conditions:
- "[STATUS] == 200"
- "[RESPONSE_TIME] < 3000"
- "[BODY] contains Welcome" # Verify page renders
alerts:
- type: slack
# --- Database ---
- name: PostgreSQL
group: infrastructure
url: "tcp://db.example.com:5432"
interval: 30s
conditions:
- "[CONNECTED] == true"
alerts:
- type: slack
- type: pagerduty
# --- Redis ---
- name: Redis
group: infrastructure
url: "tcp://redis.example.com:6379"
interval: 15s
conditions:
- "[CONNECTED] == true"
# --- DNS ---
- name: DNS Resolution
group: infrastructure
url: "dns://8.8.8.8"
dns:
query-name: "example.com"
query-type: "A"
conditions:
- "[DNS_RCODE] == NOERROR"
- "[RESPONSE_TIME] < 500"
# --- SSL Certificate ---
- name: SSL Certificate
group: security
url: "https://example.com"
interval: 1h
conditions:
- "[CERTIFICATE_EXPIRATION] > 720h" # Alert if < 30 days
# --- External Dependencies ---
- name: Stripe API
group: external
url: "https://api.stripe.com/v1"
interval: 5m
conditions:
- "[STATUS] == 401" # Unauthenticated is expected (API is up)
- "[RESPONSE_TIME] < 3000"
# --- GraphQL ---
- name: GraphQL API
group: backend
url: "https://api.example.com/graphql"
method: POST
headers:
Content-Type: application/json
body: '{"query": "{ __typename }"}'
interval: 30s
conditions:
- "[STATUS] == 200"
- "[BODY].data.__typename == Query"
```
### Deployment
```yaml
# docker-compose.yml — Self-hosted Gatus
version: "3.8"
services:
gatus:
image: twinproduction/gatus:latest
ports:
- "8080:8080"
volumes:
- ./config.yaml:/config/config.yaml
- gatus-data:/data
environment:
- SLACK_WEBHOOK_URL=${SLACK_WEBHOOK_URL}
- PAGERDUTY_KEY=${PAGERDUTY_KEY}
restart: unless-stopped
volumes:
gatus-data:
```
```yaml
# Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: gatus
spec:
replicas: 1
selector:
matchLabels:
app: gatus
template:
metadata:
labels:
app: gatus
spec:
containers:
- name: gatus
image: twinproduction/gatus:latest
ports:
- containerPort: 8080
volumeMounts:
- name: config
mountPath: /config
envFrom:
- secretRef:
name: gatus-secrets
volumes:
- name: config
configMap:
name: gatus-config
```
## Installation
```bash
# Docker
docker run -p 8080:8080 -v $(pwd)/config.yaml:/config/config.yaml twinproduction/gatus
# Binary
go install github.com/TwiN/gatus/v5@latest
# Helm
helm repo add gatus https://twin.github.io/helm-charts
helm install gatus gatus/gatus
```
## Examples
### Example 1: Setting up Gatus for a microservices project
**User request:**
```
I have a Node.js API and a React frontend running in Docker. Set up Gatus for monitoring/deployment.
```
The agent creates the necessary configuration files based on patterns like `# config.yaml — Complete Gatus configuration`, sets up the integration with the existing Docker setup, configures appropriate defaults for a Node.js + React stack, and provides verification commands to confirm everything is working.
### Example 2: Troubleshooting deployment issues
**User request:**
```
Gatus is showing errors in our deployment. Here are the logs: [error output]
```
The agent analyzes the error output, identifies the root cause by cross-referencing with common Gatus issues, applies the fix (updating configuration, adjusting resource limits, or correcting syntax), and verifies the resolution with appropriate health checks.
## Guidelines
1. **Single YAML** — Keep all monitoring in one config file; version control it alongside your infrastructure
2. **Group endpoints** — Use groups (backend, frontend, infrastructure) for organized status page display
3. **Multiple conditions** — Check status code AND response time AND body content; a 200 with wrong content is still broken
4. **Failure thresholds** — Set failure-threshold to 2-3 to avoid false alarms from transient network blips
5. **Check dependencies** — Monitor external services (Stripe, AWS) separately; know when the issue is upstream vs yours
6. **SSL monitoring** — Check certificate expiration weekly; alert at 30 days to give time for renewal
7. **Lightweight deployment** — Gatus uses ~15MB RAM; run it on any machine, even a Raspberry Pi
8. **SQLite for history** — Enable SQLite storage for uptime history across restarts; no external database needed