k8s-debug-pods

$ npx mdskill add kurtosis-tech/kurtosis/k8s-debug-pods

Diagnose Kurtosis pod failures by inspecting Kubernetes events and node conditions.

  • Fixes Pending, CrashLoopBackOff, ImagePullBackOff, and Evicted pod states.
  • Requires kubectl access to query pods, nodes, and cluster events.
  • Analyzes taints, tolerations, resource pressure, and image tags.
  • Outputs actionable bash commands and specific remediation steps.

SKILL.md

.github/skills/k8s-debug-pods
---
name: k8s-debug-pods
description: Debug Kurtosis pods on Kubernetes. Diagnose why pods are Pending, CrashLoopBackOff, ImagePullBackOff, or Evicted. Check node taints, tolerations, resource pressure, and pod events. Use when kurtosis engine start fails or pods aren't coming online.
compatibility: Requires kubectl with cluster access.
metadata:
  author: ethpandaops
  version: "1.0"
---

# K8s Debug Pods

Diagnose and fix issues with Kurtosis pods on Kubernetes.

## Quick triage

```bash
# See all kurtosis-related pods across namespaces
kubectl get pods -A | grep kurtosis

# Check for problem pods (not Running)
kubectl get pods -A | grep kurtosis | grep -v Running

# Get events for a specific pod
kubectl describe pod <POD_NAME> -n <NAMESPACE> | tail -30
```
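
When many pods are involved, a per-status tally can show at a glance which failure state dominates. The following is a minimal sketch, not part of Kurtosis: `kurtosis_pod_status_counts` is a hypothetical helper name, and it assumes the default `kubectl get pods -A` column layout, in which STATUS is the fourth column.

```bash
# Hypothetical helper: tally kurtosis-related pods by STATUS.
# Reads `kubectl get pods -A` output on stdin and assumes the default
# column order: NAMESPACE NAME READY STATUS RESTARTS AGE.
kurtosis_pod_status_counts() {
  awk '$1 != "NAMESPACE" && /kurtosis/ { count[$4]++ }
       END { for (s in count) print s, count[s] }' | sort
}

# Usage against a live cluster:
#   kubectl get pods -A | kurtosis_pod_status_counts
```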

## Common pod states and fixes

### Pending — Unschedulable

The pod can't be scheduled because of node taints, resource pressure, or affinity rules.

```bash
# Check node taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

# Check node conditions and their status (DiskPressure=True, MemoryPressure=True, etc.)
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .status.conditions[*]}{.type}={.status} {end}{"\n"}{end}'
```

**Fix**: Add tolerations to the Kurtosis config (on macOS, `~/Library/Application Support/kurtosis/kurtosis-config.yml`; `kurtosis config path` prints the location on any platform), or fix the node condition.
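
The scheduler records why it could not place a pod in a `FailedScheduling` event, and that message usually points to one of the three causes named above. The sketch below maps a message to a likely cause. `classify_scheduling_failure` is a hypothetical helper, and its patterns are assumptions based on typical scheduler wording, which varies between Kubernetes versions. When a message lists several causes, only the first match is reported.

```bash
# Hypothetical helper: map a FailedScheduling message to a likely cause.
# The patterns are assumptions about typical scheduler wording.
classify_scheduling_failure() {
  case "$1" in
    *taint*)               echo "taint: add a toleration or remove the taint" ;;
    *Insufficient*)        echo "resources: free capacity or lower the pod's requests" ;;
    *affinity*|*selector*) echo "affinity: relax the affinity or node selector rules" ;;
    *)                     echo "unknown: read the full event message" ;;
  esac
}

# Usage against a live cluster: fetch the event, then pass its MESSAGE text.
#   kubectl get events -n <NAMESPACE> \
#     --field-selector involvedObject.name=<POD_NAME>,reason=FailedScheduling
#   classify_scheduling_failure "<MESSAGE>"
```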

### ImagePullBackOff

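The kubelet can't pull the image, usually because the image tag doesn't exist on the registry. Missing pull credentials or registry rate limits produce the same state.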

```bash
# Check which image is failing
kubectl describe pod <POD_NAME> -n <NAMESPACE> | grep -A5 "Image:"

# Verify the image exists on its registry (Docker Hub unless the reference names another)
docker manifest inspect <IMAGE>:<TAG>
```

**Fix**: Push the correct image tag, or fix the image reference in the code.
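
A missing or mistyped tag is easier to spot once the image reference is split apart. The sketch below prints the tag of a reference, and `latest` when there is none, which is the default applied to a reference that carries neither tag nor digest. `image_tag` is a hypothetical helper, and the image names in the usage lines are placeholders.

```bash
# Hypothetical helper: print the tag portion of an image reference.
# The function body runs in a subshell, so its variables stay local.
image_tag() (
  ref="$1"
  case "$ref" in
    *@*) echo "(pinned by digest)"; exit 0 ;;  # a digest takes precedence over any tag
  esac
  last="${ref##*/}"  # last path component, so a registry port is not mistaken for a tag
  case "$last" in
    *:*) echo "${last##*:}" ;;
    *)   echo "latest" ;;
  esac
)

# Usage (placeholder image names):
#   image_tag example/engine:1.2.3
#   image_tag localhost:5000/engine
```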

### CrashLoopBackOff

The container starts but crashes immediately.

```bash
# Check container logs
kubectl logs <POD_NAME> -n <NAMESPACE>
kubectl logs <POD_NAME> -n <NAMESPACE> --previous
```
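
When the logs are empty or inconclusive, the exit code of the last terminated container often narrows the cause. By shell convention, a code above 128 means the process was killed by signal `code - 128`. The sketch below decodes a few common values; `explain_exit_code` is a hypothetical helper, and the interpretations are rules of thumb rather than a diagnosis.

```bash
# Hypothetical helper: interpret a container exit code.
# The function body runs in a subshell, so its variables stay local.
explain_exit_code() (
  code="$1"
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))  # codes above 128 encode the terminating signal
    case "$sig" in
      9)  echo "killed by SIGKILL (signal 9), often the OOM killer" ;;
      11) echo "killed by SIGSEGV (signal 11), a segmentation fault" ;;
      15) echo "killed by SIGTERM (signal 15), a requested shutdown" ;;
      *)  echo "killed by signal $sig" ;;
    esac
  elif [ "$code" -eq 0 ]; then
    echo "exited cleanly; the process finished instead of staying up"
  else
    echo "application error (exit code $code); check the logs"
  fi
)

# Usage against a live cluster: read the last exit code, then decode it.
#   kubectl get pod <POD_NAME> -n <NAMESPACE> \
#     -o jsonpath='{.status.containerStatuses[*].lastState.terminated.exitCode}'
#   explain_exit_code 137
```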

### Evicted

The node evicted the pod due to resource pressure.

```bash
# Check which nodes have pressure (a *Pressure condition with status True)
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .status.conditions[*]}{.type}={.status} {end}{"\n"}{end}'

# Clean up evicted pods
kubectl get pods -A | grep Evicted | awk '{print $2 " -n " $1}' | xargs -L1 kubectl delete pod
```
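
The cleanup one-liner above deletes every Evicted pod in the cluster, not only Kurtosis ones. A more cautious variant prints the delete commands for review before anything runs. This is a minimal sketch: `evicted_kurtosis_delete_cmds` is a hypothetical helper, and it assumes the default `kubectl get pods -A` column layout with STATUS in the fourth column.

```bash
# Hypothetical helper: print (not run) delete commands for Evicted
# kurtosis-related pods. Reads `kubectl get pods -A` output on stdin.
evicted_kurtosis_delete_cmds() {
  awk '$4 == "Evicted" && /kurtosis/ { print "kubectl delete pod " $2 " -n " $1 }'
}

# Usage against a live cluster:
#   kubectl get pods -A | evicted_kurtosis_delete_cmds        # review first
#   kubectl get pods -A | evicted_kurtosis_delete_cmds | sh   # then execute
```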

## Kurtosis-specific pod types

| Pod pattern | Component | Image source |
|-------------|-----------|-------------|
| `kurtosis-engine-*` | Engine server | `engine/server/Dockerfile` |
| `kurtosis-api` (in `kt-*` namespaces) | API Container (APIC) | `core/server/Dockerfile` |
| `kurtosis-logs-collector-*` | Fluent Bit DaemonSet | Pulled from registry |
| `kurtosis-logs-aggregator-*` | Vector deployment | Pulled from registry |
| `remove-dir-pod-*` | Fluent Bit cleanup pods | busybox |
| `files-artifact-expander` (init container) | Files artifacts | `core/files_artifacts_expander/Dockerfile` |

## Engine start failures

If `kurtosis engine start` fails:

1. Check if old kurtosis namespaces exist: `kubectl get ns | grep kurtosis`
2. Delete them (this removes everything running in those namespaces): `kubectl get ns | grep kurtosis | awk '{print $1}' | xargs -r kubectl delete ns`
3. Retry engine start

## Logs collector issues

The logs collector is a DaemonSet that runs on every node. If some nodes are unhealthy:

```bash
# Check DaemonSet status
kubectl get ds -A | grep kurtosis

# See which pods are not running
kubectl get pods -A | grep logs-collector | grep -v Running
```

Collector pods may not be scheduled on nodes with DiskPressure or other taints. This is expected, and the engine should still start with a warning about partially degraded log collection.
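
To see how far collection is degraded, the DaemonSet's ready count can be compared with its desired count. The sketch below is one way to do that. `degraded_daemonsets` is a hypothetical helper, and it assumes the default `kubectl get ds -A` column layout (NAMESPACE, NAME, DESIRED, CURRENT, READY, and so on).

```bash
# Hypothetical helper: list kurtosis-related DaemonSets with fewer ready
# pods than desired. Reads `kubectl get ds -A` output on stdin and assumes
# DESIRED is the third column and READY the fifth.
degraded_daemonsets() {
  awk '$1 != "NAMESPACE" && /kurtosis/ && ($5 + 0) < ($3 + 0) {
         print $2 ": " $5 "/" $3 " ready"
       }'
}

# Usage against a live cluster:
#   kubectl get ds -A | degraded_daemonsets
```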
