detecting-insider-data-exfiltration-via-dlp

Name: detecting-insider-data-exfiltration-via-dlp
Author: mukul975/Anthropic-Cybersecurity-Skills

$npx mdskill add mukul975/Anthropic-Cybersecurity-Skills/detecting-insider-data-exfiltration-via-dlp

Detect anomalous data exfiltration using behavioral baselines.

Identifies insider threats through endpoint and cloud log analysis.
Integrates with pandas for statistical anomaly detection.
Executes behavioral analytics on file access and upload patterns.
Generates structured alerts for SOC analysts and threat hunters.

SKILL.md

.github/skills/detecting-insider-data-exfiltration-via-dlpView on GitHub ↗

---
name: detecting-insider-data-exfiltration-via-dlp
description: >
  Detects insider data exfiltration by analyzing DLP policy violations, file access
  patterns, upload volume anomalies, and off-hours activity in endpoint and cloud logs.
  Uses pandas for behavioral analytics and statistical baselines. Use when investigating
  insider threats or building user behavior analytics for data loss prevention.
domain: cybersecurity
subdomain: security-operations
tags: [detecting, insider, data, exfiltration]
version: "1.0"
author: mahipal
license: Apache-2.0
---

# Detecting Insider Data Exfiltration via DLP


## When to Use

- When investigating security incidents that require detecting insider data exfiltration via dlp
- When building detection rules or threat hunting queries for this domain
- When SOC analysts need structured procedures for this analysis type
- When validating security monitoring coverage for related attack techniques

## Prerequisites

- Familiarity with security operations concepts and tools
- Access to a test or lab environment for safe execution
- Python 3.8+ with required dependencies installed
- Appropriate authorization for any testing activities

## Instructions

Analyze endpoint activity logs, cloud storage access, and email DLP events to detect
data exfiltration patterns using behavioral baselines and statistical anomaly detection.

```python
import pandas as pd

df = pd.read_csv("file_activity.csv", parse_dates=["timestamp"])
# Baseline: average daily upload volume per user
baseline = df.groupby(["user", df["timestamp"].dt.date])["bytes_transferred"].sum()
user_avg = baseline.groupby("user").mean()

# Alert on users exceeding 3x their baseline
today = df[df["timestamp"].dt.date == pd.Timestamp.today().date()]
today_totals = today.groupby("user")["bytes_transferred"].sum()
anomalies = today_totals[today_totals > user_avg * 3]
```

Key indicators:
1. Upload volume exceeding 3x daily baseline
2. Access to files outside normal scope
3. Bulk downloads before resignation
4. Off-hours file access patterns
5. USB/external device usage spikes

## Examples

```python
# Detect off-hours activity
df["hour"] = df["timestamp"].dt.hour
off_hours = df[(df["hour"] < 6) | (df["hour"] > 22)]
suspicious = off_hours.groupby("user").size().sort_values(ascending=False)
```

More from mukul975/Anthropic-Cybersecurity-Skills