community-signals
$
npx mdskill add gooseworks-ai/goose-skills/community-signalsScrape developer forums to find high-intent migration signals.
- Identifies users seeking alternatives or complaining about competitors.
- Requires Apify token for Reddit and free Algolia for Hacker News.
- Scores lead strength based on intent keywords and cross-platform activity.
- Outputs ranked prospect lists with intent metrics and source URLs.
SKILL.md
.github/skills/community-signalsView on GitHub ↗
---
name: community-signals
description: Extract leads from developer forums (Hacker News, Reddit) by detecting intent signals — alternative seeking, competitor pain, scaling challenges, DIY solutions, and migration intent. Scores users by intent strength and cross-platform presence.
user-invocable: true
allowed-tools: Bash, Read, Write, Edit, Grep, Glob, WebSearch
argument-hint: [queries-json-path]
---
# Community Signals
Extract high-intent leads from developer community forums by detecting buying signals in public discussions. Currently supports Hacker News and Reddit.
## When to Use
- User wants to find leads from developer communities or forums
- User wants to identify people publicly expressing pain with competitors
- User wants to find people asking "what tool should I use for X"
- User mentions Hacker News, Reddit, Stack Overflow, or developer forums as lead sources
- User describes prospects who discuss tools, complain about solutions, or ask for recommendations in public forums
- User wants to find developers who built DIY/hacky solutions for problems the user's product solves
## Prerequisites
- Python 3.9+ with `requests` and optionally `python-dotenv`
- Apify API token in `.env` (for Reddit scraping)
- No auth needed for Hacker News (free Algolia API)
- Working directory: the project root containing this skill
## Phase 1: Collect Context
### Step 1: Gather Product & ICP Information
Ask the user for the following. Do NOT proceed without this — the entire query generation depends on it.
> "To find the right leads from developer communities, I need to understand:
> 1. **What does your product do?** (one-liner)
> 2. **Who are your competitors?** (list the main ones)
> 3. **What specific problems does your product solve?** (the pain points)
> 4. **Who is your ideal buyer?** (role, company type, tech stack)
> 5. **Any specific technologies or keywords** associated with your space?"
If the user has already provided this context (e.g., from running the github-repo-signals skill), use that — don't ask again.
## Phase 2: Generate Search Queries
### Step 2: Generate Queries Across 9 Categories
Based on the user's product info, generate 3-5 search queries per category. These are the fixed categories — do not skip any:
**Category 1: Alternative Seeking** (intent score: 9)
People actively looking to switch tools.
- Pattern: "[competitor] alternative", "alternative to [competitor]", "looking for [product type]"
- Example: "twilio alternative", "alternative to agora", "looking for video SDK"
**Category 2: Competitor Pain** (intent score: 8)
People frustrated with a specific competitor.
- Pattern: "[competitor] issues", "frustrated with [competitor]", "[competitor] doesn't support"
- Example: "twilio video quality issues", "frustrated with agora pricing", "vonage api unreliable"
**Category 3: Problem Space Questions** (intent score: 6)
People trying to solve the exact problem the product addresses.
- Pattern: "how to [thing product does]", "best way to [problem]", "recommendations for [category]"
- Example: "how to add video calling to app", "best webrtc framework", "real-time communication SDK"
**Category 4: Tool Comparison** (intent score: 8)
People actively comparing options — in buying mode.
- Pattern: "[competitor A] vs [competitor B]", "comparing [tools]", "which [product type] should I use"
- Example: "twilio vs agora", "comparing video APIs", "which webrtc platform"
**Category 5: DIY / Built Own Solution** (intent score: 9)
People who built a custom solution — validated the need, would pay for a proper product.
- Pattern: "I built my own [thing]", "Show HN: [thing product replaces]", "custom [solution type]"
- Example: "I built my own video conferencing", "Show HN: open source video call", "custom webrtc server"
**Category 6: Scaling Challenges** (intent score: 7)
People hitting limits that the product solves.
- Pattern: "[problem] at scale", "scaling [thing]", "[thing] breaks with many users"
- Example: "webrtc scaling issues", "video calls lagging with 50+ participants", "scaling real-time communication"
**Category 7: Migration Intent** (intent score: 9)
People who have already decided to leave — looking for where to go.
- Pattern: "migrating from [competitor]", "moving away from [competitor]", "switching from [competitor]"
- Example: "migrating from twilio video", "moving away from agora", "switching video API providers"
**Category 8: Budget / Pricing Pain** (intent score: 7)
Cost is the trigger — open to cheaper or better-value alternatives.
- Pattern: "[competitor] too expensive", "[competitor] pricing", "cheaper alternative to [competitor]"
- Example: "twilio too expensive", "agora pricing 2026", "cheaper video API"
**Category 9: Feature Gap Complaints** (intent score: 7)
Needs something their current tool doesn't do — and the user's product does.
- Pattern: "does [competitor] support [feature]", "[competitor] missing [feature]", "wish [competitor] had"
- Example: "does twilio support recording", "agora missing breakout rooms", "wish vonage had better docs"
### Step 3: Discover Relevant Subreddits
Do a web search to find subreddits where the user's ICP is active. Search for:
- "[product category] subreddit"
- "[technology] subreddit"
- "[competitor name] subreddit"
Common developer subreddits to consider (pick the relevant ones):
- r/programming, r/webdev, r/devops, r/selfhosted
- r/kubernetes, r/aws, r/googlecloud, r/azure
- r/node, r/python, r/golang, r/rust
- r/startups, r/SaaS, r/entrepreneur
- r/sysadmin, r/networking
- Technology-specific: r/VOIP, r/machinelearning, r/dataengineering, etc.
Select 5-10 subreddits most relevant to the user's space.
### Step 4: Present Queries for Review
Present ALL generated queries to the user in a structured table:
```
Category | Queries
--------------------------|------------------------------------------
Alternative Seeking | "twilio alternative", "agora alternative", ...
Competitor Pain | "twilio issues", "frustrated with agora", ...
... | ...
Subreddits to scan: r/webdev, r/VOIP, r/programming, ...
```
Ask:
> "Here are the search queries I've generated. Would you like to:
> 1. Run with these as-is
> 2. Add or remove specific queries
> 3. Add or remove subreddits
>
> Estimated cost: HN is free. Reddit via Apify will cost approximately $[estimate based on query count x ~$0.05 per query]."
Wait for user approval before proceeding.
### Step 5: Save Queries File
Once approved, save the queries as a JSON file:
```bash
cat > ${CLAUDE_SKILL_DIR}/../.tmp/community_queries.json << 'QUERIESEOF'
{
"product": "Product Name",
"queries": [
{"category": "alternative_seeking", "query": "twilio alternative"},
{"category": "alternative_seeking", "query": "agora alternative"},
{"category": "competitor_pain", "query": "twilio video quality issues"}
],
"subreddits": ["r/webdev", "r/VOIP", "r/programming"]
}
QUERIESEOF
```
## Phase 3: Execute Scan
### Step 6: Verify Environment
```bash
python3 -c "import requests; print('OK')"
```
### Step 7: Run the Tool
```bash
python3 ${CLAUDE_SKILL_DIR}/scripts/community_signals.py \
--queries ${CLAUDE_SKILL_DIR}/../.tmp/community_queries.json \
--days 30 \
--max-reddit-posts 50 \
--max-reddit-comments 20 \
--output ${CLAUDE_SKILL_DIR}/../.tmp/community_signals.csv
```
The tool will:
1. Search Hacker News (stories + comments) for all queries — free
2. Search Reddit via Apify for all queries + scan subreddits — pay per result
3. Filter to last 30 days
4. Deduplicate users across platforms
5. Score by intent strength, signal count, category diversity, and cross-platform presence
6. Fetch HN user profiles (karma, bio) — free
7. Export two CSV files: `_users.csv` and `_signals.csv`
**Optional flags:**
- `--skip-reddit` — only search HN (free, for testing)
- `--skip-hn` — only search Reddit
- `--days 7` — narrower time window for very fresh signals
## Phase 4: Analyze & Recommend
### Step 9: Analyze the Results
Read the output CSV files and present a structured briefing:
**9a. Overall Stats**
- Total signals found (HN + Reddit)
- Unique users
- Split by platform (HN vs Reddit)
- Cross-platform matches (same username on both)
**9b. Signal Category Breakdown**
- How many signals per category
- Which categories produced the most results
- Which categories had the highest-engagement posts (upvotes, comments)
**9c. Top Subreddits Discovered**
- Which subreddits appeared most frequently
- This tells the user where their prospects hang out — valuable for community marketing, not just outreach
**9d. Highest-Intent Users**
- List top 15-20 users by composite score
- For each: username, platform, categories they appeared in, sample post/comment, engagement
- Flag cross-platform users prominently
**9e. Common Themes**
- What are people specifically asking for or complaining about?
- Any patterns in the pain points that the user's product addresses?
- Any surprising findings (e.g., a competitor getting mentioned negatively much more than others)?
### Step 10: Recommend Next Steps
Based on findings + user's product context:
1. **If strong signals found (>50 high-intent users):**
- Recommend enriching top users via SixtyFour
- For HN users: use their HN bio/karma + username for enrichment context
- For Reddit users: username is the only identifier — enrichment hit rates may be lower
- Suggest starting with HN users (more likely to have real names in bio)
2. **If cross-platform matches found:**
- These are highest priority — someone active on both HN and Reddit in your space is deeply engaged
- Recommend enriching these first
3. **If specific subreddits emerged as hotspots:**
- Recommend ongoing monitoring of those subreddits
- Suggest the user consider community engagement (commenting, answering questions) in those subreddits
4. **If "alternative seeking" or "migration intent" signals dominate:**
- These are the most time-sensitive leads — they're actively evaluating RIGHT NOW
- Recommend immediate outreach
5. **If "DIY / built own" signals found:**
- These are the highest-quality leads — they've validated the need
- Recommend personalized outreach referencing their project
6. **Always include:**
- Cost estimate for enrichment
- Suggested outreach angle per signal category
- Reminder that community forum users respond better to helpful engagement than cold outreach
### Step 11: Ask for Go-Ahead
> "Would you like me to:
> 1. Enrich the top [N] users via SixtyFour (estimated cost: $X)
> 2. Run a deeper scan on the hotspot subreddits
> 3. Export this data for manual review first
> 4. Combine these results with GitHub signals data (if available)"
Wait for user confirmation.
## Output Schema
**`community_signals_users.csv`** — One row per unique user across all platforms
| Column | Description |
|--------|-------------|
| username | Forum username |
| platform | hackernews or reddit |
| composite_score | Overall lead score (intent + diversity + cross-platform) |
| intent_score | Sum of category-weighted intent scores |
| signal_count | Number of matching posts/comments |
| categories | Which signal categories they appeared in |
| platforms_active | Which platforms they were found on |
| subreddits | Reddit subreddits they posted in |
| hn_karma | HN karma score (HN users only) |
| hn_bio | HN profile bio (HN users only) |
| total_engagement | Sum of upvotes + comments across their signals |
| first_seen | Earliest matching post/comment |
| latest_seen | Most recent matching post/comment |
| sample_url | Link to one of their matching posts |
**`community_signals_signals.csv`** — One row per matching post/comment
| Column | Description |
|--------|-------------|
| platform | hackernews or reddit |
| author | Username |
| category | Signal category code |
| category_label | Human-readable category name |
| content_type | story, comment, or post |
| title | Post/story title |
| text | Post/comment body (truncated) |
| subreddit | Reddit subreddit (if applicable) |
| score | Upvotes |
| num_comments | Comment count |
| created_at | Date posted |
| query_matched | Which search query found this |
| url | Permalink to the post/comment |
## Scoring System
**Intent scores by category:**
| Category | Score per Signal |
|----------|-----------------|
| Alternative Seeking | 9 |
| DIY / Built Own | 9 |
| Migration Intent | 9 |
| Competitor Pain | 8 |
| Tool Comparison | 8 |
| Scaling Challenge | 7 |
| Budget / Pricing | 7 |
| Feature Gap | 7 |
| Problem Space | 6 |
**Composite score bonuses:**
- +2 per unique category the user appeared in (diversity)
- +10 if user found on multiple platforms (cross-platform)
- +2 per signal (capped at +10)
## Cost Estimates
| Platform | Cost | Notes |
|----------|------|-------|
| Hacker News | **Free** | Algolia API, 10k req/hr |
| Reddit (Apify) | ~$0.004/result + $0.04/run | Pay per result |
| **Typical run** (45 queries) | **~$5-10 total** | HN free + Reddit ~$5-10 |
## Limitations
- **Reddit comments:** Can't search comments directly — finds posts first, then fetches comments on those posts. Some comment-only discussions may be missed.
- **Reddit date filter:** No native date range parameter in Apify actor. Filtering happens in post-processing using `created_at` timestamps.
- **User identity:** Forum usernames are pseudonymous. Enrichment hit rates will be lower than GitHub (where people often use real names). HN users are more identifiable (many put real names in bio).
- **Rate limits:** HN Algolia: 10k req/hr. Apify: depends on plan.
More from gooseworks-ai/goose-skills