talk-analytics

$npx mdskill add BuilderIO/agent-native/talk-analytics

Calculate per-speaker talk metrics from diarized transcripts.

  • Reveal who dominated conversations using talk time and interruption counts.
  • Depends on diarized transcripts to compute accurate speaker statistics.
  • Derives interactivity and patience metrics from floor control patterns.
  • Populates the Participants panel with talk percentages and monologue data.
---
name: talk-analytics
description: >-
  Per-speaker stats derived from transcript segments — talk time, talk %,
  longest monologue, interruption count, question count, plus derived
  interactivity and patience metrics. Use when changing participant
  materialization, the talk-time math, or adding new derived metrics.
---

# Talk Analytics

Talk analytics are the "who dominated the call?" view — per-speaker stats computed from the diarized transcript. They drive the Participants panel, the interactivity chip, the "talk-to-listen ratio" insight, and filters like `list-calls --participantEmail=...`.

## When to use

Read this skill before:

- Changing how `call_participants` rows are populated
- Adding a new per-speaker metric
- Tuning the longest-monologue gap heuristic
- Debugging wrong talk percentages after relabeling a speaker

## Data model

`call_participants` — one row per unique speaker label in a call:

- `speaker_label` — `"Speaker 0"` / `"Alice"` / etc.
- `display_name` — user-set override (populated via the Participants UI).
- `email` — optional; used to cross-link with `workspace_members`.
- `is_internal` — flagged via the UI; affects how the summary prompt treats the speaker.
- `avatar_url`, `color` — display.
- `talk_ms` — total milliseconds this speaker was talking.
- `talk_pct` — integer 0–100, percentage of total talk time (across all speakers).
- `longest_monologue_ms` — longest consecutive run this speaker held the floor.
- `interruptions_count` — times this speaker started while another was still active.
- `questions_count` — segments attributed to this speaker ending in `?`.

## The compute path

`computeTalkStats(segments: TranscriptSegment[])` in `server/lib/calls.ts` is the **single source of truth**. Always call this — do not recompute inline.

Output shape:

```ts
{
  participants: Array<{
    speakerLabel: string;
    talkMs: number;
    talkPct: number;
    longestMonologueMs: number;
    interruptionsCount: number;
    questionsCount: number;
  }>;
  totalTalkMs: number;
}
```

### Talk time

For each segment, accumulate `endMs - startMs` under the segment's `speakerLabel`. Simple sum — no overlap correction needed since Deepgram segments are non-overlapping per-speaker within an utterance group.
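As a rough illustration of the accumulation described above (not the actual body of `computeTalkStats`), the sum can be sketched like this. The `TalkSegment` shape and the `sumTalkMs` name are assumptions made for the example; the real `TranscriptSegment` type likely carries more fields.

```ts
// Hypothetical minimal segment shape, for illustration only.
interface TalkSegment {
  speakerLabel: string;
  startMs: number;
  endMs: number;
}

// Sum each segment's duration under its speaker label.
function sumTalkMs(segments: TalkSegment[]): Map<string, number> {
  const talkMs = new Map<string, number>();
  for (const seg of segments) {
    const durationMs = seg.endMs - seg.startMs;
    talkMs.set(seg.speakerLabel, (talkMs.get(seg.speakerLabel) ?? 0) + durationMs);
  }
  return talkMs;
}

const talkBySpeaker = sumTalkMs([
  { speakerLabel: "Speaker 0", startMs: 0, endMs: 4000 },
  { speakerLabel: "Speaker 1", startMs: 4500, endMs: 6000 },
  { speakerLabel: "Speaker 0", startMs: 6200, endMs: 7200 },
]);
// talkBySpeaker.get("Speaker 0") is 5000; talkBySpeaker.get("Speaker 1") is 1500.
```

The silence between segments (4000 to 4500 ms, 6000 to 6200 ms) is never added to anyone's total, which is what keeps talk time distinct from call duration.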

### Talk percentage

`talkPct = round(100 * talkMs / totalTalkMs)` where `totalTalkMs` is the sum of all participants' `talkMs`. **Not the call duration** — silence and music don't count as talk time. A 30-minute call with 20 minutes of talk where Alice talks for 15 of those → 75%, not 50%.

The percentages can sum to slightly more or less than 100 because each one is rounded independently; we accept that rather than assigning the leftover point to an arbitrary participant.
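The arithmetic can be checked with a small sketch. `talkPercentages` is a hypothetical helper written for this example, and the zero-total guard is an assumption rather than documented behavior.

```ts
// Hypothetical helper: maps per-speaker talk time to rounded integer percentages.
function talkPercentages(talkMsBySpeaker: Record<string, number>): Record<string, number> {
  const totalTalkMs = Object.values(talkMsBySpeaker).reduce((sum, ms) => sum + ms, 0);
  const pct: Record<string, number> = {};
  for (const [label, ms] of Object.entries(talkMsBySpeaker)) {
    // Assumed guard: a transcript with no speech yields 0 instead of NaN.
    pct[label] = totalTalkMs === 0 ? 0 : Math.round((100 * ms) / totalTalkMs);
  }
  return pct;
}

const MINUTE_MS = 60_000;

// 30-minute call with 20 minutes of talk; Alice holds 15 of those minutes.
const callPct = talkPercentages({ Alice: 15 * MINUTE_MS, Bob: 5 * MINUTE_MS });
// callPct["Alice"] is 75 and callPct["Bob"] is 25; the 10 silent minutes never enter the math.

// Three equal speakers each round to 33, so the column sums to 99.
const thirdsPct = talkPercentages({ A: 1000, B: 1000, C: 1000 });
```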

### Longest monologue

A monologue is a consecutive run of the same `speakerLabel` where each gap between adjacent segments is **≤ 1.5 seconds**. If gap > 1.5s, the monologue ends (even if the same speaker resumes after — they "yielded the floor").

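```ts
const MAX_MONOLOGUE_GAP_MS = 1500;
// Assumes `segments` is sorted by startMs.
let currentRunMs = 0; // length of the run that ends at the current segment
const longest: Record<string, number> = {};
for (let i = 0; i < segments.length; i++) {
  const seg = segments[i];
  const prev = segments[i - 1];
  if (
    prev &&
    prev.speakerLabel === seg.speakerLabel &&
    seg.startMs - prev.endMs <= MAX_MONOLOGUE_GAP_MS
  ) {
    // Same speaker after a short pause: extend the run, counting the pause itself.
    currentRunMs += seg.endMs - seg.startMs + (seg.startMs - prev.endMs);
  } else {
    // Speaker change or a pause over the threshold: start a new run.
    currentRunMs = seg.endMs - seg.startMs;
  }
  longest[seg.speakerLabel] = Math.max(longest[seg.speakerLabel] ?? 0, currentRunMs);
}
```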

### Interruptions

An interruption is when speaker B starts a segment **before** speaker A's current segment has ended. The interruption is counted against B only; A's count does not change.

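```ts
// Overlaps shorter than this are ignored (constant name is illustrative).
const MIN_INTERRUPTION_OVERLAP_MS = 200;
const interruptions: Record<string, number> = {};
for (let i = 1; i < segments.length; i++) {
  const cur = segments[i];
  const prev = segments[i - 1];
  // How long the new speaker talked over the previous one.
  const overlapMs = prev.endMs - cur.startMs;
  if (cur.speakerLabel !== prev.speakerLabel && overlapMs >= MIN_INTERRUPTION_OVERLAP_MS) {
    interruptions[cur.speakerLabel] = (interruptions[cur.speakerLabel] ?? 0) + 1;
  }
}
```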

Small overlaps (< 200ms) are filtered out — those are usually diarization noise or back-channel "mhm" acknowledgments.

### Questions

A question is a segment whose `text` ends with `?` after trimming. We don't try to detect rhetorical questions; the raw count is directly useful as a signal ("reps who ask fewer than 5 questions per call are telling, not selling"). A multi-sentence segment counts at most once, however many of its sentences are questions.
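A minimal sketch of the trailing-`?` rule, under the assumption that only the end of the trimmed text is tested. `TextSegment` and `countQuestions` are hypothetical names used for illustration.

```ts
// Hypothetical minimal shape; only the fields this check needs.
interface TextSegment {
  speakerLabel: string;
  text: string;
}

// Count, per speaker, the segments whose trimmed text ends in "?".
function countQuestions(segments: TextSegment[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const seg of segments) {
    if (seg.text.trim().endsWith("?")) {
      counts.set(seg.speakerLabel, (counts.get(seg.speakerLabel) ?? 0) + 1);
    }
  }
  return counts;
}

const questionCounts = countQuestions([
  { speakerLabel: "Rep", text: "What's your budget? And timeline?  " }, // counted once
  { speakerLabel: "Rep", text: "Is that right? I think so." },          // not counted: ends in "."
  { speakerLabel: "Prospect", text: "Around fifty thousand." },
  { speakerLabel: "Rep", text: "Does that work?" },
]);
// questionCounts.get("Rep") is 2; "Prospect" has no entry.
```

Note that the second segment contains a question mid-text but is not matched by this version of the check.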

## Derived metrics (UI only — not in the schema)

The UI computes two rollup chips from the raw stats. These are **not persisted** — they're render-time derivations.

### Interactivity

`Low` / `Medium` / `High`. Heuristic:

- High: questionsCount + responses > 20 AND average gap between speaker changes < 30s
- Medium: either condition partially met
- Low: long monologues from one speaker, few questions

"Response density" is the number of speaker-change events per minute. A call with one speaker holding the floor for 10 minutes straight is Low regardless of question count.
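The exact thresholds live in the UI, but the speaker-change rate itself is simple to sketch. Here `responseDensityPerMinute` is a hypothetical helper, and measuring the rate over the span from the first segment's start to the last segment's end is an assumption, not something the heuristic above specifies.

```ts
interface TalkSegment {
  speakerLabel: string;
  startMs: number;
  endMs: number;
}

// Speaker-change events per minute over the transcript's span (assumed denominator).
function responseDensityPerMinute(segments: TalkSegment[]): number {
  if (segments.length === 0) return 0;
  let changes = 0;
  let firstStartMs = Infinity;
  let lastEndMs = -Infinity;
  let prev: TalkSegment | undefined;
  for (const seg of segments) {
    if (prev && prev.speakerLabel !== seg.speakerLabel) changes++;
    firstStartMs = Math.min(firstStartMs, seg.startMs);
    lastEndMs = Math.max(lastEndMs, seg.endMs);
    prev = seg;
  }
  const spanMs = lastEndMs - firstStartMs;
  return spanMs > 0 ? changes / (spanMs / 60_000) : 0;
}

// Two minutes of alternating 30-second turns: 3 changes over 2 minutes.
const backAndForth = responseDensityPerMinute([
  { speakerLabel: "A", startMs: 0, endMs: 30_000 },
  { speakerLabel: "B", startMs: 30_000, endMs: 60_000 },
  { speakerLabel: "A", startMs: 60_000, endMs: 90_000 },
  { speakerLabel: "B", startMs: 90_000, endMs: 120_000 },
]);
// backAndForth is 1.5

// A single 10-minute monologue has no speaker changes at all.
const monologue = responseDensityPerMinute([
  { speakerLabel: "A", startMs: 0, endMs: 600_000 },
]);
// monologue is 0
```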

### Patience (rep-side)

Measures how long the rep waits after asking a question before speaking again. Compute:

- For each segment attributed to the internal (rep) speaker ending in `?`:
  - Find the next segment (any speaker).
  - If it's the same speaker, patience = 0 (the rep kept talking).
  - If it's a different speaker, patience = the gap between the end of the question and the start of that next segment.
- Report the average across all such questions, rendered as "Waits {x}s after questions".

Low patience (< 500ms average) is a coaching signal — the rep talks over their own questions. High patience (> 1.5s) suggests they let prospects think.
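The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the UI's actual code: `averagePatienceMs` is a hypothetical name, the rep is identified by a single label, a question with no following segment is skipped, and an overlapping reply is clamped to a gap of 0.

```ts
interface SpokenSegment {
  speakerLabel: string;
  startMs: number;
  endMs: number;
  text: string;
}

// Average wait after the rep's questions, or null if the rep asked none.
function averagePatienceMs(segments: SpokenSegment[], repLabel: string): number | null {
  const waits: number[] = [];
  segments.forEach((seg, i) => {
    const next = segments[i + 1];
    const isRepQuestion = seg.speakerLabel === repLabel && seg.text.trim().endsWith("?");
    if (!isRepQuestion || !next) return; // assumption: skip a question that ends the call
    // Same speaker continues: 0. Otherwise: silence before the other speaker starts.
    waits.push(next.speakerLabel === repLabel ? 0 : Math.max(0, next.startMs - seg.endMs));
  });
  if (waits.length === 0) return null;
  return waits.reduce((sum, ms) => sum + ms, 0) / waits.length;
}

const patienceMs = averagePatienceMs(
  [
    { speakerLabel: "Rep", startMs: 0, endMs: 10_000, text: "What's your timeline?" },
    { speakerLabel: "Prospect", startMs: 12_000, endMs: 15_000, text: "Probably Q3." },
    { speakerLabel: "Rep", startMs: 15_500, endMs: 20_000, text: "Does that include procurement?" },
    { speakerLabel: "Rep", startMs: 20_100, endMs: 25_000, text: "Because that usually adds a month." },
    { speakerLabel: "Prospect", startMs: 26_000, endMs: 28_000, text: "Yes." },
  ],
  "Rep",
);
// patienceMs is 1000: one 2000 ms wait and one 0 (the rep kept talking).

const chipText = patienceMs === null ? "" : `Waits ${(patienceMs / 1000).toFixed(1)}s after questions`;
// chipText is "Waits 1.0s after questions"
```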

## Materialization

`request-transcript` calls `materializeParticipants(db, callId, segments)` after Deepgram returns:

1. `computeTalkStats(segments)` → stats.
2. Load existing `call_participants` for the call.
3. For each speaker label in stats:
   - If a row exists: UPDATE stats fields.
   - Else: INSERT new row with defaults (`displayName: null`, `isInternal: false`, `color` from speaker palette).
4. For each existing row with a label no longer present in stats: DELETE.

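This is **non-destructive for user edits**: for every speaker label still present in the stats, `displayName`, `email`, `isInternal`, and `avatarUrl` are preserved across re-runs. Only rows whose label no longer appears are removed (step 4).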
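To make the update / insert / delete split concrete, here is a small in-memory sketch of steps 3 and 4. It does not touch a database and is not the body of `materializeParticipants`; `planMaterialization` and the trimmed-down row shapes are assumptions for illustration.

```ts
// Trimmed-down, hypothetical shapes: one user-editable field and one stat field.
interface ParticipantRow {
  speakerLabel: string;
  displayName: string | null;
  talkMs: number;
}
interface SpeakerStats {
  speakerLabel: string;
  talkMs: number;
}
interface MaterializationPlan {
  updates: ParticipantRow[];
  inserts: ParticipantRow[];
  deletes: string[];
}

function planMaterialization(existing: ParticipantRow[], stats: SpeakerStats[]): MaterializationPlan {
  const byLabel = new Map<string, ParticipantRow>();
  for (const row of existing) byLabel.set(row.speakerLabel, row);

  const plan: MaterializationPlan = { updates: [], inserts: [], deletes: [] };
  for (const s of stats) {
    const row = byLabel.get(s.speakerLabel);
    if (row) {
      // Overwrite stats only; the user-set displayName is carried over untouched.
      plan.updates.push({ ...row, talkMs: s.talkMs });
    } else {
      plan.inserts.push({ speakerLabel: s.speakerLabel, displayName: null, talkMs: s.talkMs });
    }
  }
  const present = new Set(stats.map((s) => s.speakerLabel));
  plan.deletes = existing.filter((r) => !present.has(r.speakerLabel)).map((r) => r.speakerLabel);
  return plan;
}

const materializationPlan = planMaterialization(
  [
    { speakerLabel: "Speaker 0", displayName: "Alice", talkMs: 1000 },
    { speakerLabel: "Speaker 2", displayName: null, talkMs: 500 },
  ],
  [
    { speakerLabel: "Speaker 0", talkMs: 4000 },
    { speakerLabel: "Speaker 1", talkMs: 2000 },
  ],
);
// updates: Speaker 0 keeps displayName "Alice" with talkMs 4000
// inserts: Speaker 1 with displayName null
// deletes: ["Speaker 2"]
```

Running the same plan twice against its own result produces no inserts or deletes, which is the idempotency property the real function is expected to have.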

## Relabeling

When a user renames `"Speaker 0"` to `"Alice"` in the UI, the mutation updates `call_participants.display_name` only. The transcript UI resolves labels at render time by joining segments against participants. This means:

- The transcript's `segment.speakerLabel` stays as `"Speaker 0"` forever.
- Recomputing stats on a relabeled call is still correct, because stats key off `speakerLabel`.
- Cross-call identity is not maintained — each call's speakers start fresh.
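A minimal sketch of that render-time resolution, assuming the join is a simple lookup by label with a fallback to the raw label. `resolveSpeakerNames` and `ParticipantName` are hypothetical names, not the transcript UI's actual code.

```ts
interface ParticipantName {
  speakerLabel: string;
  displayName: string | null;
}

// Map each segment to the name shown in the UI without mutating the segment.
function resolveSpeakerNames(
  segments: Array<{ speakerLabel: string }>,
  participants: ParticipantName[],
): string[] {
  const overrides = new Map<string, string>();
  for (const p of participants) {
    if (p.displayName) overrides.set(p.speakerLabel, p.displayName);
  }
  // Fall back to the raw label when no override is set.
  return segments.map((seg) => overrides.get(seg.speakerLabel) ?? seg.speakerLabel);
}

const shownNames = resolveSpeakerNames(
  [{ speakerLabel: "Speaker 0" }, { speakerLabel: "Speaker 1" }, { speakerLabel: "Speaker 0" }],
  [
    { speakerLabel: "Speaker 0", displayName: "Alice" },
    { speakerLabel: "Speaker 1", displayName: null },
  ],
);
// shownNames is ["Alice", "Speaker 1", "Alice"]; the segments still say "Speaker 0".
```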

If we want cross-call speaker identification, that's a future feature (voice embedding + workspace-level speaker roster). Not in the current model.

## Rules

- **`computeTalkStats` is the single source of truth.** Never inline-compute these numbers anywhere else.
- **Talk percentage denominator is total talk time, not call duration.** Otherwise calls with long stretches of silence or hold music understate every speaker's share.
- **Interruption threshold is 200ms minimum overlap.** Smaller overlaps are diarization noise.
- **Monologue gap threshold is 1.5 seconds.** Tune this here, not in callers.
- **Materialization is idempotent and preserves user edits** (`displayName`, `email`, `isInternal`, `avatarUrl`).
- **Participant avatars / colors are display only** — never gate access on them.

## Related skills

- `transcription` — `materializeParticipants` runs inside `request-transcript`.
- `call-search` — `list-calls --participantEmail=...` joins through `call_participants.email`.
- `call-summary` — the summary prompt references participant display names and `isInternal` flags.
- `trackers` — tracker hits carry `speaker_label` so you can compute per-speaker hit counts.