design-agent

Name: design-agent
Author: crewAIInc/skills
$ npx mdskill add crewAIInc/skills/design-agent
Design CrewAI agents with precise roles, goals, and configurations.
Optimize agent performance by defining clear roles, goals, and backstories.
Integrates with CrewAI framework for task orchestration and execution.
Selects appropriate LLMs, tools, and execution limits based on task needs.
Delivers optimized agent structures ready for deployment or debugging.
SKILL.md
.github/skills/design-agent
---
name: design-agent
description: "CrewAI agent design and configuration. Use when creating, configuring, or debugging crewAI agents — choosing role/goal/backstory, selecting LLMs, assigning tools, tuning max_iter/max_rpm/max_execution_time, enabling planning/code execution/delegation, setting up knowledge sources, using guardrails, or configuring agents in YAML vs code."
---

# CrewAI Agent Design Guide

How to design effective agents with the right role, goal, backstory, tools, and configuration.

---

## The 80/20 Rule

**Spend 80% of your effort on task design, 20% on agent design.** A well-designed task elevates even a simple agent. But even the best agent cannot rescue a vague, poorly scoped task. Get the task right first (see the `design-task` skill), then refine the agent.

---

## 0. How Many Agents Do You Actually Need?

**Default to ONE agent.** Add more only when the task genuinely splits into work that requires:

- **Different tools or permissions** — e.g. one agent has Slack write access, another reads docs only.
- **Different personas the LLM must clearly switch between** — a writer's voice is not a researcher's voice.
- **Different LLMs** — a cheap model for mechanical steps, a stronger one for synthesis.
- **Different guardrails or output schemas** — separate agents make the contract per stage explicit.

**DO NOT add an agent just because the workflow has multiple steps.** A single agent can:
- Call multiple tools in sequence within one kickoff (search → scrape → summarize is one agent's loop).
- Produce structured multi-section output in one response.
- Iterate via its own tool-use loop without you orchestrating it as separate agents.

**Cost calculus:** every extra agent costs at least one more LLM kickoff plus a context handoff. Splitting linear, single-persona work into multiple agents multiplies token cost and adds fragility for marginal quality wins.

### Anti-pattern: Sequential mechanical steps as separate agents

❌ Three agents for what is one researcher's job:
```python
source_finder = Agent(role="Finds URLs via Firecrawl search", tools=[firecrawl_search])
scraper       = Agent(role="Scrapes URLs via Firecrawl scrape", tools=[firecrawl_scrape])
writer        = Agent(role="Writes the report", ...)
```

✅ One researcher does the gathering loop; one writer synthesizes — two agents because the personas and LLMs genuinely differ:
```python
researcher = Agent(role="Web Researcher", tools=[firecrawl_search, firecrawl_scrape], llm="anthropic/claude-haiku-4-5")
writer     = Agent(role="Technical Report Writer",                                    llm="anthropic/claude-sonnet-4-6")
```
The researcher's task description tells it to search, then scrape, then return structured findings. One LLM loop, multiple tool calls.

### Anti-pattern: "Summarize then send" as two agents

❌ Two agents to read a string, summarize it, and post a Slack DM:
```python
summarizer       = Agent(role="Summarizer")
slack_messenger  = Agent(role="Slack Sender", apps=["slack"])
```

✅ One agent with the connector and a task that tells it to write the summary at the top, then send the DM:
```python
slack_dm_agent = Agent(
    role="Slack Reporter",
    goal="Post a Slack DM containing a one-paragraph summary plus the full markdown body.",
    apps=["slack"],
)
# Task: "Read the report below. Write a 2-3 sentence executive summary at the top.
#        Post a DM to {recipient_email} with the summary followed by the full body."
```

### Heuristic

> If two "agents" share the same persona, the same tool surface, and the same LLM, they are one agent with a longer task description.
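
The heuristic above can be turned into a quick check while sketching a design. This is a minimal, framework-free sketch: `AgentSpec` and `should_merge` are hypothetical names invented for illustration, not part of CrewAI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Hypothetical planning record for an agent you are considering."""
    persona: str
    tools: frozenset
    llm: str


def should_merge(a: AgentSpec, b: AgentSpec) -> bool:
    """True when two planned agents share persona, tool surface, and LLM."""
    return a.persona == b.persona and a.tools == b.tools and a.llm == b.llm


finder = AgentSpec("web researcher", frozenset({"search", "scrape"}), "anthropic/claude-haiku-4-5")
scraper = AgentSpec("web researcher", frozenset({"search", "scrape"}), "anthropic/claude-haiku-4-5")
writer = AgentSpec("report writer", frozenset(), "anthropic/claude-sonnet-4-6")

print(should_merge(finder, scraper))  # True: one agent with a longer task
print(should_merge(finder, writer))   # False: persona and LLM differ
```

If the check returns `True`, fold the two roles into one agent and move the extra steps into its task description.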

### Once you've decided "one agent is enough"

Use `Agent.kickoff()` directly inside a Flow method — no `Crew`, no `Task` ceremony. The Flow owns sequencing and state; each step is a single agent kickoff. See **Section 4 — Agent.kickoff() — Direct Agent Execution** below for the full pattern, and the upstream docs at <https://docs.crewai.com/en/concepts/agents#direct-agent-interaction-with-kickoff>.

Quick shape:

```python
@listen(previous_step)
def my_step(self):
    agent = Agent(role="…", goal="…", backstory="…", tools=[...])
    result = agent.kickoff(
        messages=f"Use this prior step's output: {self.state.prior_field}",
        response_format=MyPydanticModel,  # optional
    )
    self.state.my_field = result.pydantic  # or result.raw
```

Reach for `Crew.kickoff()` *only* when a step genuinely benefits from multi-agent collaboration (delegation, hierarchical management, parallel specialists feeding one synthesis). For "one agent does one job", `Agent.kickoff()` inside a Flow listener is the right primitive.

Only after you've decided multi-agent is justified, read on for how to design each one.

---

## 1. The Role-Goal-Backstory Framework

Every agent needs three things: **who** it is, **what** it wants, and **why** it's qualified.

### Role — Who the Agent Is

The role defines the agent's area of expertise. **Be specific, not generic.**

| Bad | Good |
|---|---|
| `Researcher` | `Senior Data Researcher specializing in {topic}` |
| `Writer` | `Technical Blog Writer for developer audiences` |
| `Analyst` | `Financial Risk Analyst with regulatory compliance expertise` |

The role directly shapes how the LLM reasons. A "Senior Data Researcher" will produce different output than a "Research Assistant" even with the same task.

### Goal — What the Agent Wants

The goal is the agent's individual objective. It should be **outcome-focused with quality standards**.

| Bad | Good |
|---|---|
| `Do research` | `Uncover cutting-edge developments in {topic} and identify the top 5 trends with supporting evidence` |
| `Write content` | `Produce publication-ready technical articles that explain complex topics clearly for non-technical readers` |
| `Analyze data` | `Deliver actionable risk assessments with confidence levels and recommended mitigations` |

### Backstory — Why the Agent Is Qualified

The backstory establishes expertise, experience, values, and working style. It's the agent's "personality prompt."

```yaml
backstory: >
  You're a seasoned researcher with 15 years of experience in AI/ML.
  You're known for your ability to find obscure but relevant papers
  and synthesize complex findings into clear, actionable insights.
  You always cite your sources and flag uncertainty explicitly.
```

**What to include in a backstory:**
- Years/depth of experience
- Specific domain knowledge
- Working style and values (e.g., "always cites sources", "prefers concise output")
- Quality standards the agent holds itself to

**What NOT to include:**
- Implementation details (tools, models, config)
- Task-specific instructions (those go in the task description)
- Arbitrary personality traits that don't affect output quality

---

## 2. Agent Configuration Reference

### Essential Parameters

```python
Agent(
    role="...",              # Required: agent's expertise area
    goal="...",              # Required: what the agent aims to achieve
    backstory="...",         # Required: context and personality
    llm="openai/gpt-4o",    # Optional: defaults to OPENAI_MODEL_NAME env var or "gpt-4"
    tools=[...],             # Optional: list of tool instances
)
```

### Execution Control

```python
Agent(
    ...,
    max_iter=25,             # Max reasoning iterations per task (default: 25)
    max_execution_time=300,  # Timeout in seconds (default: None — no limit)
    max_rpm=10,              # Rate limit: max API calls per minute (default: None)
    max_retry_limit=2,       # Retries on error (default: 2)
    verbose=True,            # Show detailed execution logs (default: False)
)
```

**Tuning `max_iter`:**
- Default 25 is generous — most tasks finish in 3-8 iterations
- Lower to 10-15 to fail faster when tasks are well-defined
- If the agent consistently hits `max_iter`, the task is too vague (fix the task, not the limit)

### Tool Configuration

```python
from crewai_tools import SerperDevTool, ScrapeWebsiteTool, FileReadTool

Agent(
    ...,
    tools=[SerperDevTool(), ScrapeWebsiteTool()],  # Agent-level tools
)
```

**Key rules:**
- An agent with **no tools** will hallucinate data when asked to search, fetch, or read files — always provide tools for tasks that require external data
- Prefer **fewer, focused tools** over many tools: too many tools confuse the agent
- Tools can also be assigned at the **task level** for task-specific access (see `design-task` skill)
- Agent-level tools are available for all tasks the agent performs; task-level tools override for that specific task
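
The override rule in the last bullet can be pictured as a simple fallback. The sketch below is an illustration only: `effective_tools` is a hypothetical helper, tool names are plain strings rather than tool instances, and it assumes an empty task-level list falls back to the agent's tools.

```python
from typing import Optional


def effective_tools(agent_tools: list, task_tools: Optional[list] = None) -> list:
    """Return the tools in effect for one task.

    Assumption taken from the rule above: a non-empty task-level list
    replaces the agent-level list for that task only.
    """
    return list(task_tools) if task_tools else list(agent_tools)


agent_level = ["search", "scrape"]

print(effective_tools(agent_level))                 # ['search', 'scrape']
print(effective_tools(agent_level, ["file_read"]))  # ['file_read']
```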

### LLM Selection

```python
Agent(
    ...,
    llm="openai/gpt-4o",              # Main reasoning model
    function_calling_llm="openai/gpt-4o-mini",  # Cheaper model for tool calls only
)
```

Use `function_calling_llm` to save costs: the main `llm` handles reasoning while a cheaper model handles tool-calling mechanics.

### Collaboration

```python
Agent(
    ...,
    allow_delegation=False,  # Default: False — agent works alone
)
```

Set `allow_delegation=True` only when:
- The agent is part of a crew with other specialized agents
- The task genuinely benefits from the agent handing off subtasks
- You're using hierarchical process where the manager delegates

**Warning:** Delegation without clear task boundaries leads to infinite loops or wasted iterations.

### Planning (Plan-and-Execute Mode)

When a `PlanningConfig` is set on an agent, `Agent.kickoff()` (and `Agent.execute_task()`) routes through the new `crewai.experimental.AgentExecutor`. Instead of a single ReAct-style loop, the agent:

1. **Generates a plan** — a list of `PlanStep`s, each with a description and optional `tool_to_use`. Stored as `state.todos`.
2. **Executes each step** via a `StepExecutor` in an isolated multi-turn LLM loop (capped by `max_step_iterations`).
3. **Observes the result** via a `PlannerObserver` after every step — did the step succeed? Is the remaining plan still valid?
4. **Routes the next action** based on the agent's `reasoning_effort` setting (see below).

The presence of a `PlanningConfig` enables the mode. To disable: don't pass one, or set `planning=False`.

```python
from crewai import Agent
from crewai.agent.planning_config import PlanningConfig

agent = Agent(
    role="…",
    goal="…",
    backstory="…",
    tools=[...],
    planning_config=PlanningConfig(reasoning_effort="medium"),  # most common
)
```

#### `reasoning_effort` — pick one

| Level | After each step the planner... | Pick when |
|---|---|---|
| `"low"` | observes (validates success), marks the todo complete, continues. **No replan, no refine.** | You want plan visibility (todos, observations) but trust the agent to follow it linearly. Fastest. |
| `"medium"` (default) | observes; **replans on failure only**. Successful steps just continue. | The agent's tools can fail (network, exec, scrape) and you want graceful recovery without paying refinement cost on every success. **The right default for sandbox-coding, research, and other tool-heavy loops.** |
| `"high"` | observes, then routes through `decide_next_action` which can trigger early goal achievement, full replan, or lightweight refinement after every step. | The task changes shape based on intermediate findings, or you need maximum adaptiveness. Most LLM calls per run. |

Source: `crewai/experimental/agent_executor.py:450` (`observe_step_result` router) and `crewai/agent/planning_config.py`.
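
To make the table concrete, here is a toy router that mirrors its three rows. It is a sketch of the described behavior, not the library's code: `next_action` and the returned labels are hypothetical names chosen for illustration.

```python
def next_action(reasoning_effort: str, step_succeeded: bool) -> str:
    """Mirror the table: what happens after a step has been observed."""
    if reasoning_effort == "low":
        # Observe, mark the todo complete, keep going. Never replans.
        return "continue"
    if reasoning_effort == "medium":
        # Replan only when the observed step failed.
        return "continue" if step_succeeded else "replan"
    if reasoning_effort == "high":
        # Every step goes through the decision stage, success or not.
        return "decide_next_action"
    raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")


print(next_action("low", step_succeeded=False))     # continue
print(next_action("medium", step_succeeded=False))  # replan
print(next_action("high", step_succeeded=True))     # decide_next_action
```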

#### Other `PlanningConfig` knobs

```python
PlanningConfig(
    reasoning_effort="medium",
    max_steps=20,            # cap on planned steps (default 20)
    max_replans=3,           # max full re-plans before finalizing (default 3)
    max_attempts=None,       # planning refinement attempts during plan generation
    max_step_iterations=15,  # max LLM turns per step's StepExecutor (default 15)
    step_timeout=None,       # wall-clock seconds per step; None = no cap
    system_prompt=None,      # custom planning system prompt (uses default if None)
    plan_prompt=None,        # custom initial-plan prompt; placeholders: {description}, {expected_output}, {tools}, {max_steps}
    refine_prompt=None,      # custom refinement prompt
    llm=None,                # separate LLM for planning (else uses agent.llm)
)
```

Use `llm="anthropic/claude-haiku-4-5"` (cheap) for the planner while keeping `agent.llm="anthropic/claude-opus-4-7"` (strong) for execution — common cost optimization.

#### When to enable

- **Enable** for autonomous loops where the agent picks its own steps and you want failure recovery (e.g. coding agent that writes → runs → patches; research agent that searches → scrapes → revises).
- **Skip** for single-tool, single-purpose calls (e.g. "summarize this string", "post this Slack DM") — observation overhead doesn't pay off.

#### Cost shape

Every step gets a `PlannerObserver` LLM call (~1 extra call per step). On `"medium"` a failed step adds a replan call. On `"high"` every step adds a `decide_next_action` call too. For an N-step plan, expect roughly:

- `low`: N execution + N observation = **2N calls**
- `medium`: 2N + (failures × 1 replan)
- `high`: ~3N + replans/refines

The difference is material at scale; measure before defaulting to `high` for everything.
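
The rough formulas above are easy to put into a back-of-the-envelope estimator. This is a sketch under the stated approximations only; `estimate_llm_calls` is a hypothetical helper, and real runs vary with how many turns each step takes.

```python
def estimate_llm_calls(n_steps: int, effort: str, failures: int = 0, replans_or_refines: int = 0) -> int:
    """Approximate LLM calls for an N-step plan, using the rough counts above."""
    if effort == "low":
        return 2 * n_steps                       # execution + observation
    if effort == "medium":
        return 2 * n_steps + failures            # one replan per failed step
    if effort == "high":
        return 3 * n_steps + replans_or_refines  # plus a decision call per step
    raise ValueError(f"unknown effort: {effort!r}")


print(estimate_llm_calls(10, "low"))                          # 20
print(estimate_llm_calls(10, "medium", failures=2))           # 22
print(estimate_llm_calls(10, "high", replans_or_refines=3))   # 33
```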

#### Custom `plan_prompt`

If you supply `plan_prompt`, include the placeholders the planner template expects: `{description}`, `{expected_output}`, `{tools}`, `{max_steps}`. The planner LLM gets these interpolated. Keep custom prompts focused on *project-specific* rules; let `description`/`tools` (auto-injected) carry the dynamic content.
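
A quick pre-flight check can catch a custom prompt that drops one of these placeholders. The sketch below assumes the placeholders use Python `str.format`-style braces; `missing_placeholders` is a hypothetical helper, not a CrewAI function.

```python
from string import Formatter

REQUIRED_PLACEHOLDERS = {"description", "expected_output", "tools", "max_steps"}


def missing_placeholders(plan_prompt: str) -> set:
    """Return the required placeholder names that the prompt does not contain."""
    found = {name for _, name, _, _ in Formatter().parse(plan_prompt) if name}
    return REQUIRED_PLACEHOLDERS - found


complete = (
    "Task: {description}\n"
    "Expected output: {expected_output}\n"
    "Available tools: {tools}\n"
    "Use at most {max_steps} steps."
)
partial = "Plan for {description} using {tools}."

print(missing_placeholders(complete))         # set()
print(sorted(missing_placeholders(partial)))  # ['expected_output', 'max_steps']
```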

### Code Execution

```python
Agent(
    ...,
    allow_code_execution=True,        # Enable code execution (default: False)
    code_execution_mode="safe",       # "safe" (Docker) or "unsafe" (direct) — default: "safe"
)
```

- `"safe"` requires Docker installed and running — executes in a container
- `"unsafe"` runs code directly on the host — only use in controlled environments

### Context Window Management

```python
Agent(
    ...,
    respect_context_window=True,      # Auto-summarize to stay within limits (default: True)
)
```

When `True`, the agent automatically summarizes prior context if it approaches the LLM's token limit. When `False`, execution stops with an error on overflow.

### Date Injection

```python
Agent(
    ...,
    inject_date=True,                 # Add current date to task context (default: False)
    date_format="%Y-%m-%d",           # Date format (default: "%Y-%m-%d")
)
```

Enable for time-sensitive tasks (research, news analysis, scheduling).
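
The `date_format` value uses standard `strftime` directives. The sketch below only shows how a format string renders a date; `render_date` is a hypothetical helper and says nothing about where CrewAI places the date in the prompt.

```python
from datetime import date


def render_date(day: date, date_format: str = "%Y-%m-%d") -> str:
    """Render a date with a strftime format string."""
    return day.strftime(date_format)


print(render_date(date(2025, 1, 31)))              # 2025-01-31
print(render_date(date(2025, 1, 31), "%d/%m/%Y"))  # 31/01/2025
```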

### Agent Guardrails

```python
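from typing import Any

def validate_no_pii(result) -> tuple[bool, Any]:
    """Reject output containing PII."""
    # contains_pii is a placeholder for your own PII detector; it is not defined here.
    if contains_pii(result.raw):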
        return (False, "Output contains PII. Remove all personal information and try again.")
    return (True, result)

Agent(
    ...,
    guardrail=validate_no_pii,
    guardrail_max_retries=3,          # default: 3
)
```

Agent guardrails validate every output the agent produces. The agent retries on failure up to `guardrail_max_retries`.
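
For a guardrail you can run without the framework, here is a minimal sketch that follows the same `(passed, payload)` return shape. It is an illustration only: the email regex is a deliberately simple stand-in for real PII detection, and `SimpleNamespace` objects stand in for agent output.

```python
import re
from types import SimpleNamespace
from typing import Any

# Simplified pattern for illustration; real PII detection needs more than this.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def validate_no_email(result: Any) -> tuple:
    """Return (True, result) when clean, or (False, feedback) when an email is found."""
    if EMAIL_PATTERN.search(result.raw):
        return (False, "Output contains an email address. Remove it and try again.")
    return (True, result)


# Stand-ins for agent output; only the .raw attribute is used.
clean = SimpleNamespace(raw="Quarterly revenue grew 12%.")
leaky = SimpleNamespace(raw="Contact jane.doe@example.com for details.")

print(validate_no_email(clean)[0])  # True
print(validate_no_email(leaky)[0])  # False
```

The failure message doubles as feedback for the retry, so phrase it as an instruction the agent can act on.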

### Knowledge Sources

```python
from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource

Agent(
    ...,
    knowledge_sources=[
        TextFileKnowledgeSource(file_paths=["company_handbook.txt"]),
    ],
    embedder={
        "provider": "openai",
        "config": {"model": "text-embedding-3-small"},
    },
)
```

Knowledge sources give agents access to domain-specific data via RAG. Use when agents need to reference large documents, policies, or datasets.

---

## 3. YAML Configuration (Recommended)

Define agents in `agents.yaml` for clean separation of config and code:

```yaml
researcher:
  role: >
    {topic} Senior Data Researcher
  goal: >
    Uncover cutting-edge developments in {topic}
    with supporting evidence and source citations
  backstory: >
    You're a seasoned researcher with 15 years of experience.
    Known for finding obscure but relevant sources and
    synthesizing complex findings into clear insights.
    You always cite your sources and flag uncertainty.
  # Optional overrides (uncomment as needed):
  # llm: openai/gpt-4o
  # max_iter: 15
  # max_rpm: 10
  # allow_delegation: false
  # verbose: true
```

Then wire in `crew.py`:

```python
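from crewai import Agent
from crewai.project import CrewBase, agent
from crewai_tools import SerperDevTool

@CrewBase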
class MyCrew:
    agents_config = "config/agents.yaml"
    tasks_config = "config/tasks.yaml"

    @agent
    def researcher(self) -> Agent:
        return Agent(
            config=self.agents_config["researcher"],
            tools=[SerperDevTool()],
        )
```

**Critical:** The method name (`def researcher`) must match the YAML key (`researcher:`). Mismatch causes `KeyError`.
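
A name mismatch is easy to catch before running the crew. The sketch below compares method names against YAML keys using plain lists; `unmatched_agent_methods` is a hypothetical helper, and in a real project the keys would come from the parsed `agents.yaml`.

```python
def unmatched_agent_methods(method_names: list, yaml_keys: list) -> list:
    """Return method names that have no YAML entry with the same name."""
    keys = set(yaml_keys)
    return [name for name in method_names if name not in keys]


yaml_keys = ["researcher", "report_writer"]  # top-level keys in agents.yaml
method_names = ["researcher", "writer"]      # methods decorated with @agent

print(unmatched_agent_methods(method_names, yaml_keys))  # ['writer']
```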

---

## 4. Agent.kickoff() — Direct Agent Execution

Use `Agent.kickoff()` when you need one agent with tools and reasoning, without crew overhead. This is the most common pattern in Flows.

### Basic Usage

```python
from crewai import Agent
from crewai_tools import SerperDevTool

researcher = Agent(
    role="Senior Research Analyst",
    goal="Find comprehensive, factual information with source citations",
    backstory="Expert researcher known for thorough, evidence-based analysis.",
    tools=[SerperDevTool()],
    llm="openai/gpt-4o",
)

# Pass a string prompt — the agent reasons, uses tools, and returns a result
result = researcher.kickoff("What are the latest developments in quantum computing?")
print(result.raw)             # str — the agent's full response
print(result.usage_metrics)   # token usage stats
```

### With Structured Output

```python
from pydantic import BaseModel

class ResearchFindings(BaseModel):
    key_trends: list[str]
    sources: list[str]
    confidence: float

result = researcher.kickoff(
    "Research the latest AI agent frameworks",
    response_format=ResearchFindings,
)

# Access via .pydantic (NOT directly — Agent.kickoff wraps the result)
print(result.pydantic.key_trends)    # list[str]
print(result.pydantic.confidence)    # float
print(result.raw)                    # raw string version
```

> **Note:** `Agent.kickoff()` returns `LiteAgentOutput` — access structured output via `result.pydantic`. This differs from `LLM.call()` which returns the Pydantic object directly.

### With File Inputs

```python
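# FileInput must be imported from your CrewAI installation; the import path is not shown in this guide.
result = researcher.kickoff(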
    "Analyze this document and summarize the key findings",
    input_files={"document": FileInput(path="report.pdf")},
)
```

### Async Variant

```python
result = await researcher.kickoff_async(
    "Research quantum computing breakthroughs",
    response_format=ResearchFindings,
)
```

### Agent.kickoff() in Flows (Recommended Pattern)

The most powerful pattern is orchestrating multiple `Agent.kickoff()` calls inside a Flow. The Flow handles state and sequencing; each agent handles its specific step:

```python
from crewai import Agent
from crewai.flow.flow import Flow, listen, start
from crewai_tools import SerperDevTool, ScrapeWebsiteTool
from pydantic import BaseModel

class ResearchState(BaseModel):
    topic: str = ""
    research: str = ""
    analysis: str = ""
    report: str = ""

class ResearchFlow(Flow[ResearchState]):

    @start()
    def gather_data(self):
        researcher = Agent(
            role="Senior Researcher",
            goal="Find comprehensive data with sources",
            backstory="Expert at finding and validating information.",
            tools=[SerperDevTool(), ScrapeWebsiteTool()],
        )
        result = researcher.kickoff(f"Research: {self.state.topic}")
        self.state.research = result.raw

    @listen(gather_data)
    def analyze(self):
        analyst = Agent(
            role="Data Analyst",
            goal="Extract actionable insights from raw research",
            backstory="Skilled at pattern recognition and synthesis.",
        )
        result = analyst.kickoff(
            f"Analyze this research and extract key insights:\n\n{self.state.research}"
        )
        self.state.analysis = result.raw

    @listen(analyze)
    def write_report(self):
        writer = Agent(
            role="Report Writer",
            goal="Create clear, well-structured reports",
            backstory="Technical writer who makes complex topics accessible.",
        )
        result = writer.kickoff(
            f"Write a comprehensive report from this analysis:\n\n{self.state.analysis}"
        )
        self.state.report = result.raw

flow = ResearchFlow()
flow.kickoff(inputs={"topic": "AI agents"})
print(flow.state.report)
```

**When to use Agent.kickoff() vs Crew.kickoff():**
- Use `Agent.kickoff()` when each step is a distinct agent and the Flow controls sequencing
- Use `Crew.kickoff()` when multiple agents need to collaborate on related tasks within a single step

---

## 5. Specialist vs Generalist Agents

> **Note:** Apply this section *after* you've decided you genuinely need multiple agents (see Section 0). If you only need one agent, "specialist vs generalist" is not the question — the question is just how to design that one agent.

**When you do need multiple agents, prefer specialists.** An agent that does one thing well outperforms one that does many things acceptably.

### When to Use a Specialist

- Task requires deep domain knowledge
- Output quality matters more than speed
- The task is complex enough to benefit from focused expertise

### When a Generalist Is Acceptable

- Simple tasks with clear instructions
- Prototyping where you'll specialize later
- Tasks that truly span multiple domains equally

### Specialist Design Pattern

Instead of one "Content Writer" agent, create:
- `technical_writer` — deep technical accuracy, code examples
- `copywriter` — persuasive, audience-focused marketing copy
- `editor` — grammar, consistency, style guide enforcement

Each specialist has a narrow role, specific goal, and backstory that reinforces their expertise.

---

## 6. Agent Interaction Patterns

### Sequential (Default)

Agents work one after another. Each agent receives prior agents' outputs as context.

```
Researcher → Writer → Editor
```

Best for: linear pipelines where each step builds on the last.

### Hierarchical

A manager agent delegates and validates. Task assignment is dynamic.

```python
from crewai import Crew, Process

Crew(
    agents=[researcher, writer, editor],
    tasks=[research_task, writing_task, editing_task],
    process=Process.hierarchical,
    manager_llm="openai/gpt-4o",
)
```

Best for: complex workflows where task assignment depends on intermediate results.

### Agent-to-Agent Delegation

When `allow_delegation=True`, an agent can ask another crew agent for help:

```python
lead_researcher = Agent(
    role="Lead Researcher",
    goal="Coordinate research efforts",
    backstory="...",
    allow_delegation=True,  # Can delegate to other agents in the crew
)
```

The agent will automatically discover other crew members and delegate subtasks as needed.

---

## 7. Common Agent Design Mistakes

| Mistake | Impact | Fix |
|---|---|---|
| Generic role like "Assistant" | Agent produces unfocused, shallow output | Use specific expertise: "Senior Financial Analyst" |
| No tools for data-gathering tasks | Agent hallucinates data instead of searching | Always add tools when the task requires external info |
| Too many tools (10+) | Agent gets confused choosing between tools | Limit to 3-5 relevant tools per agent |
| Backstory full of task instructions | Agent mixes personality with task execution | Keep backstory about WHO the agent is; task details go in the task |
| `allow_delegation=True` by default | Agents waste iterations delegating trivially | Only enable when delegation genuinely helps |
| `max_iter` too high for simple tasks | Agent burns iterations before failing on a vague task | Lower `max_iter` and tighten the task description |
| No guardrail on critical output | Bad output passes through unchecked | Add guardrails for outputs that feed into production systems |
| Using expensive LLM for tool calls | Unnecessary cost for mechanical operations | Set `function_calling_llm` to a cheaper model |

---

## 8. Agent Design Checklist

Before deploying an agent, verify:

- [ ] **Role** is specific and domain-focused (not "Assistant" or "Helper")
- [ ] **Goal** includes desired outcome AND quality standards
- [ ] **Backstory** establishes expertise and working style
- [ ] **Tools** are assigned for any task requiring external data
- [ ] **No excess tools** — 3-5 per agent maximum
- [ ] **max_iter** is tuned for expected task complexity (10-15 for simple, 20-25 for complex)
- [ ] **max_execution_time** is set for production agents to prevent hangs
- [ ] **Guardrails** are configured for critical outputs
- [ ] **LLM** is appropriate for task complexity (don't use GPT-4 for classification)
- [ ] **Delegation** is disabled unless genuinely needed
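
Several checklist items are mechanical enough to automate. The sketch below checks a plain dict of agent settings against four of them; `review_agent_config` and the warning strings are hypothetical, and the remaining items (goal quality, backstory, LLM fit) still need human review.

```python
GENERIC_ROLES = {"assistant", "helper"}


def review_agent_config(config: dict) -> list:
    """Return warnings for the mechanically checkable checklist items."""
    warnings = []
    if config.get("role", "").strip().lower() in GENERIC_ROLES:
        warnings.append("role is generic")
    if len(config.get("tools", [])) > 5:
        warnings.append("more than 5 tools")
    if config.get("max_execution_time") is None:
        warnings.append("max_execution_time is not set")
    if config.get("allow_delegation", False):
        warnings.append("delegation is enabled")
    return warnings


draft = {"role": "Assistant", "tools": ["t1", "t2", "t3", "t4", "t5", "t6"], "allow_delegation": True}
tuned = {"role": "Financial Risk Analyst", "tools": ["search"], "max_execution_time": 300}

print(review_agent_config(draft))
print(review_agent_config(tuned))  # []
```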

---

## References

For deeper dives into specific topics, see:

- [Custom Tools](references/custom-tools.md) — building your own tools with `@tool` decorator and `BaseTool` subclass
- [Memory & Knowledge](references/memory-and-knowledge.md) — memory configuration, knowledge sources, embedder setup, scoping

For related skills:

- **getting-started** — project scaffolding, choosing the right abstraction, Flow architecture
- **design-task** — task description/expected_output best practices, guardrails, structured output, dependencies
- **ask-docs** — query the live CrewAI documentation MCP server for questions not covered by these skills