Intermediate

Human Handoff Map for Agent Workflows

Generate a documented list of every point in an agent-driven workflow where a human must review or approve before the next step runs.

When to use this prompt

Use this prompt before deploying an agent that takes actions autonomously. The handoff map is the document that defines the boundaries of agent autonomy in your business. Without it, you are relying on the agent to stay inside boundaries you have not actually set.

It fits best when you have only a high-level description of a workflow (e.g., “the agent qualifies inbound leads”) and need to break it down into the specific decision points where a human must intervene. The output becomes part of your standard operating procedure for the agent and the team that supervises it.

The prompt

<role>Operations expert specializing in human-in-the-loop design for AI agent workflows.</role>

<task>Build a Human Handoff Map for the workflow below. Classify every step on the four-tier autonomy scale. Name the responsible human for every non-autonomous step. Identify failure modes and tracking metrics.</task>

<inputs>
<workflow>
[DESCRIBE THE WORKFLOW IN 3 TO 8 SENTENCES. Include: what triggers it, what actions the agent takes, what systems it touches, what the desired end state is.]
</workflow>
<industry>[INDUSTRY]</industry>
<company_size>[SIZE]</company_size>
<risk_tolerance>[LOW / MEDIUM / HIGH]</risk_tolerance>
</inputs>

<scale>
- AUTONOMOUS: agent acts without human review.
- REVIEW_BEFORE_SEND: agent prepares output, a named human reviews and approves before it goes out.
- APPROVAL_GATE: agent stops and waits for explicit human go-ahead before proceeding.
- HUMAN_OWNED: agent does not perform this step at all.
</scale>

<instructions>
1. Decompose the workflow into discrete steps from trigger to end state. Number them.
2. Classify each step using exactly one of the four labels in <scale>.
3. For every REVIEW_BEFORE_SEND, APPROVAL_GATE, or HUMAN_OWNED step, name the specific role (not "a human" or "the team").
4. If risk_tolerance is LOW, default any step with financial, legal, or customer-facing impact to APPROVAL_GATE or HUMAN_OWNED.
5. After the step list, identify the 3 highest-risk failure modes for this workflow. For each, name the specific step or handoff that mitigates it.
6. Propose 2 tracking metrics: one that detects "agent doing too much without oversight," one that detects "humans bottlenecking the agent." Do not skip the second metric.
</instructions>

<output_format>
**Steps:**
1. [Step] (CLASSIFICATION, owner: [Role if applicable])
2. ...

**Top 3 failure modes:**
1. [Failure] → mitigated by [step number / handoff]
2. ...
3. ...

**Tracking metrics:**
- [Over-autonomy-detection metric]: [definition]
- [Bottleneck-detection metric]: [definition]
</output_format>

How it works

The four classifications cover the full spectrum of agent autonomy. Every step in any agent workflow falls into one of these categories. Forcing the model to classify each step prevents the common mistake of treating “the agent does this” as a single state when it is actually four different states.
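If you store the resulting map in code rather than in a document, the four tiers translate naturally into an enumeration. The sketch below is one possible representation, not part of the prompt itself: the `Autonomy`, `Step`, and `validate` names are hypothetical, and the only rule it checks is the prompt's requirement that every non-autonomous step names a role.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Autonomy(Enum):
    """The four labels from the <scale> block."""
    AUTONOMOUS = "AUTONOMOUS"
    REVIEW_BEFORE_SEND = "REVIEW_BEFORE_SEND"
    APPROVAL_GATE = "APPROVAL_GATE"
    HUMAN_OWNED = "HUMAN_OWNED"


@dataclass(frozen=True)
class Step:
    number: int
    description: str
    autonomy: Autonomy
    owner: Optional[str] = None  # role name; required unless AUTONOMOUS


def validate(steps: List[Step]) -> List[str]:
    """Return a list of problems; an empty list means every handoff has an owner."""
    problems = []
    for step in steps:
        if step.autonomy is not Autonomy.AUTONOMOUS and not step.owner:
            problems.append(
                f"step {step.number}: {step.autonomy.value} needs a named role"
            )
    return problems


# Illustrative steps; step 8 is missing its owner on purpose.
steps = [
    Step(3, "Classify lead as Tier 1, 2, or 3", Autonomy.AUTONOMOUS),
    Step(4, "Draft first-touch email", Autonomy.REVIEW_BEFORE_SEND, "Account Executive"),
    Step(8, "Route pricing question", Autonomy.APPROVAL_GATE),
]
print(validate(steps))  # ['step 8: APPROVAL_GATE needs a named role']
```

Because each step carries exactly one `Autonomy` value, “the agent does this” can no longer hide which of the four states is meant.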

The XML <scale> block makes the four-tier definition reusable; you can swap the workflow without re-explaining the scale every time. Numbered, imperative instructions match how GPT-5.5 and Claude Opus 4.7 interpret instructions: literally.
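To make the reuse concrete, here is a minimal sketch of filling the inputs while keeping the scale fixed. The `build_inputs` helper and the sample values (“B2B SaaS”, “50 employees”) are assumptions made for illustration; the tag names and scale text are copied from the prompt above.

```python
# The scale text is fixed and shared across every workflow.
SCALE_BLOCK = """<scale>
- AUTONOMOUS: agent acts without human review.
- REVIEW_BEFORE_SEND: agent prepares output, a named human reviews and approves before it goes out.
- APPROVAL_GATE: agent stops and waits for explicit human go-ahead before proceeding.
- HUMAN_OWNED: agent does not perform this step at all.
</scale>"""

ALLOWED_RISK = {"LOW", "MEDIUM", "HIGH"}


def build_inputs(workflow: str, industry: str, company_size: str,
                 risk_tolerance: str) -> str:
    """Fill the <inputs> block; the rest of the prompt stays unchanged."""
    if risk_tolerance not in ALLOWED_RISK:
        raise ValueError(f"risk_tolerance must be one of {sorted(ALLOWED_RISK)}")
    return (
        "<inputs>\n"
        f"<workflow>\n{workflow.strip()}\n</workflow>\n"
        f"<industry>{industry}</industry>\n"
        f"<company_size>{company_size}</company_size>\n"
        f"<risk_tolerance>{risk_tolerance}</risk_tolerance>\n"
        "</inputs>"
    )


# Hypothetical values; swap these per workflow and leave SCALE_BLOCK alone.
inputs = build_inputs(
    workflow="A lead form submission triggers the agent, which enriches, "
             "classifies, and emails the lead.",
    industry="B2B SaaS",
    company_size="50 employees",
    risk_tolerance="LOW",
)
prompt_fragment = inputs + "\n\n" + SCALE_BLOCK
print(prompt_fragment)
```

Only the arguments to `build_inputs` change between runs, which is the reuse the `<scale>` block is meant to enable.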

The two-metric requirement at the end balances the map. Without a “humans bottlenecking” metric, teams over-correct toward review gates that grind throughput to zero. The instruction “do not skip the second metric” is a literal-instruction nudge that frontier models actually obey.

Example output

Workflow: Inbound lead qualification and follow-up

Steps:

  1. Lead form submission triggers agent (AUTONOMOUS)
  2. Agent enriches lead with firmographic data from Clearbit (AUTONOMOUS)
  3. Agent classifies lead as Tier 1, 2, or 3 based on rules (AUTONOMOUS)
  4. Agent drafts personalized first-touch email (REVIEW_BEFORE_SEND, owner: Account Executive)
  5. Agent sends approved email and logs in CRM (AUTONOMOUS)
  6. Agent waits for reply or 5 business days (AUTONOMOUS)
  7. If positive reply, agent proposes meeting times (REVIEW_BEFORE_SEND, owner: Account Executive)
  8. If pricing question, agent stops and routes to the Account Executive (APPROVAL_GATE, owner: Account Executive)
  9. Discount or pricing terms (HUMAN_OWNED, owner: Account Executive)

Top 3 failure modes:

  1. Agent sends an email with a factual error → mitigated by step 4 review
  2. Agent advances a Tier 3 lead as Tier 1 → mitigated by step 4 review (AE catches misclassification)
  3. Agent commits to pricing terms → mitigated by step 8 approval gate

Tracking metrics:

  • Percentage of agent-drafted emails edited materially before send (detects agent doing too much without oversight)
  • Average time from lead submission to first-touch send (detects humans bottlenecking the agent)
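Both example metrics can be computed from a simple log of first-touch emails. The sketch below assumes a hypothetical record shape (`submitted_at`, `sent_at`, `edited_materially`) and made-up sample data; what counts as a “material” edit is left to whoever sets that flag.

```python
from datetime import datetime
from statistics import mean


def avg_hours_to_first_touch(records):
    """Bottleneck signal: mean hours from lead submission to first-touch send."""
    hours = [
        (r["sent_at"] - r["submitted_at"]).total_seconds() / 3600
        for r in records
    ]
    return mean(hours)


def pct_materially_edited(records):
    """Oversight signal: share of agent drafts the reviewer materially changed."""
    edited = sum(1 for r in records if r["edited_materially"])
    return 100 * edited / len(records)


# Made-up sample log; real data would come from your CRM or review tool.
records = [
    {"submitted_at": datetime(2024, 1, 8, 9), "sent_at": datetime(2024, 1, 8, 11),
     "edited_materially": False},  # 2 hours
    {"submitted_at": datetime(2024, 1, 8, 10), "sent_at": datetime(2024, 1, 8, 16),
     "edited_materially": True},   # 6 hours
    {"submitted_at": datetime(2024, 1, 9, 9), "sent_at": datetime(2024, 1, 9, 10),
     "edited_materially": False},  # 1 hour
    {"submitted_at": datetime(2024, 1, 9, 13), "sent_at": datetime(2024, 1, 9, 16),
     "edited_materially": True},   # 3 hours
]

print(avg_hours_to_first_touch(records))  # 3.0
print(pct_materially_edited(records))     # 50.0
```

Tracking the two numbers side by side is the point: a falling send time with a rising edit rate suggests the review step is being rushed rather than improved.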

Variations

  • High-risk industry version: Add a constraint that any step involving a financial, legal, or healthcare action is automatically classified as APPROVAL_GATE. Useful for regulated industries.
  • Phased rollout version: Generate a “month one,” “month two,” and “month three” version of the same handoff map, gradually moving steps from APPROVAL_GATE to AUTONOMOUS as the team gains confidence.
  • Existing workflow audit: Paste a workflow that is already running and ask the model to identify every place a handoff is missing today.
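The high-risk industry variation can also be enforced after the model responds, as a safety net in case the model misses the constraint. This is a minimal sketch under two assumptions that are not in the prompt: each step carries hypothetical topic tags, and a step already marked HUMAN_OWNED is left alone rather than loosened to APPROVAL_GATE.

```python
REGULATED_TAGS = {"financial", "legal", "healthcare"}
STRICT_LABELS = {"APPROVAL_GATE", "HUMAN_OWNED"}


def apply_high_risk_rule(classification, tags):
    """Raise regulated steps to APPROVAL_GATE; never loosen a stricter label."""
    touches_regulated_area = bool(REGULATED_TAGS & set(tags))
    if touches_regulated_area and classification not in STRICT_LABELS:
        return "APPROVAL_GATE"
    return classification


print(apply_high_risk_rule("AUTONOMOUS", ["financial"]))          # APPROVAL_GATE
print(apply_high_risk_rule("HUMAN_OWNED", ["legal"]))             # HUMAN_OWNED
print(apply_high_risk_rule("REVIEW_BEFORE_SEND", ["marketing"]))  # REVIEW_BEFORE_SEND
```

Running a check like this over the model's step list gives you a deterministic backstop in addition to the prompt-level constraint.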