Human-in-the-Loop

Patterns for involving humans in agent workflows: approval gates, progressive autonomy, and knowing when to escalate.

Agents are powerful, but they’re not infallible. They misinterpret instructions, make bad assumptions, and sometimes take actions that are difficult or impossible to undo. Human-in-the-loop patterns keep a person involved at critical decision points, providing a safety net without killing the productivity gains that agents offer.

The goal isn’t to require human approval for every action. That defeats the purpose of automation. The goal is to have the human involved at the right moments: before irreversible actions, when the agent is uncertain, and when the stakes are high.

Approval gates

An approval gate is a point in a workflow where execution pauses until a human explicitly approves the next action. This is the most basic human-in-the-loop pattern.

When to use approval gates

Not every action needs one. Use them when:

  • The action is irreversible, like deleting data, deploying to production, or sending an email to a customer
  • The action has real cost, like spinning up expensive infrastructure or making paid API calls
  • The action affects other people, like posting to a shared channel, modifying shared resources, or creating public content
  • The agent’s confidence is low, meaning the skill detected ambiguity or uncertainty in its inputs
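These criteria can be encoded as a small risk classifier that decides whether a gate is needed before an action runs. The sketch below is illustrative Python: the action categories and the 0.7 confidence threshold are assumptions, not fixed rules.

```python
from dataclasses import dataclass

# Illustrative action categories; a real system would derive these
# from the agent's tool definitions rather than hardcoding them.
IRREVERSIBLE = {"delete_data", "deploy_production", "send_email"}
COSTLY = {"provision_infra", "paid_api_call"}
SHARED_IMPACT = {"post_to_channel", "modify_shared_resource"}

@dataclass
class GateDecision:
    needs_approval: bool
    risk: str  # "low" | "medium" | "high"

def classify_action(action: str, confidence: float) -> GateDecision:
    """Decide whether an action should pass through an approval gate."""
    if action in IRREVERSIBLE:
        return GateDecision(needs_approval=True, risk="high")
    if action in COSTLY or action in SHARED_IMPACT:
        return GateDecision(needs_approval=True, risk="medium")
    if confidence < 0.7:  # low-confidence threshold is an assumption
        return GateDecision(needs_approval=True, risk="low")
    return GateDecision(needs_approval=False, risk="low")
```

Low-risk, high-confidence actions skip the gate entirely; everything matching one of the four criteria pauses for a human.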

Pattern: simple approval gate

interface ApprovalRequest {
  action: string;
  description: string;
  details: Record<string, unknown>;
  risk: "low" | "medium" | "high";
  reversible: boolean;
}

async function withApproval(
  request: ApprovalRequest,
  action: () => Promise<unknown>,
): Promise<{ approved: boolean; result?: unknown }> {
  // Present the proposed action to the user
  const approval = await requestHumanApproval({
    message: `The agent wants to: ${request.description}`,
    details: request.details,
    risk: request.risk,
    reversible: request.reversible,
  });

  if (!approval.granted) {
    return {
      approved: false,
      result: {
        skipped: true,
        reason: approval.reason || "User declined",
      },
    };
  }

  const result = await action();
  return { approved: true, result };
}

Pattern: batched approvals

When a workflow needs to perform many similar actions, asking for approval on each one gets tedious fast. Instead, present a batch for review.

async def batch_file_changes(changes: list[FileChange]) -> list[FileChange]:
    """Present a batch of proposed file changes for approval."""
    summary = {
        "total_files": len(changes),
        "files_created": len([c for c in changes if c.type == "create"]),
        "files_modified": len([c for c in changes if c.type == "modify"]),
        "files_deleted": len([c for c in changes if c.type == "delete"]),
        "changes": [
            {
                "path": c.path,
                "type": c.type,
                "summary": c.summary,
            }
            for c in changes
        ],
    }

    approval = await request_human_approval(
        message=f"The agent wants to modify {len(changes)} files.",
        details=summary,
        options=["approve_all", "review_each", "reject_all"],
    )

    if approval.choice == "approve_all":
        return changes
    elif approval.choice == "review_each":
        approved = []
        for change in changes:
            individual = await request_human_approval(
                message=f"{change.type}: {change.path}",
                details={"diff": change.diff},
            )
            if individual.granted:
                approved.append(change)
        return approved
    else:
        return []

This gives the user three options: trust the agent fully, review each change individually, or reject everything. In practice, most users choose “approve all” or “reject all,” and the per-item review option is there for when they need it.

Progressive autonomy levels

Not all agents and not all users should have the same level of autonomy. Progressive autonomy means starting with tight human oversight and loosening it as trust builds.

Defining autonomy levels

enum AutonomyLevel {
  /** Every action requires approval */
  SUPERVISED = "supervised",
  /** Only destructive or high-risk actions require approval */
  GUIDED = "guided",
  /** Agent acts freely but reports what it did */
  AUTONOMOUS = "autonomous",
  /** Agent acts freely with minimal reporting */
  FULL_AUTO = "full_auto",
}

interface AutonomyPolicy {
  level: AutonomyLevel;
  alwaysRequireApproval: string[]; // Action types that always need approval
  neverRequireApproval: string[]; // Action types that never need approval
}

function needsApproval(
  action: string,
  risk: string,
  policy: AutonomyPolicy,
): boolean {
  // Some actions always require approval regardless of level
  if (policy.alwaysRequireApproval.includes(action)) return true;
  // Some actions never require approval
  if (policy.neverRequireApproval.includes(action)) return false;

  switch (policy.level) {
    case AutonomyLevel.SUPERVISED:
      return true;
    case AutonomyLevel.GUIDED:
      return risk === "high" || risk === "medium";
    case AutonomyLevel.AUTONOMOUS:
      return risk === "high";
    case AutonomyLevel.FULL_AUTO:
      return false;
  }
}

Practical autonomy configuration

A real-world autonomy policy might look like this:

| Action category | Supervised | Guided | Autonomous | Full auto |
| --- | --- | --- | --- | --- |
| Read files | Approve | Allow | Allow | Allow |
| Edit files | Approve | Allow | Allow | Allow |
| Delete files | Approve | Approve | Approve | Allow |
| Run tests | Approve | Allow | Allow | Allow |
| Run arbitrary commands | Approve | Approve | Allow | Allow |
| Deploy to staging | Approve | Approve | Allow | Allow |
| Deploy to production | Approve | Approve | Approve | Approve |
| Send external communications | Approve | Approve | Approve | Approve |
Notice that even at the highest autonomy level, some actions (production deployments, external communications) still need approval. That’s intentional. Some actions are too consequential to ever fully hand off to an agent.
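A policy table like this can be expressed directly as data. Here is a hedged Python sketch: the action names and the `AUTO_ALLOWED_FROM` mapping are illustrative, encoding each row as the minimum autonomy level at which the action runs without approval (`"never"` marks the always-gated rows).

```python
# Each action maps to the lowest autonomy level at which it runs
# without approval; "never" means the action is always gated.
AUTO_ALLOWED_FROM = {
    "read_files": "guided",
    "edit_files": "guided",
    "delete_files": "full_auto",
    "run_tests": "guided",
    "run_arbitrary_commands": "autonomous",
    "deploy_staging": "autonomous",
    "deploy_production": "never",
    "send_external_communications": "never",
}

LEVEL_ORDER = ["supervised", "guided", "autonomous", "full_auto"]

def needs_approval(action: str, level: str) -> bool:
    """True if `action` still requires human approval at `level`."""
    threshold = AUTO_ALLOWED_FROM[action]
    if threshold == "never":
        return True
    return LEVEL_ORDER.index(level) < LEVEL_ORDER.index(threshold)
```

Keeping the policy as data rather than branching logic makes it easy to audit, diff, and adjust per user as trust builds.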

Feedback collection and incorporation

Human-in-the-loop isn’t just about approval gates. It’s also about learning from human feedback to improve future behavior.

Pattern: outcome feedback

After the agent completes a task, ask whether the outcome was correct. This feedback can shape future behavior.

async def complete_task_with_feedback(task: Task, result: TaskResult) -> None:
    """Present results and collect feedback."""
    feedback = await request_feedback(
        message="The agent completed the task. How did it do?",
        result_summary=result.summary,
        options={
            "correct": "The result is correct",
            "partially_correct": "Mostly right but needs adjustment",
            "incorrect": "The result is wrong",
        },
    )

    if feedback.choice == "correct":
        await store_positive_example(task, result)
    elif feedback.choice == "partially_correct":
        correction = await request_input("What needs to be adjusted?")
        await store_correction(task, result, correction)
    elif feedback.choice == "incorrect":
        explanation = await request_input("What went wrong?")
        await store_negative_example(task, result, explanation)

Pattern: inline corrections

Let humans correct the agent mid-workflow rather than waiting until the end. This is especially useful in multi-step workflows where an early mistake compounds through later steps.

async function editWithCorrections(
  filePath: string,
  proposedEdit: string,
): Promise<string> {
  const preview = await generateDiff(filePath, proposedEdit);

  const response = await requestHumanInput({
    message: "Review the proposed edit:",
    diff: preview,
    options: ["accept", "modify", "reject"],
  });

  switch (response.choice) {
    case "accept":
      return proposedEdit;
    case "modify":
      // Human provides a corrected version
      return response.modifiedContent;
    case "reject":
      // Return the original, unchanged
      return await readFile(filePath);
  }
}

When to escalate vs. proceed

One of the hardest design decisions is knowing when the agent should stop and ask for help versus pushing forward on its own. Here are concrete signals that should trigger escalation.

Escalation signals

Ambiguous instructions. If the user’s request can be read multiple ways and those interpretations lead to materially different actions, stop and ask.

async def handle_ambiguity(
    interpretations: list[str],
    context: str,
) -> str:
    """When multiple valid interpretations exist, ask the human."""
    if len(interpretations) == 1:
        return interpretations[0]

    choice = await request_human_input(
        message="I can interpret this request in multiple ways:",
        options={str(i): interp for i, interp in enumerate(interpretations)},
        context=context,
    )
    return interpretations[int(choice)]

Repeated failures. If a skill has failed and been retried more than twice, escalate rather than continuing to retry. Endless retries waste time and context. See Error Handling Patterns for retry strategies that know when to quit.
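The retry-then-escalate rule can be wrapped around any skill call. A minimal Python sketch, assuming an `escalate` callback supplied by the host application (both function names are illustrative):

```python
import asyncio
from typing import Awaitable, Callable

async def with_escalation(
    task: Callable[[], Awaitable[object]],
    escalate: Callable[[list[Exception]], Awaitable[object]],
    max_attempts: int = 3,
) -> object:
    """Run `task`, retrying on failure; after `max_attempts`
    failures, hand the collected errors to a human instead."""
    failures: list[Exception] = []
    for _ in range(max_attempts):
        try:
            return await task()
        except Exception as exc:  # real code would catch narrower types
            failures.append(exc)
    # Escalate with full failure details rather than retrying forever
    return await escalate(failures)
```

Passing the accumulated exceptions to the escalation handler matters: the human sees every failure mode at once instead of just the last one.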

Conflicting information. When different sources disagree (documentation says one thing, code says another), flag the conflict and let the human decide which source to trust.

Exceeding scope. If completing the task requires actions outside the agent’s defined permissions or capabilities, say so explicitly rather than attempting a workaround.

Decision framework

Use this when designing escalation logic for your skills:

| Condition | Action |
| --- | --- |
| Clear instructions, low risk, reversible | Proceed autonomously |
| Clear instructions, high risk or irreversible | Approval gate |
| Ambiguous instructions, low risk | Proceed with best interpretation, report what you did |
| Ambiguous instructions, high risk | Escalate for clarification |
| Repeated failures (3+) | Escalate with failure details |
| Conflicting information | Escalate with both sources |
| Outside defined scope | Escalate, explain the limitation |
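The framework above can be sketched as a single decision function. This Python version treats each condition as an explicit flag (the parameter names and the three outcome strings are illustrative):

```python
def decide(
    ambiguous: bool,
    high_risk: bool,
    irreversible: bool,
    failure_count: int = 0,
    conflicting_sources: bool = False,
    in_scope: bool = True,
) -> str:
    """Map the decision table to one of three outcomes:
    'proceed', 'approval_gate', or 'escalate'."""
    # Hard escalation signals override everything else
    if not in_scope or conflicting_sources or failure_count >= 3:
        return "escalate"
    if ambiguous:
        return "escalate" if high_risk else "proceed"
    if high_risk or irreversible:
        return "approval_gate"
    return "proceed"
```

Note the ordering: scope violations, conflicts, and repeated failures are checked first because they escalate regardless of how clear or low-risk the task otherwise looks.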

Designing for trust

The point of human-in-the-loop patterns is to build trust between humans and agents. Trust grows when:

  • The agent is transparent about what it’s doing and why
  • The agent asks before acting when the stakes are high
  • The agent reports accurately on what it did, including mistakes
  • The agent respects boundaries and doesn’t try to circumvent approval gates
  • The agent gets better over time by incorporating feedback

Trust erodes when agents act without asking, hide mistakes, or overstate their confidence. When designing skills, err on the side of more transparency and more opportunities for human input. You can always reduce the friction later as trust builds, but regaining lost trust is much harder than earning it gradually.

This connects directly to avoiding anti-patterns like ignoring error cases or building overly autonomous skills that bypass user oversight.

Key takeaways

  1. Use approval gates before irreversible or high-cost actions. This is non-negotiable for production agent systems.

  2. Batch similar approvals. Asking for approval on 50 individual file edits creates fatigue. Present them as a group with the option to drill down.

  3. Implement progressive autonomy. Start supervised, graduate to guided, and let trust build over time.

  4. Collect feedback and use it. Outcome feedback, inline corrections, and explicit ratings all help the agent improve.

  5. Know when to stop. Ambiguity, repeated failures, conflicting information, and scope violations are all signals to escalate rather than press forward.