AI agent orchestration patterns: fan-out, pipelines, and supervisor agents

A single tool does a single thing: read a file, search a codebase, format a response. That works for simple interactions, but real work rarely fits inside one tool call. A code review touches file reading, linting, test coverage analysis, and summary generation. A customer onboarding flow walks through data validation, account provisioning, notification sending, and documentation creation. These are multi-tool problems, and they need orchestration.

A quick note on terminology before the patterns. Tools are the callable functions an agent invokes (the leaf operations like read_file or gh pr view). Skills are the markdown recipes that tell an agent which tools to call and in what order to solve a particular problem; see Skill composition for how skills relate to each other. Orchestration sits below skills: it’s the code-level coordination layer that handles sequencing, parallelism, error recovery, and result aggregation across tool calls. This article covers four patterns for that orchestration layer.

If you haven’t read Skill composition yet, start there. It covers the foundations (dependency resolution, shared context, composability principles) that this article builds on.

Why single-tool execution breaks down

Consider a task: “Review this pull request and post a summary comment.”

An agent might try to do this in a single pass, reading the diff, checking for issues, and writing a comment all within one long generation. That works for small diffs. It falls apart for anything real because:

The context window fills up with file contents, leaving little room for analysis. Quality doesn’t scale with context length: as the window fills, the analysis gets worse, not just slower.
There’s no separation between data gathering and decision making.
A failure in one step (the linter tool times out) kills the entire operation.
You can’t parallelize anything since it’s one sequential generation.

Orchestration solves these problems by decomposing the task into discrete tool invocations with explicit data flow between them. The orchestrator handles sequencing, parallelism, error recovery, and result aggregation. The individual tools stay simple and focused.

Pattern 1: fan-out/fan-in

The fan-out/fan-in pattern sends the same type of work to multiple tool calls in parallel, then collects and combines the results.

When to use it: you have a collection of items that all need the same processing. File-by-file analysis. Per-user notifications. Batch data validation.

interface FanOutResult<TInput, TResult> {
  successes: { input: TInput; result: TResult }[];
  failures: { input: TInput; error: Error }[];
}

async function fanOutFanIn<TInput, TResult>(
  items: TInput[],
  tool: (item: TInput) => Promise<TResult>,
  options: {
    concurrency?: number;
    continueOnError?: boolean;
  } = {},
): Promise<FanOutResult<TInput, TResult>> {
  const { concurrency = 5, continueOnError = true } = options;
  const results: FanOutResult<TInput, TResult> = {
    successes: [],
    failures: [],
  };

  // Process items in batches to control concurrency
  for (let i = 0; i < items.length; i += concurrency) {
    const batch = items.slice(i, i + concurrency);

    const settled = await Promise.allSettled(batch.map((item) => tool(item)));

    // Promise.allSettled preserves input order; use the index, not indexOf,
    // so duplicate inputs don't collide.
    settled.forEach((outcome, idx) => {
      const input = batch[idx];
      if (outcome.status === "fulfilled") {
        results.successes.push({ input, result: outcome.value });
      } else {
        const error =
          outcome.reason instanceof Error
            ? outcome.reason
            : new Error(String(outcome.reason));
        results.failures.push({ input, error });
        if (!continueOnError) {
          throw error;
        }
      }
    });
  }

  return results;
}

Using this for a code review scenario:

// invoke() represents your framework's tool-calling function (the
// Anthropic SDK's tool_use response handler, OpenAI's function call,
// or your in-house equivalent).
const changedFiles = await invoke("get_pr_diff", { prNumber: 42 });

const analysisResults = await fanOutFanIn(
  changedFiles,
  (file) => invoke("analyze_file", { path: file.path, diff: file.diff }),
  { concurrency: 3 },
);

// Fan in: aggregate results into a single review
const review = await invoke("generate_review_summary", {
  analyses: analysisResults.successes.map((s) => s.result),
  failedFiles: analysisResults.failures.map((f) => f.input),
});

The key design decision in fan-out/fan-in is error handling. Should one failed file analysis block the entire review? Usually not. The continueOnError flag lets you collect partial results and note which items failed. The fan-in step then decides what to do with incomplete data.
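One way to make that fan-in decision explicit is a small policy function that runs before the summary step. This is a sketch under my own assumptions, not part of the orchestrator above: decideFanIn, the three outcome names, and the 25% default threshold are all hypothetical.

```typescript
// Hypothetical fan-in policy: decide what to do with partial results.
// Only the counts from a FanOutResult are needed to make the call.
interface FanOutCounts {
  successes: number;
  failures: number;
}

type FanInDecision = "proceed" | "proceed_with_warning" | "abort";

function decideFanIn(
  counts: FanOutCounts,
  maxFailureRatio = 0.25, // assumed threshold; tune per workload
): FanInDecision {
  if (counts.successes === 0) return "abort"; // nothing to aggregate
  if (counts.failures === 0) return "proceed";
  const ratio = counts.failures / (counts.successes + counts.failures);
  return ratio <= maxFailureRatio ? "proceed_with_warning" : "abort";
}
```

In the review scenario, a "proceed_with_warning" outcome would still call generate_review_summary, passing failedFiles so the summary can say which files were skipped.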

For visibility into which items failed and why, fold the trace into your structured logs. See observability for agents for the patterns; partial-failure modes are exactly the case where good logging earns its keep.

Pattern 2: supervisor agents

A supervisor agent sits above a set of tools and decides at runtime which ones to invoke, in what order, and with what parameters. Unlike the other patterns here, the supervisor uses an LLM to make routing decisions rather than following a predetermined execution plan.

The most common mistake I see is reaching for supervisor agents too early. They’re flexible but expensive (every routing decision costs an LLM call) and hard to test (the LLM might choose different paths on different runs). A well-designed pipeline or parallel-execution pattern handles 80% of orchestration needs with far less complexity. Default to pipelines and only graduate to supervisors when the execution path genuinely depends on intermediate results.

When to use it: the execution path depends on the content of intermediate results. You can’t know in advance which tools you’ll need.

interface SupervisorConfig {
  availableTools: ToolDefinition[];
  maxSteps: number;
  systemPrompt: string;
}

interface ToolDefinition {
  name: string;
  description: string;
  parameterSchema: Record<string, unknown>;
}

interface SupervisorStep {
  reasoning: string;
  toolName: string;
  parameters: Record<string, unknown>;
  result: unknown;
}

async function runSupervisor(
  config: SupervisorConfig,
  task: string,
): Promise<{ steps: SupervisorStep[]; finalResult: unknown }> {
  const steps: SupervisorStep[] = [];
  let context: Record<string, unknown> = {};

  for (let i = 0; i < config.maxSteps; i++) {
    const decision = await planNextStep({
      systemPrompt: config.systemPrompt,
      task,
      availableTools: config.availableTools,
      previousSteps: steps,
      currentContext: context,
    });

    if (decision.action === "complete") {
      return { steps, finalResult: decision.result };
    }

    const result = await invoke(decision.toolName, decision.parameters);

    steps.push({
      reasoning: decision.reasoning,
      toolName: decision.toolName,
      parameters: decision.parameters,
      result,
    });

    context[`step_${i}_${decision.toolName}`] = result;

    // Past about 10 steps, the accumulator dominates input cost. Replace
    // the raw context with a structured summary so the supervisor isn't
    // paying to re-read the entire history on every routing call.
    if (steps.length > 0 && steps.length % 10 === 0) {
      context = await summarizeContext(context, steps);
    }
  }

  throw new Error(`Supervisor exceeded max steps (${config.maxSteps})`);
}

The planNextStep function is where the LLM does its work. With the Anthropic SDK, it looks like:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function planNextStep(args: {
  systemPrompt: string;
  task: string;
  availableTools: ToolDefinition[];
  previousSteps: SupervisorStep[];
  currentContext: Record<string, unknown>;
}) {
  const response = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    system: args.systemPrompt,
    tools: args.availableTools.map((t) => ({
      name: t.name,
      description: t.description,
      input_schema: t.parameterSchema as Anthropic.Tool.InputSchema,
    })),
    messages: [
      {
        role: "user",
        content: `Task: ${args.task}\n\nContext so far:\n${JSON.stringify(args.currentContext, null, 2)}`,
      },
    ],
  });

  // If the model returned a tool_use block, that's the next step.
  const toolUse = response.content.find((b) => b.type === "tool_use");
  if (toolUse && toolUse.type === "tool_use") {
    const reasoningBlock = response.content.find((b) => b.type === "text");
    return {
      action: "invoke" as const,
      toolName: toolUse.name,
      parameters: toolUse.input as Record<string, unknown>,
      reasoning:
        reasoningBlock && reasoningBlock.type === "text"
          ? reasoningBlock.text
          : "",
    };
  }

  // No tool_use means the model thinks the task is done. The text content
  // is the final answer.
  const textBlock = response.content.find((b) => b.type === "text");
  return {
    action: "complete" as const,
    result: textBlock && textBlock.type === "text" ? textBlock.text : "",
  };
}

The structured tools array tells Claude which tools are available and what shape each tool’s input takes. The model picks one (or returns text and stops). This is the same tool_use mechanism used in any Anthropic agent, just wrapped in a routing loop.
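The runSupervisor loop calls summarizeContext every ten steps but never defines it. A production version would probably use an LLM call to summarize. As a cheaper stand-in, here is a sketch that keeps the newest entries verbatim and truncates older ones to a short preview. The name compactContext and both defaults are my assumptions.

```typescript
// Hypothetical, LLM-free stand-in for summarizeContext: keep the newest
// entries verbatim and truncate older ones to a short text preview.
function compactContext(
  context: Record<string, unknown>,
  keepRecent = 3, // assumed: how many recent entries stay untouched
  previewChars = 200, // assumed: preview length for older entries
): Record<string, unknown> {
  // Keys like "step_0_read_file" are non-numeric strings, so Object.entries
  // returns them in insertion order (oldest first).
  const entries = Object.entries(context);
  const cutoff = Math.max(0, entries.length - keepRecent);
  const compacted: Record<string, unknown> = {};

  entries.forEach(([key, value], idx) => {
    if (idx >= cutoff) {
      compacted[key] = value;
      return;
    }
    // JSON.stringify returns undefined for values such as functions.
    const json: string | undefined = JSON.stringify(value);
    const text =
      typeof value === "string" ? value : json === undefined ? String(value) : json;
    compacted[key] =
      text.length > previewChars ? text.slice(0, previewChars) + "..." : text;
  });

  return compacted;
}
```

Truncation loses information that a real summary would keep, so treat this as a cost-control floor rather than a replacement for summarization.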

A practical safeguard: always set maxSteps to a reasonable limit. Without it, a confused supervisor can loop indefinitely, calling tools that don’t make progress. I’ve seen supervisors burn through 30+ steps on a task that should take 4 because they kept retrying a tool with slightly different parameters instead of recognizing the approach wasn’t working. For approval-gate patterns that catch this earlier (the supervisor proposes the next tool call but a human confirms), see human-in-the-loop patterns.
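maxSteps is a blunt limit. A cheap complement is a check for the retry loop described above: the same tool called over and over. The sketch below is a heuristic I’m assuming for illustration; looksStuck and its window size are hypothetical, and it deliberately ignores parameters, because the failure mode is the same tool with slightly different ones.

```typescript
// Hypothetical guard: detect a supervisor that keeps calling the same tool.
// Only the tool name is compared; parameters are ignored on purpose.
interface StepLike {
  toolName: string;
}

function looksStuck(steps: StepLike[], windowSize = 4): boolean {
  if (windowSize < 2 || steps.length < windowSize) return false;
  const recent = steps.slice(-windowSize);
  const toolName = recent[0]?.toolName;
  // Stuck if every step in the window targets the same tool.
  return recent.every((step) => step.toolName === toolName);
}
```

Inside runSupervisor, a check like this could run after each steps.push and either throw or add a hint to the context that the current approach isn’t working. It will produce false positives for tasks that legitimately call one tool many times in a row, so the window size needs tuning per task.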

For more on preventing runaway agent behavior generally, see error handling in agent skills.

Pattern 3: tool pipelines

A pipeline chains tool calls together sequentially, where the output of one tool becomes the input of the next. This is the simplest orchestration pattern and the one you should reach for first.

When to use it: the processing steps are known in advance and each step transforms the data for the next one.

type ToolFn = (input: unknown) => Promise<unknown>;

interface PipelineStage {
  name: string;
  tool: ToolFn;
  transform?: (output: unknown) => unknown;
  onError?: "skip" | "abort" | "retry";
  retryCount?: number;
}

interface PipelineTrace {
  stage: string;
  inputRef: string; // store a reference, not the full input, for large payloads
  output: unknown;
  durationMs: number;
  status: "success" | "error";
  error?: string;
}

async function runPipeline(
  stages: PipelineStage[],
  initialInput: unknown,
): Promise<{ result: unknown; trace: PipelineTrace[] }> {
  let current = initialInput;
  const trace: PipelineTrace[] = [];

  for (const stage of stages) {
    const start = Date.now();

    try {
      const output = await executeWithRetry(
        () => stage.tool(current),
        stage.retryCount ?? 0,
      );

      const transformed = stage.transform ? stage.transform(output) : output;

      trace.push({
        stage: stage.name,
        inputRef: refOrId(current),
        output: transformed,
        durationMs: Date.now() - start,
        status: "success",
      });

      current = transformed;
    } catch (error) {
      trace.push({
        stage: stage.name,
        inputRef: refOrId(current),
        output: null,
        durationMs: Date.now() - start,
        status: "error",
        error: String(error),
      });

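      // "skip" leaves `current` unchanged and moves on to the next stage.
      if (stage.onError === "skip") continue;
      // "abort", unset, or "retry" with retries already exhausted: fail the
      // pipeline rather than silently continuing with stale input.
      throw error;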
    }
  }

  return { result: current, trace };
}

async function executeWithRetry(
  fn: () => Promise<unknown>,
  retries: number,
  options: { initialDelayMs?: number; maxDelayMs?: number } = {},
): Promise<unknown> {
  const initial = options.initialDelayMs ?? 100;
  const max = options.maxDelayMs ?? 5_000;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === retries) throw error;
      // Exponential backoff with jitter avoids thundering-herd retries
      // against the same upstream rate limit.
      const delay = Math.min(initial * 2 ** attempt, max);
      const jittered = delay * (0.5 + Math.random() * 0.5);
      await new Promise((r) => setTimeout(r, jittered));
    }
  }
  throw new Error("Unreachable");
}

function refOrId(value: unknown): string {
  if (typeof value === "string" && value.length > 200)
    return `string(${value.length})`;
  if (Array.isArray(value)) return `array(${value.length})`;
  if (typeof value === "object" && value !== null) return "object";
  return String(value);
}

A content publishing pipeline in practice:

const publishPipeline: PipelineStage[] = [
  {
    name: "fetch_draft",
    tool: (input) => invoke("read_file", { path: input }),
  },
  {
    name: "check_grammar",
    tool: (input) => invoke("grammar_check", { text: input }),
    onError: "skip", // Grammar check is nice-to-have
  },
  {
    name: "generate_metadata",
    tool: (input) => invoke("extract_metadata", { content: input }),
    transform: (output: any) => ({
      content: output.content,
      title: output.title,
      tags: output.tags,
    }),
  },
  {
    name: "publish",
    tool: (input) => invoke("publish_to_cms", input),
    onError: "retry", // Safe only if publish_to_cms is idempotent
    retryCount: 2,
  },
];

const result = await runPipeline(publishPipeline, "./drafts/new-post.md");

The transform function between stages is important. Tools rarely produce output in exactly the format the next tool expects. Rather than coupling the tools to each other’s interfaces, use transforms to adapt the data. This keeps each tool reusable in other pipelines.

Two correctness notes worth flagging.

First, idempotency. Pipeline retry only works cleanly when stages can run twice without side effects. If the publish step succeeds and the pipeline then crashes for any reason, re-running the pipeline re-executes publish and you publish twice. Stages that send notifications, write to a database, or create external artifacts should be wrapped in an idempotency key or a “check then act” pattern (look up by id; if it exists, return it; if not, create it). Pipelines that aren’t idempotent are still useful, but their retry policy should be abort, not retry.

Second, the smell that says you’ve outgrown a pipeline: when you start adding conditionals between stages to skip or branch (“only run publish if the grammar check passed and the metadata included a category”), you’re reaching for a supervisor pattern in pipeline clothing. At that point the linear-chain abstraction is fighting you. Decompose into separate pipelines or move to a supervisor.
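The “check then act” wrapper from the idempotency note can be sketched in a few lines. Everything here is an assumption for illustration: makeIdempotent is a hypothetical helper, and the in-memory Map stands in for whatever durable lookup (a database row, the CMS’s own get-by-id) a real system would use.

```typescript
// Hypothetical "check then act" wrapper. The store is an in-memory Map here;
// a real system needs a durable store that survives a pipeline crash.
function makeIdempotent<TInput, TResult>(
  keyOf: (input: TInput) => string,
  act: (input: TInput) => Promise<TResult>,
  store: Map<string, TResult> = new Map(),
): (input: TInput) => Promise<TResult> {
  return async (input: TInput): Promise<TResult> => {
    const key = keyOf(input);
    if (store.has(key)) {
      // Already done: return the earlier result instead of acting again.
      return store.get(key) as TResult;
    }
    const result = await act(input);
    store.set(key, result);
    return result;
  };
}
```

Wrapping a stage’s tool this way makes a second run a lookup rather than a second publish. Note that the sketch does not guard against two concurrent calls with the same key; both could pass the check before either records a result.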

The trace array gives you observability into the pipeline execution. When something goes wrong (and it will), you can see which stage failed, what its input reference was, and how long each stage took. Note that the trace stores inputRef (a short description) rather than the full input. Each stage’s input is usually the previous stage’s output, which the trace already records, so storing it again would put every 50KB payload into the trace twice. For more, see observability for agents.
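To show what reading a trace looks like in practice, here is a small helper that answers the first questions I’d ask of one: how long did the run take, which stage was slowest, and which stages failed. summarizeTrace and TraceSummary are hypothetical names; TraceEntry mirrors only the PipelineTrace fields the helper reads.

```typescript
// Minimal mirror of the PipelineTrace fields this helper reads.
interface TraceEntry {
  stage: string;
  durationMs: number;
  status: "success" | "error";
  error?: string;
}

interface TraceSummary {
  totalMs: number;
  slowestStage: string | null; // null for an empty trace
  failedStages: string[];
}

// Hypothetical helper: reduce a trace to total time, slowest stage, failures.
function summarizeTrace(trace: TraceEntry[]): TraceSummary {
  let totalMs = 0;
  let slowest: TraceEntry | null = null;
  const failedStages: string[] = [];

  for (const entry of trace) {
    totalMs += entry.durationMs;
    if (slowest === null || entry.durationMs > slowest.durationMs) {
      slowest = entry;
    }
    if (entry.status === "error") failedStages.push(entry.stage);
  }

  return {
    totalMs,
    slowestStage: slowest ? slowest.stage : null,
    failedStages,
  };
}
```

Skipped stages show up in failedStages too, since runPipeline records them with status "error" before continuing.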

Pattern 4: parallel execution with result aggregation

This pattern runs multiple independent tools simultaneously and combines their results. It’s similar to fan-out/fan-in, but the tools are different from each other rather than the same tool applied to different inputs.

When to use it: you need information from several independent sources before making a decision.

interface ParallelToolSpec {
  name: string;
  tool: (signal: AbortSignal) => Promise<unknown>;
  required: boolean; // Must this succeed for the aggregation to proceed?
  timeoutMs?: number;
}

interface AggregatedResults {
  results: Record<string, unknown>;
  errors: Record<string, string>;
  timing: Record<string, number>;
}

async function executeParallel(
  specs: ParallelToolSpec[],
): Promise<AggregatedResults> {
  const output: AggregatedResults = {
    results: {},
    errors: {},
    timing: {},
  };

  const executions = specs.map(async (spec) => {
    const start = Date.now();
    // Resolve the default once so the error message never reads "undefinedms".
    const timeoutMs = spec.timeoutMs ?? 30_000;
    const controller = new AbortController();
    const timer = setTimeout(
      () => controller.abort(new Error(`Timeout after ${timeoutMs}ms`)),
      timeoutMs,
    );
    try {
      const result = await spec.tool(controller.signal);
      output.results[spec.name] = result;
    } catch (error) {
      output.errors[spec.name] = String(error);
      if (spec.required) {
        throw new Error(`Required tool "${spec.name}" failed: ${error}`);
      }
    } finally {
      clearTimeout(timer);
      output.timing[spec.name] = Date.now() - start;
    }
  });

  await Promise.all(executions);
  return output;
}

Note the AbortSignal plumbing. A naive Promise.race([promise, timeoutPromise]) cancels nothing on timeout: the slow tool keeps running in the background, holding open whatever resources it acquired (sockets, file handles, API quota) until it eventually completes or fails. For real cancellation, the underlying tool needs to accept the signal and short-circuit; executeParallel only fires the signal, so a tool that ignores it still runs to completion. Most modern HTTP clients (fetch, axios, the Anthropic SDK) accept an AbortSignal; tools wrapping them should pass it through.
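For tools that don’t wrap an HTTP client, honoring the signal means listening for the abort event and releasing whatever the tool holds. A minimal sketch, assuming a hypothetical slowTool whose setTimeout stands in for real work:

```typescript
// Hypothetical slow tool that honors an AbortSignal. The setTimeout stands in
// for real work such as a long-running request.
function slowTool(ms: number, signal: AbortSignal): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    // Already aborted before we started: fail fast.
    if (signal.aborted) {
      reject(signal.reason);
      return;
    }

    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve("done");
    }, ms);

    function onAbort(): void {
      clearTimeout(timer); // release the resource instead of running on
      reject(signal.reason);
    }

    signal.addEventListener("abort", onAbort, { once: true });
  });
}
```

The rejection carries signal.reason, so the timeout error created in executeParallel is what ends up in output.errors.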

A meeting-prep scenario where you gather context from multiple sources at once:

const meetingContext = await executeParallel([
  {
    name: "attendee_info",
    tool: (signal) =>
      invoke("lookup_contacts", { emails: attendeeList, signal }),
    required: false,
  },
  {
    name: "previous_notes",
    tool: (signal) => invoke("search_notes", { query: meetingTopic, signal }),
    required: false,
  },
  {
    name: "calendar_context",
    tool: (signal) =>
      invoke("get_recent_meetings", { with: attendeeList, signal }),
    required: false,
  },
  {
    name: "relevant_docs",
    tool: (signal) =>
      invoke("search_documents", {
        query: meetingTopic,
        limit: 5,
        signal,
      }),
    required: false,
  },
]);

// Aggregate: combine whatever succeeded into a briefing
const briefing = await invoke("generate_briefing", {
  attendees: meetingContext.results.attendee_info ?? [],
  pastMeetings: meetingContext.results.previous_notes ?? [],
  calendarHistory: meetingContext.results.calendar_context ?? [],
  documents: meetingContext.results.relevant_docs ?? [],
  errors: meetingContext.errors,
});

Notice that every tool here is marked required: false. For a meeting-prep briefing, partial information is better than no information. If the document search times out, you still get attendee info and past notes. The aggregation step (the LLM call to generate_briefing) handles incomplete data gracefully because it knows which sources failed.
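“Knows which sources failed” is worth making concrete, because an empty array and a failed lookup look identical to the model unless you tell it otherwise. One option is to turn the errors record into a single line for the briefing prompt. describeGaps and its wording are hypothetical; this is a sketch of the idea, not how generate_briefing is implemented.

```typescript
// Hypothetical helper: turn the errors record from executeParallel into a
// one-line note so the briefing prompt can separate "failed" from "empty".
function describeGaps(
  requested: string[],
  errors: Record<string, string>,
): string {
  const missing = requested.filter((name) => name in errors);
  if (missing.length === 0) return "All sources responded.";
  return `Unavailable sources: ${missing.join(", ")}. Treat them as unknown, not empty.`;
}
```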

When to use each pattern

Situation	Pattern	Why
Same operation on many items	Fan-out/fan-in	Maximizes parallelism for uniform work
Execution path depends on results	Supervisor agent	LLM decides the next step dynamically
Known sequence of transformations	Pipeline	Simple, predictable, easy to debug
Independent data gathering	Parallel execution	Minimizes total latency

In practice, real orchestration often combines patterns. A supervisor agent might use fan-out/fan-in for one of its steps. A pipeline stage might run parallel tools internally. The patterns compose, just like the tools they coordinate.
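As one example of that composition, a pipeline stage can run several independent tools internally and hand the combined result to the next stage. The adapter below is a sketch under my own assumptions: parallelStage and StageOutput are hypothetical names, and it omits the timeouts and required flags that executeParallel provides.

```typescript
type Tool = (input: unknown) => Promise<unknown>;

interface StageOutput {
  results: Record<string, unknown>;
  errors: Record<string, string>;
}

// Hypothetical adapter: wrap several independent tools as one pipeline-stage
// tool. Each sub-tool receives the same input; results are keyed by name.
function parallelStage(
  tools: Record<string, Tool>,
): (input: unknown) => Promise<StageOutput> {
  return async (input: unknown): Promise<StageOutput> => {
    const results: Record<string, unknown> = {};
    const errors: Record<string, string> = {};

    await Promise.all(
      Object.entries(tools).map(async ([name, tool]) => {
        try {
          results[name] = await tool(input);
        } catch (error) {
          // Record the failure and let the other tools finish.
          errors[name] = String(error);
        }
      }),
    );

    return { results, errors };
  };
}
```

The returned function has the same shape as a PipelineStage tool, so it can sit in a stages array next to ordinary single-tool stages.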

Start with pipelines. They’re the easiest to understand, test, and debug. Move to parallel execution when latency matters. Use fan-out/fan-in when you’re processing collections. Reach for supervisor agents only when you genuinely can’t determine the execution path in advance. Pattern selection isn’t aesthetic; the cost is your debug-on-Friday-afternoon time when something goes wrong, and pipelines are dramatically easier to reason about than supervisors.

For more on designing the tools and skills that work well within these patterns, see skill design principles, error handling, multi-step workflows, and observability for agents.