
Skill Design Principles

Foundational principles for designing agent skills that are composable, predictable, and maintainable, including single responsibility, idempotency, progressive disclosure, and when to split versus merge.

Building a single skill that works is one challenge. Building a set of skills that work well together is a different challenge entirely. Individual skills need to be correct, but a skill ecosystem needs to be coherent. Each skill should have a clear role, compose naturally with others, and avoid surprising side effects.

This guide covers the design principles that govern how skills relate to each other and how they behave when agents chain multiple skills together to accomplish complex tasks.

Single responsibility: one job, done well

The single responsibility principle, borrowed from software engineering, is the most important constraint in skill design. A skill should do one thing and do it completely.

Why this matters for agents

When an agent plans a multi-step task, it selects skills based on their descriptions. If a skill does three things, the agent has to reason about whether it needs all three, some of them, or just one. This cognitive overhead leads to misuse: the agent either avoids the skill because it seems too heavy, or uses it for the wrong reason.

Consider a skill called manage_user:

// Bad: does too many things
const manageUserSkill = {
  name: "manage_user",
  description: "Create, update, delete, or look up users",
  parameters: {
    action: { type: "string", enum: ["create", "update", "delete", "get"] },
    userId: { type: "string" },
    userData: { type: "object" },
  },
};

This forces the agent to understand a branching interface where different parameters are required depending on the action value. The userId is required for update/delete/get but not for create. The userData is required for create/update but not for get/delete. This complexity is invisible in the parameter schema.
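
To make that hidden complexity concrete, here is a minimal sketch of the validation a manage_user handler would need at runtime. The validateManageUser function and the ManageUserArgs type are hypothetical names introduced for illustration; the rules they encode are the ones described above.

```typescript
// Hypothetical handler-side validation for the manage_user skill above.
type Action = "create" | "update" | "delete" | "get";

interface ManageUserArgs {
  action: Action;
  userId?: string;
  userData?: Record<string, unknown>;
}

// Returns a list of problems; an empty list means the call is valid.
function validateManageUser(args: ManageUserArgs): string[] {
  const errors: string[] = [];
  // These two rules exist only in code; the parameter schema cannot express them.
  const needsUserId = args.action !== "create";
  const needsUserData = args.action === "create" || args.action === "update";

  if (needsUserId && args.userId === undefined) {
    errors.push(`userId is required when action is "${args.action}"`);
  }
  if (needsUserData && args.userData === undefined) {
    errors.push(`userData is required when action is "${args.action}"`);
  }
  return errors;
}

// The schema accepts this call, but the handler has to reject it:
// both userId and userData are missing for an update.
const problems = validateManageUser({ action: "update" });
```

The agent only discovers these rules by failing a call, which is exactly the kind of trial and error that focused skills avoid.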

Compare this with four focused skills:

const createUser = {
  name: "create_user",
  description:
    "Create a new user account. Use this when the user asks to " +
    "register, sign up, or add a new team member. Returns the new user's " +
    "ID and profile. Fails if email is already registered.",
  parameters: {
    properties: {
      name: { type: "string", description: "Full name" },
      email: { type: "string", description: "Email address (must be unique)" },
      role: { type: "string", enum: ["admin", "member", "viewer"] },
    },
    required: ["name", "email", "role"],
  },
};

const getUser = {
  name: "get_user",
  description:
    "Look up a user by ID or email. Use this to check if a user " +
    "exists or to retrieve their profile details before modifying their account.",
  parameters: {
    properties: {
      userId: { type: "string", description: "User ID or email address" },
    },
    required: ["userId"],
  },
};

// ... similarly for update_user and delete_user

Each skill has a focused description, parameters that are always required (no conditional logic), and a predictable return type.

The test: can you describe it without “and”?

If your skill description naturally includes the word “and” connecting two unrelated capabilities, you probably need two skills:

  • “Search for files and replace content” -> split into search_files and replace_in_file
  • “Read a config file and validate its schema” -> split into read_file and validate_config
  • “Query the database and export results to CSV” -> split into query_database and export_to_csv

The exception is when the “and” connects two parts of a single atomic operation, like “compress and upload a file” where doing one without the other would leave the system in a broken state.

Composability over completeness

It’s tempting to build skills that handle entire workflows end-to-end. Resist this. Small, composable skills that agents can chain together are far more powerful and flexible than monolithic ones.

The power of composition

Imagine you need a workflow that:

  1. Finds all TypeScript files modified today
  2. Runs a linter on each file
  3. Summarizes the linting results

Monolithic approach, one skill that does all three:

// Inflexible: what if you want to lint a specific file?
// What if you want to find files but not lint them?
const lintRecentFiles = {
  name: "lint_recent_typescript",
  description: "Find recently modified TypeScript files and lint them",
};

Composable approach, three skills the agent chains:

const searchFiles = { name: "search_files" /* ... */ };
const lintFile = { name: "lint_file" /* ... */ };
const summarizeResults = { name: "summarize_lint_results" /* ... */ };

The composable approach lets the agent:

  • Search for Python files and lint them (reuses search_files + lint_file)
  • Lint a single known file (uses lint_file directly)
  • Search for files modified in the last week (uses search_files with different parameters)
  • Summarize lint results from any run, whether the files came from a search or were named directly (uses summarize_lint_results)

Each skill is useful on its own and becomes more valuable when combined with others.
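
The chain can be sketched with stub implementations. Everything below is illustrative: the result shapes, the in-memory file list, and the line-length "lint rule" are assumptions standing in for whatever the real skills would do.

```typescript
// Hypothetical result shapes; real skills would define their own.
interface LintIssue { line: number; message: string; }
interface FileLintResult { file: string; issues: LintIssue[]; }
interface LintSummary { filesChecked: number; totalIssues: number; filesWithIssues: string[]; }

// Stub for search_files: filters a list of paths by extension.
function searchFiles(allFiles: string[], extension: string): string[] {
  return allFiles.filter((f) => f.endsWith(extension));
}

// Stub for lint_file: flags lines longer than maxLength characters.
function lintFile(file: string, source: string, maxLength = 80): FileLintResult {
  const issues: LintIssue[] = [];
  const lines = source.split("\n");
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].length > maxLength) {
      issues.push({ line: i + 1, message: `line exceeds ${maxLength} characters` });
    }
  }
  return { file, issues };
}

// Stub for summarize_lint_results: aggregates per-file results.
function summarizeLintResults(results: FileLintResult[]): LintSummary {
  let totalIssues = 0;
  const filesWithIssues: string[] = [];
  for (const r of results) {
    totalIssues += r.issues.length;
    if (r.issues.length > 0) filesWithIssues.push(r.file);
  }
  return { filesChecked: results.length, totalIssues, filesWithIssues };
}

// The agent's plan: search, lint each hit, summarize.
const sources: Record<string, string> = {
  "src/app.ts": "const x = 1;\n" + "y".repeat(100),
  "src/util.ts": "export const ok = true;",
  "README.md": "# Docs",
};
const tsFiles = searchFiles(Object.keys(sources), ".ts");
const summary = summarizeLintResults(tsFiles.map((f) => lintFile(f, sources[f])));
```

Each function takes plain identifiers or plain data, so any of the three can be called on its own, for example lintFile on a single known path.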

Designing for composition

Skills compose well when they follow these patterns:

  1. Accept identifiers, not dependencies. A skill should accept a file path, user ID, or query string, not a complex object from another skill’s output. This keeps skills loosely coupled.

  2. Return data, not rendered output. Return structured JSON that other skills (or the agent) can process further. Don’t return HTML, formatted text, or other presentation-layer output unless that’s the skill’s specific purpose.

  3. Be stateless when possible. A skill that depends on previous invocations is harder to compose because the agent has to manage invocation order. If state is unavoidable, make it explicit (e.g., accept a session_id parameter).

// Good: stateless, accepts an identifier
const getOrderDetails = {
  name: "get_order",
  parameters: {
    properties: {
      orderId: { type: "string", description: "The order ID to look up" },
    },
    required: ["orderId"],
  },
};

// Bad: stateful, depends on a previous "select_customer" call
const getCustomerOrders = {
  name: "get_orders",
  description:
    "Get orders for the currently selected customer. " +
    "You must call select_customer first.",
};
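
One way to remove the hidden dependency is to pass the customer identifier directly. The sketch below assumes a hypothetical list_orders skill and an in-memory order list; neither appears in the original examples.

```typescript
// Hypothetical stateless replacement for get_orders.
const listOrders = {
  name: "list_orders",
  description:
    "List orders for a customer. Does not depend on any earlier call.",
  parameters: {
    properties: {
      customerId: { type: "string", description: "The customer whose orders to list" },
    },
    required: ["customerId"],
  },
};

interface Order { orderId: string; customerId: string; total: number; }

// Hypothetical handler: everything it needs arrives as an argument,
// so there is no "currently selected customer" to get out of sync.
function handleListOrders(orders: Order[], args: { customerId: string }): Order[] {
  return orders.filter((o) => o.customerId === args.customerId);
}

const sampleOrders: Order[] = [
  { orderId: "ord_1", customerId: "cust_1", total: 40 },
  { orderId: "ord_2", customerId: "cust_2", total: 15 },
  { orderId: "ord_3", customerId: "cust_1", total: 22 },
];
const firstCustomerOrders = handleListOrders(sampleOrders, { customerId: "cust_1" });
```

Because the call carries its own context, the agent can invoke it in any order and interleave lookups for different customers.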

Idempotency and side-effect management

An idempotent operation produces the same result whether you execute it once or ten times. This property matters a lot for agent skills because agents frequently retry operations, sometimes because an error occurred, sometimes because they lost track of what they already did.

Read operations are naturally idempotent

Skills that only read data are inherently safe to retry. Searching files, querying databases, and reading configurations will return the same results (assuming the underlying data hasn’t changed).

Write operations need careful design

Skills that create, modify, or delete data need to be designed with retries in mind:

// Not idempotent: calling twice creates two notes
async function createNote(title: string, body: string) {
  return await db.notes.insert({ title, body });
}

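// Idempotent for sequential retries: calling twice with the same title
// returns the same note. Note that a match on title alone ignores `body`,
// and the find-then-insert pair is not atomic, so two concurrent calls could
// still both insert unless the store enforces a unique title.
async function ensureNote(title: string, body: string) {
  const existing = await db.notes.findByTitle(title);
  if (existing) {
    return { note: existing, created: false };
  }
  const note = await db.notes.insert({ title, body });
  return { note, created: true };
}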

The created field in the response tells the agent whether this was a new creation or an existing match, so it can adjust its behavior accordingly.
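
Matching on title is one way to detect a repeat. A different, commonly used technique is a caller-supplied idempotency key, sketched below with an in-memory store. NoteStore and the idempotencyKey parameter are hypothetical names, and this is not the storage layer the examples above assume.

```typescript
interface Note { id: number; title: string; body: string; }
interface CreateNoteResult { note: Note; created: boolean; }

// In-memory stand-in for a real database, for illustration only.
class NoteStore {
  private nextId = 1;
  private readonly byKey = new Map<string, Note>();

  // Retrying with the same key returns the note stored by the first call,
  // even if two different notes happen to share a title.
  createNote(idempotencyKey: string, title: string, body: string): CreateNoteResult {
    const existing = this.byKey.get(idempotencyKey);
    if (existing !== undefined) {
      return { note: existing, created: false };
    }
    const note: Note = { id: this.nextId++, title, body };
    this.byKey.set(idempotencyKey, note);
    return { note, created: true };
  }

  count(): number {
    return this.byKey.size;
  }
}

const store = new NoteStore();
const first = store.createNote("req-123", "Standup", "Notes from Monday");
const retry = store.createNote("req-123", "Standup", "Notes from Monday");
```

Here first.created is true and retry.created is false, and both results refer to the same note. The trade-off is that the agent must generate a key once per intended action and reuse it across retries.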

The side-effect spectrum

Not all side effects are equal. Classify your skill’s effects and communicate them clearly:

Category     | Description                     | Agent impact                        | Example
Pure read    | No side effects at all          | Safe to call freely                 | search_files, get_user
Observable   | Reads data but logs access      | Safe but leaves traces              | query_database (with audit log)
Reversible   | Modifies data but can be undone | Agent should confirm before calling | update_note, rename_file
Irreversible | Cannot be undone                | Agent should strongly confirm       | delete_user, send_email
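
One way to communicate the category is to attach it to each skill as metadata and derive the confirmation policy from it. The sideEffect field and requiredConfirmation helper below are hypothetical; they sketch the idea rather than any particular framework's API.

```typescript
type SideEffect = "pure_read" | "observable" | "reversible" | "irreversible";
type Confirmation = "none" | "confirm" | "strong_confirm";

interface SkillMeta { name: string; sideEffect: SideEffect; }

// Maps each category from the table above to the confirmation it calls for.
function requiredConfirmation(effect: SideEffect): Confirmation {
  switch (effect) {
    case "pure_read":
    case "observable":
      return "none";
    case "reversible":
      return "confirm";
    case "irreversible":
      return "strong_confirm";
  }
}

const catalog: SkillMeta[] = [
  { name: "search_files", sideEffect: "pure_read" },
  { name: "query_database", sideEffect: "observable" },
  { name: "rename_file", sideEffect: "reversible" },
  { name: "send_email", sideEffect: "irreversible" },
];

// Names of the skills an agent should pause on before calling.
const needsConfirmation = catalog
  .filter((s) => requiredConfirmation(s.sideEffect) !== "none")
  .map((s) => s.name);
```

Keeping the policy in one function means a new skill only has to declare its category, not reimplement the confirmation rules.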

For irreversible skills, consider adding a dry_run parameter:

const sendEmailSkill = {
  name: "send_email",
  parameters: {
    properties: {
      to: { type: "string" },
      subject: { type: "string" },
      body: { type: "string" },
      dry_run: {
        type: "boolean",
        default: false,
        description:
          "If true, validates the email and returns a " +
          "preview without sending. Use this to verify the email " +
          "looks correct before committing to send.",
      },
    },
    required: ["to", "subject", "body"],
  },
};
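
A handler for this skill might look like the following sketch. The handleSendEmail function, its result shape, and the injected deliver callback are assumptions made for illustration; the address check is deliberately loose and is not a full validator.

```typescript
interface SendEmailArgs { to: string; subject: string; body: string; dry_run?: boolean; }
interface SendEmailResult { sent: boolean; preview: string; errors: string[]; }

// `deliver` stands in for whatever actually sends mail. Injecting it makes
// it easy to see that the dry-run path never calls it.
function handleSendEmail(
  args: SendEmailArgs,
  deliver: (message: SendEmailArgs) => void,
): SendEmailResult {
  const errors: string[] = [];
  if (!/^[^@\s]+@[^@\s]+$/.test(args.to)) errors.push("invalid recipient address");
  if (args.subject.trim() === "") errors.push("subject is empty");

  const preview = `To: ${args.to}\nSubject: ${args.subject}\n\n${args.body}`;

  // Validation failures and dry runs both return without sending.
  if (errors.length > 0 || args.dry_run === true) {
    return { sent: false, preview, errors };
  }
  deliver(args);
  return { sent: true, preview, errors };
}

const outbox: SendEmailArgs[] = [];
const draft = { to: "sam@example.com", subject: "Hello", body: "Hi Sam" };
const previewOnly = handleSendEmail({ ...draft, dry_run: true }, (m) => outbox.push(m));
```

After the dry run, outbox is still empty and previewOnly.preview holds the rendered message, so the agent can show it to the user before repeating the call without dry_run.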

Progressive disclosure of complexity

A skill should be easy to use for the common case and progressively reveal complexity for advanced cases. The simplest valid invocation should handle the most common scenario, with optional parameters available for fine-tuning.

Layer your parameters

const searchSkill = {
  name: "search_codebase",
  parameters: {
    properties: {
      // Layer 1: Required -- the common case
      query: {
        type: "string",
        description: "Search term or regex pattern",
      },

      // Layer 2: Optional refinements
      path: {
        type: "string",
        description: "Directory to search in. Defaults to project root.",
      },
      file_type: {
        type: "string",
        description: "Limit to a file type: 'js', 'py', 'rust', etc.",
      },

      // Layer 3: Advanced tuning
      max_results: {
        type: "number",
        default: 50,
        description: "Maximum results to return. Default 50, max 500.",
      },
      case_sensitive: {
        type: "boolean",
        default: false,
        description: "Enable case-sensitive matching.",
      },
      include_hidden: {
        type: "boolean",
        default: false,
        description: "Include files in hidden directories (.git, .config).",
      },
    },
    required: ["query"],
  },
};

The agent can call this skill with just { query: "handleSubmit" } for the common case, or add { query: "handleSubmit", file_type: "ts", case_sensitive: true } when it needs precision. The skill works well at every level of detail.
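
On the handler side, layering usually comes down to filling in defaults for whatever the agent omitted. The resolveSearchArgs helper below is a hypothetical sketch; it assumes "." means the project root and enforces the limit of 500 stated in the max_results description.

```typescript
interface SearchArgs {
  query: string;
  path?: string;
  file_type?: string;
  max_results?: number;
  case_sensitive?: boolean;
  include_hidden?: boolean;
}

interface ResolvedSearchArgs {
  query: string;
  path: string;
  file_type: string | null;
  max_results: number;
  case_sensitive: boolean;
  include_hidden: boolean;
}

// Fills in every optional layer so the rest of the handler sees a complete object.
function resolveSearchArgs(args: SearchArgs): ResolvedSearchArgs {
  const requested = args.max_results ?? 50;
  return {
    query: args.query,
    path: args.path ?? ".", // assumption: "." is the project root
    file_type: args.file_type ?? null, // null means "all file types"
    max_results: Math.min(Math.max(requested, 1), 500), // clamp to 1..500
    case_sensitive: args.case_sensitive ?? false,
    include_hidden: args.include_hidden ?? false,
  };
}

const simple = resolveSearchArgs({ query: "handleSubmit" });
const precise = resolveSearchArgs({ query: "handleSubmit", file_type: "ts", case_sensitive: true });
```

Both calls produce a fully populated object: simple gets every default, while precise overrides only the two fields it names.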

Don’t require knowledge the agent may not have

If a parameter requires domain-specific knowledge to fill in correctly, either make it optional with a good default, or provide guidance in the description:

// Bad: requires the agent to know valid index names
{
  index_name: {
    type: "string",
    description: "The Elasticsearch index to query"
  }
}

// Good: provides the information the agent needs
{
  index_name: {
    type: "string",
    description: "The Elasticsearch index to query. Available indexes: " +
      "'logs-app' (application logs), 'logs-system' (system logs), " +
      "'metrics-*' (time-series metrics). Defaults to 'logs-app'.",
    default: "logs-app"
  }
}
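
If the list of valid values lives in code or configuration, the description can be generated from it so the two never drift apart. The describeIndexParameter helper and IndexInfo type below are hypothetical; the index names are the ones from the example above.

```typescript
interface IndexInfo { pattern: string; purpose: string; }

// Builds the parameter definition from a single source of truth.
function describeIndexParameter(indexes: IndexInfo[], defaultIndex: string) {
  const listing = indexes.map((i) => `'${i.pattern}' (${i.purpose})`).join(", ");
  return {
    type: "string",
    description:
      `The Elasticsearch index to query. Available indexes: ${listing}. ` +
      `Defaults to '${defaultIndex}'.`,
    default: defaultIndex,
  };
}

const indexParam = describeIndexParameter(
  [
    { pattern: "logs-app", purpose: "application logs" },
    { pattern: "logs-system", purpose: "system logs" },
    { pattern: "metrics-*", purpose: "time-series metrics" },
  ],
  "logs-app",
);
```

Adding a new index then means adding one entry to the list, and the agent sees it the next time the skill definition is loaded.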

When to split vs. merge skills

This is the question that comes up most often in practice. Here are some concrete heuristics:

Split when

  • The skill has conditional parameters, where different parameters are required depending on a mode/action flag
  • The skill has multiple unrelated failure modes, and errors from one path confuse recovery for another
  • The description requires “or”, like “Search files or search file contents” (should be two skills)
  • Different use cases need different permissions, since reading data vs. deleting data shouldn’t share a permission surface

Merge when

  • The operations are always done together. Compressing and uploading is one logical action.
  • Splitting would create chatty round-trips. If the agent would always call skill A then immediately call skill B with A’s output, consider merging them.
  • The operations share expensive setup. If both need to authenticate with an external API, merging avoids double authentication.
  • The total skill count is becoming unmanageable. If an agent has 200 skills, it struggles to select the right one. Sometimes merging related skills improves selection accuracy.

The sweet spot

As a rule of thumb, well-designed agent skill sets tend to land between 10 and 40 skills. Fewer than 10 usually means the skills are too broad. Once the count climbs past 50, the skills are usually too granular and the agent spends too much effort choosing between them.

A practical guideline: if you’re building a new skill, ask whether an existing skill could handle the use case with an additional optional parameter. If the parameter fits naturally, extend the existing skill. If it feels forced, create a new one.

Naming conventions

Consistent naming helps agents select skills quickly. Adopt a verb-noun pattern and stick with it across your entire skill set:

search_files       (not file_search or findFiles)
create_user        (not add_user or new_user)
get_order          (not fetch_order or read_order)
update_config      (not modify_config or change_config)
delete_record      (not remove_record or drop_record)
list_projects      (not get_all_projects or show_projects)
run_query          (not execute_query or do_query)

Pick one verb for each operation type and use it everywhere:

Operation | Verb           | Examples
Read one  | get            | get_user, get_order, get_file
Read many | list or search | list_users, search_files
Create    | create         | create_note, create_project
Update    | update         | update_config, update_profile
Delete    | delete         | delete_record, delete_file
Execute   | run            | run_query, run_test, run_migration
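
A convention like this is easy to enforce with a small check at registration time or in CI. The checkSkillName function below is a hypothetical sketch; its verb list mirrors the table above.

```typescript
// Approved verbs, taken from the table above.
const APPROVED_VERBS = ["get", "list", "search", "create", "update", "delete", "run"];

// Returns null when the name follows the convention, otherwise a reason.
function checkSkillName(skillName: string): string | null {
  // Lowercase snake_case with at least two parts: verb_noun.
  if (!/^[a-z]+(_[a-z0-9]+)+$/.test(skillName)) {
    return `"${skillName}" is not lowercase snake_case in verb_noun form`;
  }
  // The regex above guarantees an underscore, so the verb is everything before it.
  const verb = skillName.slice(0, skillName.indexOf("_"));
  if (APPROVED_VERBS.indexOf(verb) === -1) {
    return `"${skillName}" starts with "${verb}", which is not an approved verb`;
  }
  return null;
}

const candidates = ["search_files", "findFiles", "fetch_order", "file_search"];
const rejected = candidates.filter((n) => checkSkillName(n) !== null);
```

Of the four candidates, only search_files passes: findFiles is not snake_case, and fetch_order and file_search do not start with an approved verb.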

Summary

These principles work together to create skill ecosystems where agents can reliably plan and execute multi-step tasks:

  1. Single responsibility means each skill does one thing, so the agent can reason clearly about what it needs
  2. Composability means small skills chain together into workflows no single skill could anticipate
  3. Idempotency means safe retries, so agents can recover from errors without causing damage
  4. Progressive disclosure means simple invocations for common cases, with optional depth for advanced ones
  5. Thoughtful splitting and merging gives you enough granularity for precision without so much that selection becomes noisy

With these principles in mind, head to Testing and Debugging Skills to learn how to verify that your skills behave correctly across the full range of inputs an agent might send.