AI Agent Development: Simplicity Over Complexity

Author: Ajax

Introduction to AI Agents and Workflows

The concept of an AI agent varies widely. Some view them as 'all-capable butlers' that can independently think, make decisions, and use tools to complete complex tasks. Others see them as 'rule-following employees' that execute preset workflows. Anthropic groups both under the umbrella of agentic systems, and distinguishes workflows from agents as follows:

  • Workflows: Systems that orchestrate Large Language Models (LLMs) and tools through predetermined code paths.
  • Agents: Systems where an LLM dynamically guides its own processes and tool usage, autonomously controlling how tasks are completed.

When to Choose Agents vs. Workflows

Anthropic advises developers to adopt the principle of 'simplicity trumps complexity' when developing AI applications. Not every scenario requires a complex intelligent system. While powerful, these systems can lead to slower responses and increased costs. Developers must balance functionality and efficiency.

  • Workflows are suitable for well-defined tasks needing predictability and consistency.
  • Agents are a better fit when tasks require flexibility and model-driven decision-making at scale.

For many applications, using well-crafted prompts with retrieval and context examples directly with a large model is sufficient.

The Use of Frameworks

Several frameworks assist developers in building AI agents, such as:

  • LangChain's LangGraph
  • Amazon Bedrock's AI Agent framework
  • Rivet, a drag-and-drop LLM workflow builder
  • Vellum, a GUI tool for building and testing complex workflows

These frameworks simplify the development process but can also add layers of abstraction, making the underlying logic less transparent, increasing debugging difficulty, and potentially introducing overly complex solutions for simple scenarios.

Anthropic recommends starting with direct use of the large model's API. Many patterns can be implemented with just a few lines of code. If frameworks are chosen, it is crucial to understand their underlying principles. Insufficient understanding of the framework's mechanics is a primary cause of development problems. Anthropic's cookbook provides specific examples.
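The "direct API" approach can be sketched in a few lines. In this illustration, `call_llm` is a placeholder for any chat-completion SDK call; the helper simply assembles retrieved context and the question into a single prompt:

```python
# Minimal sketch of direct model use: one call, with retrieved context
# placed in the prompt. `call_llm` is a placeholder for a real API call.

def call_llm(prompt: str) -> str:
    # Placeholder: replace with a call to your model provider's SDK.
    return f"[model response to {len(prompt)} chars of prompt]"

def answer_with_context(question: str, retrieved_docs: list[str]) -> str:
    # Assemble retrieved passages and the question into a single prompt.
    context = "\n\n".join(retrieved_docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return call_llm(prompt)
```

Swapping in a different provider means changing only `call_llm`, which keeps the rest of the logic framework-free.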

Building Blocks, Workflows, and Agents

Foundation: Enhanced LLMs

The fundamental building block of intelligent systems is an enhanced LLM with capabilities such as retrieval and memory. Anthropic models can proactively use these features, for example, generating search queries, selecting tools, and deciding what information to retain.

When extending functionality, focus on:

  • Customizing features based on specific application scenarios.
  • Ensuring simple and well-documented interfaces for the model.
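A simple, well-documented tool interface might look like the following. The tool name and fields are hypothetical, written in the JSON-schema style that many model APIs accept for tool definitions:

```python
# Illustrative tool definition (hypothetical tool, JSON-schema style).
# The description tells the model what the tool does and when to use it.

get_weather_tool = {
    "name": "get_weather",
    "description": (
        "Return the current weather for a city. "
        "Use this whenever the user asks about weather conditions."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "City name, e.g. 'Paris'",
            },
        },
        "required": ["city"],
    },
}
```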

Anthropic's recently released Model Context Protocol (MCP) simplifies the integration of AI models with third-party tool ecosystems.

Workflows: Prompt Chaining

Prompt chaining breaks complex tasks into multiple steps, with each step invoking an LLM. Subsequent steps build on the results of the previous one. Developers can add checkpoints to ensure the process proceeds as intended.

Prompt chains suit tasks that decompose cleanly into a series of fixed subtasks. Overall response time may be longer, but accuracy can improve significantly because each call focuses on a simpler subtask.

Typical use cases include:

  • Generating marketing copy and then translating it into other languages.
  • Writing a document outline, conducting compliance checks, and then writing the complete document based on the outline.
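A prompt chain can be sketched as a loop that feeds each step's output into the next prompt, with an optional checkpoint ("gate") between steps. `call_llm` is a placeholder for a real model call:

```python
# Prompt-chaining sketch: sequential LLM calls with optional checkpoints.

def call_llm(prompt: str) -> str:
    return "[step output]"  # placeholder model call

def prompt_chain(initial_input: str, steps: list[str], gate=None) -> str:
    result = initial_input
    for instruction in steps:
        result = call_llm(f"{instruction}\n\nInput:\n{result}")
        # Checkpoint: abort early if an intermediate result looks wrong.
        if gate is not None and not gate(result):
            raise ValueError("intermediate output failed the checkpoint")
    return result

# Example: draft marketing copy, then translate it.
final = prompt_chain(
    "Product: noise-cancelling headphones",
    ["Write one line of marketing copy.", "Translate the copy into French."],
)
```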

Workflows: Intelligent Routing

Routing classifies an incoming task and dispatches it to the appropriate module. This separation lets each module be optimized for its task type and prevents different task types from interfering with one another. Routing suits scenarios with distinct task categories, where the category can be identified reliably, whether by an LLM or by a traditional classification algorithm.

Typical use cases include:

  • In customer service systems, directing general inquiries, refund requests, and technical support issues to their respective processes.
  • Assigning simple common questions to smaller models and complex, rare questions to more powerful models to optimize costs and speed.
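The routing pattern reduces to "classify, then dispatch." In this sketch the keyword classifier stands in for an LLM or traditional classification step, and the handler names are illustrative:

```python
# Routing sketch: classify the request, then dispatch to a handler.

def classify(query: str) -> str:
    # Placeholder: a real router would ask an LLM to pick a label.
    q = query.lower()
    if "refund" in q:
        return "refund"
    if "error" in q or "crash" in q:
        return "technical"
    return "general"

HANDLERS = {
    "refund": lambda q: "refund process handles: " + q,
    "technical": lambda q: "tech support handles: " + q,
    "general": lambda q: "general inquiry handles: " + q,  # could use a smaller model
}

def route(query: str) -> str:
    label = classify(query)
    return HANDLERS.get(label, HANDLERS["general"])(query)
```

Each handler can use its own prompt, and even its own model, which is how the cost/speed optimization in the second use case would be wired in.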

Workflows: Parallel Processing

In parallel workflows, multiple LLM calls work on a task simultaneously, and their outputs are aggregated programmatically. Parallel workflows come in two main variants:

  • Task Segmentation: Breaking tasks into subtasks that can run concurrently, with results integrated at the end.
  • Voting Mechanism: Running the same task multiple times, selecting the best result, or synthesizing multiple answers.

Parallel processing is highly effective when subtasks can be executed in parallel to increase speed or when multiple perspectives are needed for higher confidence. For complex tasks, having each call focus on a specific aspect yields better results.

Typical use cases include:

  • Task Segmentation:
    • Security: One model processes the user request, and another conducts content moderation.
    • Performance Evaluation: Different models assess system performance metrics.
  • Voting Mechanism:
    • Code Security Checks: Multiple detection models collaborate to find code vulnerabilities.
    • Content Moderation: Multiple models evaluate content safety from different angles.
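Both variants can be sketched with standard concurrency primitives. `call_llm` is again a placeholder for a real model call:

```python
# Parallel-processing sketch covering both variants: sectioning
# (different subtasks run concurrently) and voting (same task run n times).

from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    return f"[answer to: {prompt}]"  # placeholder model call

def run_parallel(prompts: list[str]) -> list[str]:
    # Sectioning: independent subtasks execute concurrently.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(call_llm, prompts))

def vote(prompt: str, n: int = 3) -> str:
    # Voting: run the same prompt n times and keep the most common answer.
    answers = run_parallel([prompt] * n)
    return Counter(answers).most_common(1)[0][0]
```

With a real API, `run_parallel` hides network latency behind concurrent requests, which is where the speedup for sectioning comes from.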

Workflows: Leader-Executor

A central large language model dynamically breaks down tasks, assigns them to executor models, and summarizes the results.

This workflow is suitable for complex tasks where specific steps are difficult to determine in advance. Task decomposition is not fixed; it's dynamically decided by the AI system based on the situation.

Typical use cases include:

  • Programming applications requiring complex modifications to multiple files.
  • Search tasks requiring gathering and analyzing information from multiple sources.
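The leader-executor pattern is three stages: decompose, execute, synthesize. In this sketch both `plan` and `call_llm` are placeholders for real model calls; in a real system the leader's decomposition is decided at runtime, not hardcoded:

```python
# Leader-executor sketch: a leader call decomposes the task, executor
# calls handle subtasks, and a final call synthesizes the results.

def call_llm(prompt: str) -> str:
    return f"[result of: {prompt[:30]}]"  # placeholder model call

def plan(task: str) -> list[str]:
    # Placeholder: a real leader would ask the LLM to emit subtasks
    # (e.g. as a JSON list), making the decomposition dynamic.
    return [f"part 1 of {task}", f"part 2 of {task}"]

def orchestrate(task: str) -> str:
    subtasks = plan(task)                       # dynamic decomposition
    results = [call_llm(s) for s in subtasks]   # executor calls
    return call_llm("Combine:\n" + "\n".join(results))  # synthesis
```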

Workflows: Evaluation-Optimization

One LLM call generates a response, and another provides evaluation and feedback, forming a cycle.

This workflow is particularly effective when clear evaluation criteria exist, and iterative refinement can provide significant value. The LLM can provide feedback, similar to a human writer's editing process.

Typical use cases include:

  • Literary translation: The evaluation model identifies missed language nuances in the translation and suggests revisions.
  • Complex searches: The evaluation model determines whether to continue a deeper search.
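The generate-evaluate cycle can be sketched as a bounded loop. Both functions here are placeholders for real model calls; the stub evaluator accepts a draft once its feedback has been applied:

```python
# Evaluation-optimization sketch: a generator drafts, an evaluator
# critiques, and the loop repeats until accepted or a round limit hits.

def generate(task: str, feedback: str = "") -> str:
    return f"[draft of {task}; feedback applied: {feedback or 'none'}]"

def evaluate(draft: str) -> tuple[bool, str]:
    # Placeholder evaluator: a real one would return an LLM critique.
    accepted = "feedback applied: revise" in draft
    return accepted, "revise"

def refine(task: str, max_rounds: int = 3) -> str:
    draft = generate(task)
    for _ in range(max_rounds):   # round limit keeps the cycle bounded
        accepted, feedback = evaluate(draft)
        if accepted:
            break
        draft = generate(task, feedback)
    return draft
```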

Agents

Agents have emerged as LLMs have matured in understanding complex inputs, reasoning, planning, tool usage, and error recovery.

An agent's work begins with a command or interactive discussion with a human user. Once the task is clear, the agent plans and operates independently, potentially needing more information from the user or requiring human judgment.

At each step of the execution process, obtaining 'ground truth' from the environment is critical. The agent can pause at checkpoints, or when encountering roadblocks, to get human feedback. Tasks usually terminate upon completion, but it is important to include stop conditions (such as a maximum number of iterations) to keep the agent under control.

Agents can handle complex tasks, but their implementation is often simple, typically just a large language model using tools in a loop based on environmental feedback. Thus, designing the toolset and its documentation clearly and thoughtfully is critical.

Agents are suitable for open-ended problems where the number of steps is hard to predict and a fixed path cannot be hardcoded. This autonomy makes agents ideal for scaling tasks in trusted environments, but it also means higher costs and the risk of compounding errors. Extensive testing in a sandbox environment and appropriate safeguards are recommended.
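The "tools in a loop" implementation can be sketched in a handful of lines. Here `decide` is a placeholder for the LLM call that chooses the next action, and the single "search" tool is hypothetical:

```python
# Minimal agent sketch: a model-driven loop that picks tools based on
# environmental feedback, with a step limit as a stop condition.

def decide(history: list[str]):
    # Placeholder policy: a real agent asks the model to choose a tool
    # (or to stop). Here: one search call, then finish.
    if len(history) == 1:
        return ("search", history[0])
    return None

def agent_loop(task: str, tools: dict, max_steps: int = 10) -> str:
    history = [task]
    for _ in range(max_steps):          # stop condition guards runaway loops
        action = decide(history)
        if action is None:              # the model judges the task complete
            break
        name, arg = action
        observation = tools[name](arg)  # ground truth from the environment
        history.append(observation)
    return history[-1]

# Example: a single hypothetical "search" tool.
tools = {"search": lambda query: f"search results for: {query}"}
```

Note that all of the agent's behavior lives in `decide` and in the tool definitions, which is why the toolset and its documentation deserve so much design attention.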

Examples of agent applications include:

  • A code agent used to solve SWE-bench tasks involving editing multiple files based on task descriptions.
  • Anthropic's 'Computer use' functionality, where Claude uses a computer to complete tasks.

Combining and Customizing

These building blocks are not prescriptive; developers can shape and combine them based on use cases. The key to success is measuring performance and iterating. Complexity should only be added when simpler solutions are inadequate. Success in the LLM field lies not in building the most complex system but in building a system that meets the need. Start with simple prompts, optimize them with comprehensive evaluations, and only add multi-step agent systems when simpler solutions are not achievable.

When deploying agents, follow these principles:

  • Keep agent designs simple.
  • Prioritize agent transparency, clearly displaying each planned step.
  • Carefully craft the Agent-Computer Interface (ACI) through complete tool documentation and testing.