<Technical Deep Dive> Agentic Design Patterns
I spent the last few weeks digging into the major approaches, reading through docs, watching talks, and actually trying to build with some of these tools. Here's what I found, and more importantly, what I think actually matters when you're deciding how to build agentic systems.
The Two Taxonomies That Actually Matter
Before we get into specific tools and frameworks, it's worth understanding that there are basically two influential ways people have sliced up agentic design patterns. And they come from two pretty different perspectives.
Andrew Ng kicked things off in early 2024 with his four agentic design patterns, which he presented at a Sequoia Capital talk and then expanded in his DeepLearning.AI newsletter. His four patterns are: Reflection, Tool Use, Planning, and Multi-Agent Collaboration. The framing was simple and practical — he showed that wrapping GPT-3.5 in an agentic loop with these patterns could outperform GPT-4 used in zero-shot mode on coding benchmarks. That was a pretty striking result and it got a lot of people's attention.
Then in late 2024, Anthropic published their "Building Effective Agents" blog post which laid out a different but overlapping taxonomy. They draw a sharp line between Workflows (where you the developer control the flow through predefined code paths) and Agents (where the LLM dynamically directs its own process). Under workflows they identify five specific patterns: Prompt Chaining, Routing, Parallelization, Orchestrator-Workers, and Evaluator-Optimizer. And then there's the fully autonomous agent pattern on top.
These two frameworks aren't contradictory — they're complementary. Ng's patterns describe what capabilities an agent needs (can it reflect? use tools? plan? collaborate?). Anthropic's patterns describe how you wire those capabilities together architecturally. You kind of need both mental models to make good decisions.
The Core Patterns
Let me walk through the patterns that keep appearing across every framework I looked at. These are the recurring building blocks regardless of whether you're using LangGraph, CrewAI, AutoGen, or building from scratch.
Prompt Chaining
This is the simplest agentic pattern and honestly the one you should try first before reaching for anything fancier. You decompose a task into a sequence of steps, where each LLM call processes the output of the previous one. You can add programmatic checks between steps to make sure things stay on track.
For example: generate marketing copy, then translate it. Or write a document outline, validate it meets criteria, then write the full document based on the outline. It's sequential, it's predictable, and it's surprisingly effective for a lot of real-world tasks.
Anthropic's advice here is spot-on — this is ideal when the task can be cleanly decomposed into fixed subtasks and you're essentially trading latency for accuracy by making each individual LLM call simpler.
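Here's a minimal sketch of the pattern, with `call_llm` as a stand-in for whatever model client you actually use (the function name and stub behavior are my own, not any framework's API):

```python
# Minimal prompt-chaining sketch. `call_llm` is a placeholder for a
# real model client (OpenAI, Anthropic, etc.) -- swap in an actual call.
def call_llm(prompt: str) -> str:
    # Stubbed for illustration; a real version would hit an LLM API.
    return f"[model output for: {prompt[:40]}...]"

def chain(topic: str) -> str:
    # Step 1: produce an outline with a simple, focused prompt.
    outline = call_llm(f"Write a 3-point outline for a post about {topic}.")

    # Programmatic gate between steps: fail fast on a malformed outline
    # instead of wasting the next (more expensive) call.
    if not outline.strip():
        raise ValueError("Outline step returned nothing; aborting chain.")

    # Step 2: expand the validated outline into the full document.
    return call_llm(f"Expand this outline into a full post:\n{outline}")

print(chain("agentic design patterns"))
```

The gate between steps is the whole point: each call stays simple, and cheap deterministic checks catch failures before they compound.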
Routing
Routing is basically a classifier that looks at incoming input and sends it down different paths. Think customer support: general questions go one way, refund requests go another, technical issues get routed somewhere else entirely. Each path can have its own specialized prompt, tools, and even different models.
One thing I find underappreciated about routing is that it also lets you optimize costs. You can route easy questions to smaller, cheaper models like Claude Haiku and send only the complex stuff to the heavy hitters like Claude Sonnet or GPT-4. That's a real money saver at scale.
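A sketch of that idea, with a keyword stub standing in for the classifier (in practice the classifier is itself a small LLM call; the model names here are illustrative placeholders, not real model IDs):

```python
# Routing sketch: a cheap classification step picks the downstream
# prompt and model. Model names are illustrative placeholders.
ROUTES = {
    "refund":    {"model": "small-cheap-model",   "prompt": "Handle this refund request: {q}"},
    "technical": {"model": "large-capable-model", "prompt": "Debug this technical issue: {q}"},
    "general":   {"model": "small-cheap-model",   "prompt": "Answer this question: {q}"},
}

def classify(query: str) -> str:
    # Stub classifier; a real router would be a small LLM call.
    q = query.lower()
    if "refund" in q or "money back" in q:
        return "refund"
    if "error" in q or "crash" in q:
        return "technical"
    return "general"

def route(query: str) -> dict:
    label = classify(query)
    cfg = ROUTES[label]
    return {"route": label, "model": cfg["model"], "prompt": cfg["prompt"].format(q=query)}

print(route("I want my money back"))
```

Note that only the "technical" path pays for the big model; everything else rides the cheap one.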
Reflection
This is Andrew Ng's first pattern and it's maybe the most powerful relative to its simplicity. The LLM examines its own output and uses that feedback to improve it. It sounds almost too simple to work but the results speak for themselves.
In practice, this shows up as the Evaluator-Optimizer pattern in Anthropic's taxonomy — one LLM generates output, another (or the same one with different instructions) evaluates it, and the cycle repeats until quality is good enough. Think of iterative writing where a human writes a draft then reviews and revises it. Same idea, but automated.
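The loop structure is the important part, so here's a framework-free sketch with both roles stubbed out (in a real system `generate` and `evaluate` would each be an LLM call with its own prompt):

```python
# Evaluator-optimizer loop sketch. Both roles are stubbed; in practice
# each is an LLM call with its own instructions. The round cap
# guarantees the loop always terminates.
def generate(task, feedback=None):
    # Stub generator: pretends to revise when given feedback.
    base = f"draft for {task}"
    return base + (" (revised)" if feedback else "")

def evaluate(draft):
    # Stub evaluator: a real one would score the draft against a
    # rubric via another LLM call and return targeted feedback.
    ok = "(revised)" in draft
    return ok, "" if ok else "Tighten the intro and add an example."

def reflect_loop(task, max_rounds=3):
    draft = generate(task)
    for _ in range(max_rounds):
        ok, feedback = evaluate(draft)
        if ok:
            break
        draft = generate(task, feedback=feedback)
    return draft

print(reflect_loop("launch announcement"))
```

The `max_rounds` cap matters in production: without it, a picky evaluator can burn tokens forever on diminishing returns.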
Parallelization
Sometimes you need multiple LLM calls running at the same time. Anthropic identifies two flavors here: Sectioning (breaking a task into independent subtasks that run in parallel) and Voting (running the same task multiple times to get diverse outputs for higher confidence).
A practical example of sectioning: having one model instance handle user queries while another simultaneously screens for inappropriate content. For voting: having multiple prompts independently review code for vulnerabilities and flagging anything that gets caught by any of them.
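Both flavors can be sketched with a plain thread pool, which fits because real LLM calls are I/O-bound (the `call_llm` stub and its keyword logic are mine, purely for illustration):

```python
# Parallelization sketch using a thread pool (LLM calls are I/O-bound,
# so threads are fine). Shows both flavors: sectioning and voting.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    # Stub; replace with a real API call.
    return "unsafe" if "attack" in prompt else "safe"

def sectioning(query: str) -> dict:
    # Answer the query and screen it for abuse at the same time.
    with ThreadPoolExecutor() as pool:
        answer = pool.submit(call_llm, f"Answer: {query}")
        screen = pool.submit(call_llm, f"Screen for abuse: {query}")
        return {"answer": answer.result(), "screen": screen.result()}

def voting(prompt: str, n: int = 3) -> str:
    # Run the same check n times; take the majority verdict.
    with ThreadPoolExecutor() as pool:
        votes = list(pool.map(call_llm, [prompt] * n))
    return Counter(votes).most_common(1)[0][0]

print(voting("Screen for abuse: SQL injection attack in this diff"))
```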
Orchestrator-Workers
This is where things start getting properly "agentic." A central LLM dynamically breaks down tasks, delegates subtasks to worker LLMs, and synthesizes their results. The key difference from parallelization is flexibility — the subtasks aren't predefined, the orchestrator figures them out based on the input.
This pattern maps closely to what Andrew Ng calls "Planning." The orchestrator has to plan what needs to be done and then coordinate the execution. It's the backbone of most coding agents — the number and nature of files that need changing depends entirely on the task, so you can't hardcode the workflow ahead of time.
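A sketch of the shape, with the planner stubbed as a string split (in a real system `plan` is an orchestrator LLM call that decides the subtasks, and each `worker` is its own focused LLM call):

```python
# Orchestrator-workers sketch. The orchestrator "plans" subtasks
# dynamically from the input (stubbed here as splitting a file list),
# fans out to workers in parallel, then synthesizes the results.
from concurrent.futures import ThreadPoolExecutor

def plan(task):
    # Stub planner; a real orchestrator LLM decides the subtasks.
    return [f"edit {f}" for f in task.split(", ")]

def worker(subtask):
    # Stub worker; each would be an LLM call with a focused prompt.
    return f"done: {subtask}"

def orchestrate(task):
    # Subtasks are NOT predefined -- they depend entirely on the input,
    # which is what separates this from plain parallelization.
    subtasks = plan(task)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(worker, subtasks))
    # Synthesis step: merge worker outputs into one answer.
    return "; ".join(results)

print(orchestrate("auth.py, routes.py, tests.py"))
```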
Multi-Agent Collaboration
Ng's fourth pattern and arguably the one generating the most hype right now. Multiple AI agents work together, each with specialized roles, splitting up tasks and sometimes even debating ideas. This is the pattern that all the multi-agent frameworks are really built around.
The key insight is that AI models tend to work better with focused tasks. A model loaded with 50 different tools gets confused and performs worse than specialized agents each with a handful of tools doing what they do best.
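That insight — small, focused toolsets beat one overloaded agent — can be sketched as a dispatcher over specialists (the agent roles, tool functions, and task labels here are all made up for illustration):

```python
# Role-specialization sketch: instead of one agent holding 50 tools,
# each specialist gets a handful and a dispatcher picks who runs.
# All names here are illustrative, not any framework's API.
def search_docs(q): return f"docs hit for {q}"
def run_tests(q):   return f"test run for {q}"
def draft_email(q): return f"email draft for {q}"

AGENTS = {
    "researcher": {"tools": [search_docs], "handles": {"research"}},
    "engineer":   {"tools": [run_tests],   "handles": {"code"}},
    "writer":     {"tools": [draft_email], "handles": {"comms"}},
}

def dispatch(task_type, payload):
    for name, agent in AGENTS.items():
        if task_type in agent["handles"]:
            # Each specialist only ever sees its own small toolset,
            # so its prompt (and tool descriptions) stay short.
            return agent["tools"][0](payload)
    raise ValueError(f"No agent handles {task_type!r}")

print(dispatch("code", "flaky login test"))
```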
Things Get Messy
OK, so that's the theory. Now let's look at what the actual tools and frameworks are doing with these patterns, because this is where it gets confusing fast.
OpenAI: From Swarm to the Agents SDK
OpenAI's original Swarm was a lightweight experimental framework focused on two primitives: agents and handoffs. An agent encapsulates instructions and tools, and can hand off conversation to another agent. Dead simple. Stateless between calls, built entirely on the Chat Completions API.
But here's the thing — Swarm is now deprecated. OpenAI replaced it with the Agents SDK, which is a production-ready evolution. The SDK keeps the same core mental model (agents, handoffs) but adds guardrails, sessions for conversation history, built-in tracing, and it's now provider-agnostic — it supports over 100 LLMs, not just OpenAI models.
The Agents SDK runs a loop: call the LLM; if there are tool calls, execute them; if there's a handoff, switch agents; repeat until you get a final output. It's elegant, and if you're building basic agent routing or customer service workflows it's probably the quickest path to something working.
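That loop can be sketched framework-free — to be clear, this is the shape of the loop, not the real Agents SDK API (all class and field names below are mine):

```python
# The agent/handoff loop sketched in plain Python. NOT the real
# Agents SDK API -- just the control flow it runs.
def run(agent, user_msg, agents, max_turns=10):
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_turns):
        reply = agent.respond(messages)        # one LLM call
        if reply.get("handoff"):               # switch agents, keep history
            agent = agents[reply["handoff"]]
            continue
        if reply.get("tool_call"):             # execute tool, feed result back
            result = agent.tools[reply["tool_call"]]()
            messages.append({"role": "tool", "content": result})
            continue
        return reply["content"]                # final output ends the loop
    raise RuntimeError("max turns exceeded")

class StubAgent:
    """Stand-in for an agent: scripted replies instead of LLM calls."""
    def __init__(self, script, tools=None):
        self.script = list(script)
        self.tools = tools or {}
    def respond(self, messages):
        return self.script.pop(0)

# Triage hands off to billing, which calls a tool and then answers.
billing = StubAgent(
    script=[{"tool_call": "lookup"}, {"content": "Refund issued."}],
    tools={"lookup": lambda: "order #123 found"},
)
triage = StubAgent(script=[{"handoff": "billing"}])

print(run(triage, "I want a refund", agents={"billing": billing}))
```

The `max_turns` guard is the kind of thing the real SDK handles for you, along with guardrails and tracing.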
The limitation is still there though — this is fundamentally a sequential handoff model. One agent at a time, passing the baton. You can build complex networks of agents but they're not running in parallel by default.
LangGraph
LangGraph takes a fundamentally different approach. Instead of the handoff metaphor, you build agents as graphs — nodes are computational steps (which can contain LLMs or just regular code), edges define the flow. It's inspired by Google's Pregel system and operates in "super-steps" where nodes that can run in parallel do so automatically within the same super-step.
The state management in LangGraph is where it really shines. You define a typed state schema, specify reducer functions for how updates get merged, and the framework handles the rest. You get checkpointing, human-in-the-loop breakpoints, and proper graph visualization out of the box.
LangGraph supports conditional edges (routing), parallel execution, the Send primitive for map-reduce patterns and dynamic fan-out, and the Command object for combining state updates with routing decisions in a single node. It's powerful, but the learning curve is steeper than something like the OpenAI Agents SDK.
For multi-agent specifically, LangGraph gives you two main patterns: a collaboration mode where all agents share message history (good for tight collaboration, gets noisy), and a supervisor/hierarchical mode where a boss agent delegates to workers with private state. You can also nest subgraphs and navigate between them, which gives you composability.
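To make the graph mental model concrete without pulling in the library, here's a toy executor in plain Python — nodes return partial state updates, a reducer merges them, and conditional edges pick the next node. This is the shape of the idea, not LangGraph's actual API:

```python
# Framework-free sketch of the graph mental model: typed-ish state,
# nodes returning partial updates, a reducer merging them, and
# conditional edges doing the routing. Not LangGraph's real API.
def classify(state):
    label = "complex" if len(state["question"]) > 40 else "simple"
    return {"route": label}

def cheap_answer(state):
    return {"answer": f"quick answer to: {state['question']}"}

def deep_answer(state):
    return {"answer": f"researched answer to: {state['question']}"}

NODES = {"classify": classify, "cheap": cheap_answer, "deep": deep_answer}
EDGES = {
    "classify": lambda s: "deep" if s["route"] == "complex" else "cheap",  # conditional edge
    "cheap": lambda s: None,  # terminal node
    "deep": lambda s: None,
}

def run_graph(state, entry="classify"):
    node = entry
    while node is not None:
        update = NODES[node](state)
        state = {**state, **update}   # reducer: shallow merge of the partial update
        node = EDGES[node](state)     # routing decision based on the new state
    return state

print(run_graph({"question": "What is 2+2?"})["answer"])
```

The real framework adds the parts that are genuinely hard to hand-roll: checkpointing, super-step parallelism, breakpoints, and visualization.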
CrewAI
CrewAI takes the multi-agent metaphor literally. You compose "crews" of agents, where each agent has tools, memory, knowledge, and can produce structured outputs. Crews execute through defined processes — sequential, hierarchical, or hybrid.
What sets CrewAI apart is its focus on enterprise-readiness. It has a full deployment story with environment management, triggers that connect to Gmail, Slack, Salesforce, and more, team management with RBAC, and an Enterprise console for monitoring live runs. If you're building business process automation and you want something opinionated that handles the deployment side too, CrewAI is worth a serious look.
The tradeoff is that you're buying into their abstractions pretty heavily. The "Flows" system for orchestrating steps with start/listen/router patterns and state management is powerful, but it's another thing to learn on top of the agent/crew/task model.
Microsoft AutoGen
AutoGen is Microsoft's entry and it's, well, big. It's split into three layers: Core (an event-driven runtime for scalable multi-agent systems), AgentChat (a higher-level programming framework for conversational agents), and Studio (a web UI for prototyping without code).
The event-driven Core layer is interesting if you're building distributed agent systems — it supports multi-language applications through its gRPC runtime. The AgentChat layer is more approachable and is where most people will start, with patterns for both deterministic and dynamic workflows.
AutoGen also has an extensions ecosystem for plugging in MCP servers, Docker-based code execution, the OpenAI Assistants API, and more. It's probably the most enterprise-grade option in terms of scalability, but it also feels the most "frameworky" — lots of abstractions, lots of configuration.
Anthropic's Approach: Patterns, Not Frameworks
Anthropic deliberately didn't ship a framework with their blog post (though they later released the Claude Agent SDK). Their philosophy is: start simple, use LLM APIs directly, and only add complexity when it demonstrably improves outcomes. Many patterns can be implemented in a few lines of code.
The Claude Agent SDK and the MCP protocol are their concrete tools, but the real contribution is the taxonomy itself and the hard-won advice from working with dozens of teams. Things like: design your agent-computer interface (ACI) as carefully as you'd design a human-computer interface. Test tools extensively. Make sure the model can actually use your tools based on descriptions alone — if it would confuse a junior developer reading the docstring, it'll confuse the model too.
So which one?
After going through all of this, here's my honest take on decision-making.
If your task can be solved with prompt chaining or a single well-prompted LLM, do that. Seriously. The Anthropic team emphasizes this and they're right — agents trade latency and cost for better task performance, and you should only make that trade when you need to.
If you need basic agent routing (like customer service triage), the OpenAI Agents SDK is probably the fastest path. It's simple, well-documented, and the handoff model maps naturally to that use case.
If you need fine-grained control over state, parallel execution, human-in-the-loop, and you don't mind a steeper learning curve, LangGraph gives you the most power. It's the "I want to build exactly the agent system I have in my head" option.
If you're building business automation and want an opinionated, deployment-ready platform, CrewAI has the best story there. Triggers, monitoring, RBAC — it's all built in.
If you need distributed, scalable multi-agent systems and you're in a Microsoft ecosystem, AutoGen makes a lot of sense. The event-driven core is genuinely different from what everyone else is offering.
And if you want maximum flexibility with minimal framework lock-in, take Anthropic's patterns, implement them yourself against the raw API, and add layers of abstraction only as you discover you need them.
The costs, literally
One thing I want to flag explicitly because I don't think it gets enough attention: multi-agent systems are expensive. Every sub-agent is burning tokens. Every coordination message between agents is burning tokens. Every reflection loop, every evaluation cycle, every parallel worker — tokens, tokens, tokens.
I've seen teams get excited about a five-agent system that produces beautiful results and then realize their per-request cost went from $0.02 to $0.50. That's a 25x increase. At scale, that adds up fast.
Before you go multi-agent, do the math. Can you get 80% of the result with a single agent and good prompt engineering? Often the answer is yes, and you save a ton of money and latency in the process.
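"Do the math" is a one-function exercise. The prices and token counts below are illustrative placeholders, not real rates — plug in your own:

```python
# Back-of-envelope cost check before going multi-agent.
# All prices and token counts are illustrative placeholders.
def request_cost(calls, avg_in_tokens, avg_out_tokens,
                 usd_per_m_in, usd_per_m_out):
    # Per-call cost: input and output tokens priced per million.
    per_call = (avg_in_tokens * usd_per_m_in
                + avg_out_tokens * usd_per_m_out) / 1_000_000
    return calls * per_call

# Single agent: one call with a good prompt.
single = request_cost(calls=1, avg_in_tokens=2_000, avg_out_tokens=800,
                      usd_per_m_in=3.0, usd_per_m_out=15.0)

# Five-agent crew: ~12 calls once coordination messages are counted,
# each carrying more context.
multi = request_cost(calls=12, avg_in_tokens=4_000, avg_out_tokens=1_000,
                     usd_per_m_in=3.0, usd_per_m_out=15.0)

print(f"single: ${single:.4f}  multi: ${multi:.4f}  ratio: {multi/single:.0f}x")
```

Even with made-up numbers, the multiplier comes almost entirely from call count and context growth — the two things multi-agent designs inflate by construction.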
Summary
The agentic design pattern space is maturing quickly but it's still messy. The good news is that the underlying patterns are actually pretty stable — reflection, tool use, planning, routing, parallelization, orchestration, and multi-agent collaboration. These patterns predate any specific framework and they'll outlast them too.
My take: learn the patterns first, then pick the framework that fits your constraints. Don't let a flashy demo drive your architecture decisions. The right choice depends on your specific use case, your team's expertise, your cost tolerance, and how much control you need over the orchestration.
Start simple. Add complexity only when you have evidence it's needed. And always, always measure the cost.