
There has been a lot of hype around AI agents recently. That is somewhat surprising because AI agents have been around for a while. So what actually changed?
Turns out, if you cut through all the noise, the big shift came when Anthropic released their Agent SDK. This is a way for developers to build their own agents on top of the same technology that powers Claude Code—currently one of the best agents on the market.
Over the course of a week, someone built a custom AI agent using this SDK. The result was an agent capable of reading and sending iMessages, searching through Slack and Gmail, and pulling data like reviews or build statuses from App Store Connect. It was used daily and felt incredibly personalized.
However, releasing it for others to use proved difficult due to some major limitations. This post will walk through everything learned about the Agent SDK while building with it: what it is, how it differs from other frameworks, how to structure projects, and where this technology actually makes sense today.
What Is an AI Agent?
The word “agent” sounds complicated, but the concept is simpler than it seems. An agent is really just three things:
- An LLM (like Claude Sonnet or GPT)
- A set of tools that the LLM can control
- A loop where the agent executes tools and keeps going until the task is complete
Here is how the flow works: a task is given to the AI. It thinks about what tool to use, executes that tool, looks at the results, and asks itself: “Am I done here? Is there anything else I need?” It keeps repeating this loop, using more tools, until it decides the task is finished.
This is what an agent looks like under the hood—literally a while loop. The LLM gets called, it decides on a tool, executes it, adds the results back, and loops again.
The Old Way: Building from Scratch
Even with a simple example, there is a lot of boilerplate code involved. The developer has to manage conversation history, execute tool calls, and deal with all the edge cases that come with a loop like this.
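To make that concrete, here is a minimal sketch of the loop with a stubbed-out LLM so it runs standalone. The `fakeLLM` function and the `search` tool are placeholders invented for illustration; in a real agent, `fakeLLM` would be a call to the model's API.

```typescript
type ToolCall = { tool: string; input: string };
type LLMReply = { toolCall?: ToolCall; answer?: string };

const tools: Record<string, (input: string) => string> = {
  // Placeholder tool: "searches" a tiny hardcoded corpus.
  search: (q) => (q.includes("weather") ? "Sunny, 72F" : "No results"),
};

// Stubbed LLM: asks for the search tool once, then answers.
function fakeLLM(history: string[]): LLMReply {
  const sawToolResult = history.some((m) => m.startsWith("tool:"));
  return sawToolResult
    ? { answer: "It is sunny and 72F." }
    : { toolCall: { tool: "search", input: "weather today" } };
}

function runAgent(task: string, maxSteps = 10): string {
  const history: string[] = [`user: ${task}`];
  for (let step = 0; step < maxSteps; step++) {
    const reply = fakeLLM(history);          // 1. ask the model
    if (reply.answer) return reply.answer;   // 4. done? exit the loop
    const { tool, input } = reply.toolCall!;
    const result = tools[tool](input);       // 2. execute the chosen tool
    history.push(`tool: ${result}`);         // 3. feed results back in
  }
  return "Gave up after max steps.";
}
```

Even in this toy version, the boilerplate is visible: the history array, the tool dispatch, the step cap. Each of those grows edge cases in a real implementation.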
To simplify things, libraries emerged. The most popular one has been Vercel’s AI SDK. Instead of writing a custom while loop, the AI SDK handles the loop, tool calls, message history, streaming—all of that is managed under the hood. The only thing left to do is define the tools and set a maximum number of steps.
This approach works. But then Anthropic released their own SDK, and things changed.
What Makes Anthropic’s Agent SDK Different?
On the surface, both SDKs let developers create agents. But the way they go about it is completely different. Anthropic’s Agent SDK gives developers the same architecture and some of the same tools that come with Claude Code. And Claude Code is widely considered one of the most powerful agents available.
Here are the key differences:
1. Conversation Management
With most frameworks like the AI SDK, conversation history has to be manually passed around using a messages array. With the Agent SDK, it is much simpler. A session ID is provided, and the SDK manages all the context automatically.
2. Automatic Compaction
When conversations get extremely long, the SDK automatically summarizes earlier parts to save on tokens. This is the same feature Claude Code uses. It is a big deal for long-running conversations, and it requires zero configuration.
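The idea behind compaction can be sketched in a few lines. This is a toy simulation, not the SDK's actual algorithm: the real SDK uses the model itself to write the summary, while this stub just collapses old messages into a placeholder to show the shape of the operation.

```typescript
type Msg = { role: "user" | "assistant" | "summary"; text: string };

// Toy compaction: when the history exceeds a character budget, replace
// everything except the most recent messages with a single summary stub.
function compact(history: Msg[], budget: number, keepRecent = 4): Msg[] {
  const size = history.reduce((n, m) => n + m.text.length, 0);
  if (size <= budget) return history; // still fits, nothing to do
  const old = history.slice(0, -keepRecent);
  const summary: Msg = {
    role: "summary",
    text: `Summary of ${old.length} earlier messages.`,
  };
  return [summary, ...history.slice(-keepRecent)];
}
```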
3. Built-in Tools
Developers get access to the same exact tools that Claude Code uses:
- Bash for running commands
- Read and Edit for files
- Grep and Glob for searching
- Web search and web scraping tools (which are exceptionally well-optimized)
Because Claude Code only has a few tools, Anthropic was able to focus on making them very, very good. They come out of the box with the SDK.
4. Subscription Integration
Anthropic started allowing people to use their Claude Code subscription with the Agent SDK. This means that if someone builds something on top of the SDK, instead of paying every time tokens are consumed, the cost comes out of the existing Claude Code subscription.
At the time of writing, the subscription is severely underpriced. Someone calculated that the $200 per month Claude Code plan gives about $2,000 worth of API tokens each month. This is a huge deal for anyone wanting to experiment and build custom agents for personal use.
Important Caveats
There are some things to keep in mind when diving in.
Even though the SDK handles sessions and conversation context, conversations still need to be stored somewhere if they are going to be rendered in a UI. Sessions only live for about 30 days, so it is much better to persist them in a database.
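A sketch of that persistence layer, with a `Map` standing in for the database (in the real app this would be a Convex table; the `chatId` key and helper names are invented for illustration):

```typescript
// Persist the SDK's session id and the rendered messages per conversation,
// so chats survive the ~30-day session lifetime and can be shown in a UI.
type StoredChat = { sessionId?: string; messages: string[] };

const db = new Map<string, StoredChat>(); // stand-in for a Convex table

function recordMessage(chatId: string, text: string, sessionId?: string) {
  const chat = db.get(chatId) ?? { messages: [] };
  chat.messages.push(text);
  if (sessionId) chat.sessionId = sessionId; // kept for resuming via the SDK
  db.set(chatId, chat);
}

function loadChat(chatId: string): StoredChat | undefined {
  return db.get(chatId);
}
```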
Memory systems also need to be built manually. The SDK does have a memory command, but it is in beta and fairly basic. For this project, a custom memory system was built instead.
Understanding Tools and MCP
Tools are arguably the most powerful part of agents because they let agents actually do things. In the Agent SDK, tools work through something called MCP—Model Context Protocol. This is a way for an agent to communicate and interact with an external service.
Here is what a tool definition looks like:
- A tool name
- A description (extremely important—this is how the agent decides which tool to use)
- A Zod schema (the format for the inputs)
- A handler function that runs when the agent calls it
These get bundled into an MCP server and passed into the query.
On top of custom tools, the SDK provides built-in tools like bash, file operations, search, and web search. They are added to an allowed tools array by name, and the SDK handles the execution loop, conversation state, and parsing of results automatically.
Skills: Progressive Disclosure
Aside from tools, the Agent SDK introduces something called “skills.” Skills are a way to give specific capabilities to an agent without bloating the context. They can be loaded on demand.
In the Agent SDK, skills are not code objects. They are organized as folders in the file system with a specific format. Each skill has its own folder containing a SKILL.md file with two parts:
- YAML frontmatter for metadata
- Markdown content with detailed instructions
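A hypothetical SKILL.md for a Slack skill might look like this. The tool name and instructions are invented for illustration, and the exact frontmatter keys the SDK reads should be verified against its docs.

```markdown
---
name: slack-search
description: Search Slack messages and channels. Use when the user asks about Slack activity, mentions, or unread messages.
---

# Slack Search

Use the Slack search tool to query messages.

1. Start with a narrow query (channel + keyword).
2. If results are empty, widen the date range.
3. Summarize findings with links back to the original messages.
```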
Skills are auto-discovered. When the agent starts up, the SDK scans the skills folder and reads the name and description from each skill file. These descriptions are added to the system prompt. When a user asks for something like “check my Slack for urgent messages,” the agent looks at these descriptions, recognizes the Slack integration as relevant, and loads that skill’s instructions into the context.
This system is called “progressive disclosure.” Dozens of skills can be installed without heavily affecting the context window, because only the relevant ones get loaded when needed.
A quick heads-up: the skill tool has to be explicitly passed into the allowed tools array, and the sources and cwd parameters need to be configured so the SDK knows where to look for those skill files.
Building a Memory System
Memory is where a lot of interesting architectural decisions come in. There are a thousand ways to architect memory for an agent. After some trial and error, a basic system was built using three main types of memory stored in Convex:
1. Session Memory
The current conversation context. This gets loaded every time a new chat or conversation starts.
2. Persistent Memory
Important facts, preferences, and project details. Things like “builds productivity apps” or “has a dog named Luna.”
3. Archival Memory
Reference material and large amounts of data—full YouTube scripts, notes on creators studied in the past, past project documentation. This does not get loaded into every conversation but can be referenced when needed.
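The three tiers can be sketched as a single store with a scope field. This is a toy in-memory version; the real system persists these entries to Convex tables, and the helper names here are invented.

```typescript
type Scope = "session" | "persistent" | "archival";
type MemoryEntry = { scope: Scope; key: string; value: string };

// Toy memory store: one table, three scopes.
const memories: MemoryEntry[] = [];

function remember(scope: Scope, key: string, value: string) {
  memories.push({ scope, key, value });
}

function recall(scope: Scope): MemoryEntry[] {
  return memories.filter((m) => m.scope === scope);
}

// Context assembled at chat start: session and persistent memory are
// loaded eagerly; archival memory is only fetched on demand via recall().
function startupContext(): string[] {
  return memories
    .filter((m) => m.scope !== "archival")
    .map((m) => `${m.key}: ${m.value}`);
}
```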
The key to making this memory system powerful is ensuring the agent proactively uses it and keeps it updated. A skills file was created with explicit instructions:
- Use memory tools proactively to save critical information without being asked
- Save a memory whenever a correction is given
This combination means the agent does not need to be explicitly told “remember this.” It automatically recognizes important information and saves it.
Most people parse conversation history and do post-processing to save critical information. But having the agent do it directly is far more effective. This memory layer gives the agent the ability to remember important pieces of information and critical feedback, making it feel incredibly personalized.
The Brutal Reality: Why Release Is Hard
After building this custom agent, the hope was to release it for others to use. But there are serious limitations.
Cost
The Agent SDK has a way to see the cost of each message. After tracking costs over several days, a pattern emerged. Large messages requiring multiple tool calls—like asking the agent to find critical messages needing a response—cost around $2 to $3 in some cases. Normal messages usually cost between $0.07 and $0.30.
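Since the SDK reports cost on its result messages, tracking is a small fold over them. The `total_cost_usd` field name follows the SDK's result message format, and the sample values below are made up for illustration.

```typescript
type ResultMsg = { type: "result"; total_cost_usd: number };

// Sum per-message costs to estimate a daily or monthly bill.
function totalCost(msgs: ResultMsg[]): number {
  return msgs.reduce((sum, m) => sum + m.total_cost_usd, 0);
}

// Made-up sample: one heavy multi-tool request, two normal ones.
const day: ResultMsg[] = [
  { type: "result", total_cost_usd: 2.4 },
  { type: "result", total_cost_usd: 0.12 },
  { type: "result", total_cost_usd: 0.3 },
];
```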
With average usage, estimated costs land somewhere between $200 and $400 per month. Luckily, the Claude Code subscription covers that. But if the agent were released to others, the subscription could not be used. API pricing would apply.
If people used it half as much, costs might be around $100 per user. To do this safely, a subscription might need to be priced around $200 per month. Compared to ChatGPT and Claude at $20 per month (with free plans available), it is hard to imagine people paying that much for a custom tool.
From a consumer standpoint, this does not make sense right now while models cost what they do.
Local Access Limitations
A lot of the cool functionality came from local access to a machine. The iMessage integration worked because it was accessing the SQL database on a Mac to read and send messages. This does not work outside of that Mac. When deployed to a server, iMessage access is lost. And iMessage, along with other custom tools, was part of what made the agent special. Moving it to a server removes some of the magic.
Authentication Complexity
Some tools required API keys and other sensitive credentials to work. If deployed to other people, those same methods cannot be used safely. Some open-source wrappers ask users to feed in API keys and environment variables to power integrations. Even though it technically works, storing or handling that kind of data is not a comfortable position to be in.
Where This Technology Actually Makes Sense
While consumer release is challenging, the B2B world is a perfect place to deploy custom agents.
Here is an example: at a software agency that works with medical clients, HIPAA compliance audits are required. This is a very manual process. Using the Agent SDK, a custom HIPAA compliance agent was built that can go through codebases and AWS configurations to do a first pass and identify issues.
A single report might cost $50 to $100 to generate. That seems like a lot. But factoring in the $5,000 to $10,000 worth of engineering time saved through that first pass, $100 is totally worth it.
This is where custom agents shine: very specific workflows for business use cases where the ROI is clear. If a business can save thousands of dollars, they will happily pay for the API tokens.
Deployment Considerations
If an agent runs on a local computer, it stops working when that computer is off. Deployment solves this.
One approach is spinning up a VPS (virtual private server)—a little sandbox where the agent can write files, access programs and tools, and stay on 24/7. Using a service like Hetzner (which is cheap), a Claude Code subscription can be logged into on that machine. Code gets deployed there, and the agent has access to almost everything except Mac-only applications like iMessage.
A front-end toggle can even let the user choose between connecting to the VPS or the local machine. When local access is possible (for iMessage and Mac tools), that route is preferred. When the local machine is off, traffic points to the VPS so the agent keeps running.
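The toggle itself is simple routing logic; a sketch, with placeholder URLs:

```typescript
// Prefer the local machine (iMessage and other Mac-only tools) when it
// is reachable; otherwise fall back to the always-on VPS.
const LOCAL_URL = "http://my-mac.local:3000";
const VPS_URL = "https://agent.example.com";

function chooseEndpoint(localReachable: boolean): string {
  return localReachable ? LOCAL_URL : VPS_URL;
}
```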
This is also why a database like Convex is essential—to manage conversation history and memory so both instances can access the same data.
What’s Next
This only scratches the surface of the Agent SDK. There are also features like sub-agents, streaming, and computer use (where the agent can control a computer directly). By the time this is read, even more features will probably have shipped.
The goal for now is to provide a solid foundation. Custom AI agents are powerful, and the technology is moving fast. For businesses with clear ROI, they are already worth building. For consumers, it might take a little longer for costs to come down.
But one thing is certain: the way agents are built today is not the way they will be built a year from now. And that is exactly what makes this space so exciting.