Tools Live On After Leaving Their Maker
Rethinking function calls while building MCP tools

In a previous post, I treated personal asset management as a kind of state management problem. Recently I wanted to manage that state better, so I started rebuilding the tools around it. This post is about the interface design questions that came up along the way.
Until then, I had been managing my assets with a combination of a Notion database and Google Apps Script. When I logged a transaction in Notion, GAS would pull exchange rates and stock prices to compute the current value. Looking back, it worked, but it was clearly too heavy a setup to live with long-term.
First, the input itself was too heavy. To leave behind a single fact like “I bought 15 shares of AAPL at $211 yesterday,” I had to open Notion, find the database, create a new row, and fill in several columns, jumping through hoops every time. Doing it once or twice is fine. But once those entries start to pile up, you eventually get tired of it and start putting it off.
Second, the operating cost was also too heavy. To answer a single question like “how did last March’s exchange rate movements affect my portfolio?”, I had to either build a new Notion view or attach another function to GAS. Each question essentially meant building one more tool. On top of that, GAS doesn’t really support webhooks, so triggering scripts from Notion was also a pain.
To cut down on this friction, this past weekend I moved the asset management setup over to a SQLite-based CLI plus an MCP server. Over two solid days, I built around 50 MCP functions, plus a CLI for invoking them manually. I called the result Firma. I designed the CLI commands and MCP tools together from the start, because I felt every feature should work equally well through both interfaces.
What was interesting is that building the two interfaces in parallel surfaced something strange. When I exposed the same function as a CLI command and as an MCP tool, I expected them to behave similarly, but in practice they played out quite differently. Same signature, same arguments, same return value. Even the message formats were nearly identical. The only thing that changed was whether the caller was a human or an LLM, and that difference turned out to be bigger than I thought.
This post is about that strange feeling. Wiring the same function up to both CLI and MCP brought to light an assumption I had quietly accepted in interface design. The reason MCP differs from RPC isn’t simply that it’s a new protocol. It’s that we have been treating function calls as the default for far too long.
Function Calls Are More Intimate Than We Think
There’s an assumption sitting underneath function calls that we usually don’t notice. It’s the assumption that the caller and the callee share a fair amount of context.
Take a moment to think about what happens when we call something like addTransaction(ticker, quantity, price). Just from the signature, we already have a rough sense of what it does. We trust that we more or less know why the author chose this shape, what side effects it has, what exceptions it might throw. Maybe not perfectly, but enough. Calling something without that understanding isn’t really a function call. It’s closer to guesswork.
The tools of the last few decades have evolved in the direction of strengthening this intimacy. Static type systems block you at compile time when you get an argument shape wrong. IDE autocomplete pulls back signatures you’ve forgotten. JSDoc and docstrings layer the author’s intent on top in plain text. RPC, gRPC, and GraphQL have moved toward stricter schemas, and they belong to the same lineage. The underlying assumption is always the same. The more the caller knows about the callee, the better the call.
I built my own CLI command for registering transactions on top of this same assumption. When the user types firma add txn, an interactive prompt asks for the ticker, then the type, then the shares and price one after another.
```typescript
// In the CLI, this signature alone was enough
function addTransaction(input: {
  ticker: string;
  type: "buy" | "sell" | "deposit" | "dividend" | "tax";
  shares: number;
  price: number;
  date: string;
}): Promise<Transaction>
```

Here the user already knows what they want to do. The CLI just needs to capture that intent cleanly and hand it off to the function. The function name and argument types end up doing most of the work. I had taken this for granted for so long that I couldn’t even see it as an assumption. Not until I exposed the same function as an MCP tool.
What the Signature Doesn’t Say
The moment you expose the same function as an MCP tool, the caller shifts from human to LLM. And from that point on, all the things the signature wasn’t saying suddenly start surfacing as problems.
Me: "I bought 15 shares of AAPL yesterday"
LLM: Sure, I'll call the add_balance function.
Wait, where did that come from?
In Firma, alongside add_txn for recording transactions, there’s also add_balance for end-of-month asset snapshots, and add_flow for monthly income and expense entries.
Naturally, all the signatures are clear, and the argument types are strictly defined with zod schemas. Even so, the LLM kept getting confused about which tool to call. Running the same utterance through it a few times, it picked the wrong tool more often than I expected.
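To see why, it helps to strip the setup down to a toy. The sketch below is not the real MCP SDK or Firma’s actual schemas — the registry, the shortened descriptions, and the naive keyword router are all hypothetical — but it makes one thing visible: the tool names carry no routing signal at all, and whatever signal exists lives in the descriptions.

```typescript
// Toy stand-in for tool registration: name + description only.
type ToolSpec = { name: string; description: string };

const tools: ToolSpec[] = [
  {
    name: "add_txn",
    description:
      "Records a single trade: buy, sell, dividend received, or tax paid on a specific ticker.",
  },
  {
    name: "add_balance",
    description: "Records a periodic snapshot of total asset balances.",
  },
  {
    name: "add_flow",
    description: "Records a monthly income or expense entry, such as salary.",
  },
];

// A deliberately naive router: score each tool by how many of its
// description words appear in the utterance. An LLM does something far
// richer, but the asymmetry is the same: the identifier "add_txn"
// contributes nothing; the description is the only material to match on.
function pickTool(utterance: string): string {
  const words = new Set(utterance.toLowerCase().split(/\W+/));
  let best = tools[0];
  let bestScore = -1;
  for (const tool of tools) {
    const score = tool.description
      .toLowerCase()
      .split(/\W+/)
      .filter((w) => w.length > 3 && words.has(w)).length;
    if (score > bestScore) {
      best = tool;
      bestScore = score;
    }
  }
  return best.name;
}
```

With descriptions this thin, `pickTool("Log that I was paid a dividend on AAPL")` happens to land on `add_txn` only because “dividend” and “paid” appear in its description. Starve the descriptions and even a human router would be guessing.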
At first I thought maybe the LLM was just bad. But the longer I sat with it, the more I realized the problem was somewhere else. A signature explains what a function takes pretty well. What it barely explains is when it should be called. The name add_txn is just an identifier. It carries no notion of “use this tool when the user mentions a trade.”
A function name is more like a single line in a dictionary. Just as a word on its own can’t decide which sentence it belongs in, a signature on its own can’t decide which utterance should call it.
In the CLI, this wasn’t really an issue. The caller was a human, and the human had already typed firma add txn carrying the context “I’m about to record a transaction.” The responsibility for picking when to call sat in the caller’s head.
The LLM is different. To an LLM, the user’s utterance isn’t a finished command. It’s raw material from which intent has to be inferred. From a single word like “bought,” the LLM has to decide whether this is a buy transaction, a balance change, or a new cash flow. The only thing it has to lean on is how each tool describes itself.
Things only stabilized after I rewrote the description for add_txn. The actual production text has more operational detail attached, but for this post I’ll just paste the core of the change.
```typescript
server.tool(
  "add_txn",
  `Records a single trade on a specific ticker: buy, sell, dividend received, or tax paid.
Use this when the user mentions buying or selling stocks, receiving dividends, or paying taxes on a specific security.
Do NOT use this for general cash deposits, salary, or expenses. Use add_flow for those.
Do NOT use this for monthly asset snapshots. Use add_balance for those.`,
  schema,
  handler
)
```

What matters here is that the intent of the function migrated from the signature into the description. The signature still enforces the shape of arguments. But the meaning of the call is now decided much more by the description. The question of where to inscribe intent has shifted.
This shift happens because the caller doesn’t know the context. But when I sat with that for a while, I started to wonder. Is it really that strange for a caller not to know the context?
Tools Have Always Lived On Without Their Maker
Take a hammer for example. The person who first made the hammer surely had their own intent. Decisions about what weight is right, how long the handle should be, how the mass should be distributed. But the person who later picks the hammer up doesn’t know any of that background. They don’t need to know why the handle curves the way it does, or why the center of mass sits where it does. All they need is the intent to drive a nail.
And that’s enough. The moment a person sees a hammer, they catch the gist. Where to grip it, which side to swing, what kind of work the tool fits. They read it off the form. Tools, in this way, leave their maker and live their own lives. The user doesn’t need to know what was inside the maker’s head. They look at the form and guess whether it fits their intent. Most of the scenes in which we use tools actually look like this.
How a hammer gets used is decided by whoever is using it
This is a pretty interesting point. Whether the person holding the hammer drives a nail or breaks down a wall isn’t decided by the maker. It’s decided by the user. (They could even break a person rather than a wall…)
The blacksmith likely had a clear purpose in mind when shaping the hammer. But where that hammer actually gets swung is ultimately decided by the hand that holds it. Tools don’t just live on after leaving their maker. They live on past their maker’s intent.
Donald Norman’s concept of affordance points at this very thing. Affordance was originally coined by the psychologist James Gibson. It refers to what actions an object makes possible for a user. A chair affords sitting, a cup affords gripping, a handle affords turning. Norman pulled the concept into the language of design, narrowing it to “the property of an object’s form that suggests its own use.”
A well-designed door handle doesn’t need to spell out “grip here and pull.” The form alone should be enough to express how it should be used. The opposite is also true. Tools whose affordances are misaligned confuse their users at every turn.
Norman often points to doors that need to be pushed but have pulling-style handles. These mismatched doors are so common that the design world has even given them a nickname: Norman doors. The handle says “grip and pull,” while the actual action is “push.” A tool whose form lies to its user constantly offloads cognitive burden onto them.
The hammer is the same. Its curved handle and heavy head deliver “grip here and swing that way” at a near-unconscious level. A good tool hints at its use through its form before any documentation does.
From that angle, the description of an MCP function isn’t just documentation. To the LLM, it’s effectively the tool’s form. Just as a hammer’s handle says “grip here,” the description of a function says “call me when this kind of intent shows up.” A tool with a weak description is like a hammer with an awkward handle. It exists, but somehow your hand doesn’t reach for it.
I felt this most clearly when designing the read-side tools. At first I wanted to split them up cleanly along Single Responsibility lines: tools with sharp roles like get_holdings, get_prices, get_pnl, and calculate_value. From a function-design standpoint, it’s quite clean.
But the picture changes when you put it next to a user asking “how’s my portfolio doing?” To answer that, the LLM has to chain four or five calls together. Each call might be correct, but the overall response gets slower, and there’s more room for things to slip in between. To a user asking with a single intent, you end up shoving too many small tools at them.
So I ended up making show_portfolio into a slightly heavier tool. It returns holdings, average cost, current price, market value, and cumulative P&L all in a single call. get_brief goes further. It returns holdings, daily P&L, concentration, movers, news, earnings, macro indicators, and insights all at once. From a pure RPC standpoint, this can look like an overstuffed design.
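The aggregation idea can be sketched in a few lines. Everything below is stubbed and the field names are my own shorthand, not Firma’s actual schema — the real tool reads from SQLite and fetches live prices — but it shows the shape of the trade-off: the small SRP-style getters can still exist internally, while one tool call returns everything the “how’s my portfolio doing?” intent needs.

```typescript
// Stubbed holdings and prices; the real tool pulls these from storage.
type Holding = { ticker: string; shares: number; avgCost: number };

const holdings: Holding[] = [
  { ticker: "AAPL", shares: 15, avgCost: 211 },
  { ticker: "MSFT", shares: 10, avgCost: 300 },
];

const prices: Record<string, number> = { AAPL: 220, MSFT: 310 };

// One call, one payload: holdings, cost basis, prices, value, and P&L.
function showPortfolio() {
  const positions = holdings.map((h) => {
    const price = prices[h.ticker];
    const marketValue = h.shares * price;
    const pnl = (price - h.avgCost) * h.shares;
    return { ...h, price, marketValue, pnl };
  });
  const totalValue = positions.reduce((sum, p) => sum + p.marketValue, 0);
  const totalPnl = positions.reduce((sum, p) => sum + p.pnl, 0);
  return { positions, totalValue, totalPnl };
}
```

The LLM answers the question from a single response instead of stitching four or five calls together, which is exactly the difference between a function boundary and a tool boundary.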
But from a tool-using standpoint, this actually feels more natural. SRP is a principle that matters to the person making functions. Affordance is a principle that matters to the person using tools. If the caller is human, assembling small units works fine. If the caller is an LLM, a tool that touches the intent more directly is better. Which is better isn’t an absolute question. It depends on who the caller is.
The Direction of Trust Also Flips
In terms of shape, MCP actually sits inside the RPC family of protocols. Its messages use the JSON-RPC format, and its vocabulary of signatures and argument schemas is borrowed straight from the RPC era. That’s why I initially assumed RPC and MCP would end up flowing in similar ways.
But the more I looked at the two protocols, the more I saw a decisive point where they diverge. On top of the center of gravity moving from signature to description, the direction of trust flips along with it.
In RPC, the caller typically trusts the callee. For instance, if the signature says it returns Promise<Transaction>, the caller naturally trusts that promise and writes code accordingly. The callee, in turn, doesn’t have to particularly trust how the caller will invoke it. The type system and schemas enforce the shape on the caller’s behalf. Call it wrong and you get a compile error or a runtime error.
With MCP, the story shifts a little. The maker of the tool has to extend some trust to the caller. They have to expect that the LLM will actually read the function’s description, and that it will call the tool at an appropriate moment with appropriate arguments. The signature only enforces the shape of the arguments. The timing and meaning of the call still belong to the realm of inference.
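A toy check makes the split concrete. The validator below is a hand-rolled stand-in for what zod does in Firma (the helper name and the example arguments are mine, purely illustrative): it can enforce the shape of add_txn’s input, but it is structurally blind to whether add_txn was the right tool to call in the first place.

```typescript
// Hand-rolled shape check standing in for a zod schema on add_txn's input.
function isValidTxnInput(input: unknown): boolean {
  if (typeof input !== "object" || input === null) return false;
  const o = input as Record<string, unknown>;
  return (
    typeof o.ticker === "string" &&
    ["buy", "sell", "deposit", "dividend", "tax"].includes(o.type as string) &&
    typeof o.shares === "number" &&
    typeof o.price === "number" &&
    typeof o.date === "string"
  );
}

// The user said "my salary came in," and the LLM squeezed it into add_txn
// as a deposit. Semantically this belongs in add_flow, but every field
// has the right shape, so the validation passes without complaint.
const misdirectedCall = {
  ticker: "CASH",
  type: "deposit",
  shares: 1,
  price: 3000,
  date: "2024-03-25",
};
// isValidTxnInput(misdirectedCall) is true: valid shape, wrong tool.
```

The schema catches a malformed call; only the description, and the caller’s reading of it, can catch a misdirected one.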
This difference shows up more clearly in write-side tools. Firma’s add_txn actually writes to a SQLite database. If it gets called incorrectly, the asset records get corrupted right away. That fear pushed me to write the description like this at first.
```
"Records a transaction. Always confirm with the user before recording."
```

The result wasn’t great. The LLM asked “should I record it like this?” every single time, and I answered “yep” every single time. The whole point of a natural language interface basically vanished. Push too much responsibility back onto the user, and the tool suddenly drags. It would be like a hammer asking “are you sure you want to drive this nail?” every time you picked it up.
So in the end, I rewrote the description like this.
```
"Record the transaction immediately when the user clearly states a trade.
Do not ask for confirmation unless the data is ambiguous (missing date, etc).
If you record incorrectly, the user can run delete_txn to undo."
```

Rules like “record immediately,” “only ask when ambiguous,” and “if it goes wrong, it can be reversed” aren’t really formal descriptions of the tool. They’re closer to operating principles handed to the caller. And for those guidelines to actually work, you have to expect the LLM to do at least some reasoning.
You don’t make requests of a function; you enforce its use with types. With a reasoning caller, on the other hand, you have no real choice but to make requests. I think this is also why the descriptions of MCP functions keep growing into something that reads more like a behavior policy.
In a way, a signature is a declaration and a description is a request. A signature nails down “this function takes these arguments.” A description, on the other hand, asks “please call me when this kind of intent shows up.” One is enforcement, the other is persuasion. The language of persuasion has started entering tool design.
Maybe Function Calls Were the Special Case All Along
Coming back to the original question, the fact that a caller doesn’t know a function’s context isn’t actually that strange.
Humans have always used tools they didn’t make themselves. Bowls made by a potter, knives made by a blacksmith, chairs made by a carpenter. Programs written by developers are no different. Expecting users to always use a program exactly the way you intended is a bet that consistently loses.
Because tools leave their maker and pass into other hands like this, the user doesn’t know what was inside the maker’s head. That’s why looking at the form of a tool and using it from there is the more common scene.
In that case, the function call we’ve treated as the default for so long was actually a fairly special kind of relationship. A relationship where the caller and the author share a lot of context, and where type systems, IDEs, and documentation continuously reinforce that intimacy. Rather than treating function calls as the universal mode of tool use, it’s probably more natural to see them as one unusually intimate form of it.
Now that the caller is no longer human, there are two paths to take. One is to feed more and more context into the LLM to make it as intimate a caller as possible. The other is to close the distance from the tool’s side. Inscribing the tool’s intended use into its description, so that even a caller without all the context can still use it.
MCP leans toward the latter. It feels like a fairly natural choice. The first path eventually demands that the caller becomes more and more knowledgeable. The second path accepts that a caller can never know everything to begin with. And it fills that gap through tool design. Looking back, the way humans have made tools throughout history has always been closer to that second path.
This might be closer to a return than to anything new. Industrial design, architecture, and UX have all been standing in this spot for a long time. Programming was just unusually camped out at the intimate end of the spectrum.
Closing Thoughts
What kept me up the longest while building Firma this weekend wasn’t the code. It was the descriptions. I wrote somewhere around 30 CLI commands and 50 MCP tools, and despite the volume, the thing that ate up the most time was figuring out how each of these tools should describe itself.
A tool leaves its maker the moment it’s made. You have to design with that departure as a given. Instead of expecting the caller to know all the context, the tool itself has to express more of its own intent. What was interesting about working with MCP wasn’t so much the feeling of learning a new protocol. It was being reminded of how tools have always been used in the first place.
I’m not trying to say the era of function calls is over. But the fact that function calls were one form of interface that worked well only in fairly intimate conditions, rather than the universal default, is now much clearer to me. And designing interfaces outside that intimacy turns out to be more interesting than I expected.
Maybe we’re moving through a quiet shift in the way we design interfaces. From signature to description, from enforcement to persuasion, from assumed intimacy to acknowledged distance. The center of gravity is moving. Making tools may be standing once again at the threshold of returning to its original form.