Building AI features without losing the plot

Notes on building AI features that earn their place in a product rather than being defended simply for being modern.

Almost every conversation we have with a new client now includes a sentence about AI. Some are specific — they want a recommendation engine, an internal search assistant, a draft generator for a particular kind of long document. Most are not. They want to do something with AI because the board has asked, the competition has shipped, or the founder has spent a weekend with a coding assistant and is excited.

This is not a bad place to start. It is a bad place to finish. We have learned to spend the first half of an AI engagement undoing the assumption that the feature is the model. The model is a component. The feature is the experience around it — the input the user is asked to give, the output they are given back, the confidence with which the output is presented, and the clarity with which they are told when something has gone wrong.

When we work on an AI feature now, we start with three written documents. The first is a use-case statement: what the feature is, who it is for, and what the user gets from it that they do not already have. The second is an evaluation harness: a set of test inputs and expected outputs that we use to compare model versions and prompt changes against each other. The third is a cost-and-latency budget: what we are willing to spend per request, how long we are willing to make the user wait, and what we will do if either of those budgets is breached.
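To make the second and third documents concrete, here is a minimal sketch of a harness that scores a model against test cases and checks the results against a budget. Every name in it (EvalCase, Reply, Budget, Report, run_harness) is hypothetical, the model is assumed to be any callable that returns its text and its cost per request, and exact-match scoring is only a placeholder for whatever comparison suits the feature.

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str


@dataclass(frozen=True)
class Reply:
    text: str
    cost_usd: float  # assumed to be reported by the model wrapper


@dataclass(frozen=True)
class Budget:
    max_cost_usd: float   # ceiling per request
    max_latency_s: float  # ceiling per request


@dataclass(frozen=True)
class Report:
    accuracy: float
    worst_cost_usd: float
    worst_latency_s: float
    within_budget: bool


def run_harness(
    model: Callable[[str], Reply],
    cases: Sequence[EvalCase],
    budget: Budget,
) -> Report:
    """Run every case through the model and compare the results to the budget."""
    if not cases:
        raise ValueError("the harness needs at least one test case")
    passed = 0
    worst_cost = 0.0
    worst_latency = 0.0
    for case in cases:
        started = time.perf_counter()
        reply = model(case.prompt)
        worst_latency = max(worst_latency, time.perf_counter() - started)
        worst_cost = max(worst_cost, reply.cost_usd)
        # Exact match is a placeholder; a real feature may need a rubric
        # or a fuzzier comparison.
        if reply.text.strip() == case.expected.strip():
            passed += 1
    return Report(
        accuracy=passed / len(cases),
        worst_cost_usd=worst_cost,
        worst_latency_s=worst_latency,
        within_budget=(
            worst_cost <= budget.max_cost_usd
            and worst_latency <= budget.max_latency_s
        ),
    )
```

Running the same cases against two prompts or two models gives two reports that can be compared line by line, which is the point of writing the harness before choosing anything.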

The model is chosen at the end of that process, not the start. Sometimes it is GPT-4o, sometimes it is Claude 3.7, sometimes it is an open-source model running on our own infrastructure. We have shipped features with all three, and the decision is almost always driven by the cost-and-latency budget rather than benchmark scores.
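One way to read "driven by the budget" is as a filter followed by a ranking: discard every candidate that breaks the budget, then take the best evaluation score among what remains. The sketch below assumes that reading. The Candidate fields and the choose_model function are hypothetical, and the measurements would come from your own harness runs rather than from published benchmarks.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    eval_score: float     # fraction of harness cases passed, 0.0 to 1.0
    cost_usd: float       # measured cost per request
    p95_latency_s: float  # measured 95th-percentile latency


def choose_model(
    candidates: Sequence[Candidate],
    max_cost_usd: float,
    max_latency_s: float,
) -> Optional[Candidate]:
    """Return the best-scoring candidate that fits the budget, or None."""
    # The budget is a hard filter: a better score never buys an exception.
    affordable = [
        c for c in candidates
        if c.cost_usd <= max_cost_usd and c.p95_latency_s <= max_latency_s
    ]
    if not affordable:
        return None
    return max(affordable, key=lambda c: c.eval_score)
```

A None result is itself useful: it says the budget or the use case has to change before any model can ship.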

The last thing worth saying is that we have stopped building general-purpose chat interfaces. They are easy to demo and almost impossible to evaluate. A focused feature with structured input and structured output is harder to design and far easier to defend.
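As an illustration of what "structured input and structured output" can mean in practice, the sketch below uses a hypothetical draft generator. The request is a typed record rather than free text, and the model's reply is accepted only if it parses as JSON with the expected fields. The field names, the three confidence levels, and the functions build_prompt and parse_model_output are assumptions made for the example, not a prescribed schema.

```python
import json
from dataclasses import dataclass
from typing import Tuple

ALLOWED_CONFIDENCE = ("low", "medium", "high")


@dataclass(frozen=True)
class DraftRequest:
    document_type: str
    key_points: Tuple[str, ...]


@dataclass(frozen=True)
class DraftResult:
    ok: bool
    draft: str = ""
    confidence: str = ""
    error: str = ""


def build_prompt(request: DraftRequest) -> str:
    """Turn the structured request into a prompt that asks for JSON back."""
    points = "\n".join(f"- {p}" for p in request.key_points)
    return (
        f"Write a {request.document_type} covering these points:\n{points}\n"
        'Reply with a JSON object with keys "draft" and "confidence" '
        "(one of: low, medium, high)."
    )


def parse_model_output(raw: str) -> DraftResult:
    """Validate the model's reply; return an explicit error instead of guessing."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return DraftResult(ok=False, error="model output was not valid JSON")
    if not isinstance(data, dict):
        return DraftResult(ok=False, error="model output was not a JSON object")
    draft = data.get("draft")
    confidence = data.get("confidence")
    if not isinstance(draft, str) or not draft.strip():
        return DraftResult(ok=False, error="missing or empty draft")
    if confidence not in ALLOWED_CONFIDENCE:
        return DraftResult(ok=False, error="missing or unknown confidence")
    return DraftResult(ok=True, draft=draft, confidence=confidence)
```

Because every reply ends up as either a valid DraftResult or a named error, the feature can be tested case by case, and the interface has something definite to show the user when the model misbehaves.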
