Measure, Then Cut
In Which the Author Explains How He Uses Coding Agents
This week's newsletter issues focus on how I have adopted coding agents, in particular how I am attempting to leverage local models to power those agents. Later in the week, I will talk about my setup: hardware, models, agent harnesses, etc.
Today, though, I wanted to talk a bit about the process by which I use these agents.
Soon I Will Be Invincible
My early experiments with coding agents ranged from "cute but minimal" to "agents running amok". Getting an agent to update KDocs was easy, but getting an agent to actually build something was not.
That changed when I ran into Superpowers.
The full Superpowers setup wound up being overly complex, but it really opened my eyes to how one can configure prompting for agents. Superpowers uses a blend of skills, sub-agent definitions, and harness hooks to define a full software development workflow for agents.
While I "rolled my own" system, I did leverage several facets of the Superpowers system:
- A dedicated "brainstorming" skill for exploring the problem space of some new feature or capability
- A dedicated "plan-writing" skill that breaks down the problem into milestones, each with discrete steps, with test-driven development (write tests, then write code that makes the tests succeed) and a clear verification process — the plan winds up as a Markdown document
- Sub-agents for implementing milestones, so they get a clean context window and possibly a different LLM than used for brainstorming and plan-writing
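To make the plan-writing output concrete, here is a hedged sketch of what one of these milestone-based Markdown plans might look like. The feature, type names, and verification steps are hypothetical examples, not taken from any actual plan of mine:

```markdown
# Plan: Add offline caching (hypothetical example)

## Milestone 1: Cache data model
- Write failing tests for a `CachedEntry` type with a timestamp and payload
- Implement `CachedEntry` until the tests pass
- Verification: run the module's test suite and confirm all tests pass

## Milestone 2: Repository integration
- Write failing tests for cache-first reads in the repository layer
- Implement the cache-first read path until the tests pass
- Verification: run the full test suite and confirm no regressions
```

Each milestone bakes in the test-driven rhythm (tests first, then code) plus an explicit verification step, which gives a sub-agent a clear stopping point.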
Next, I will be working on adding more automated review steps to my workflow; more on this in some future newsletter issue.
It is certainly possible to use Superpowers "out of the box". I did for a bit and had success, despite the fact that Superpowers itself is designed for Web development, not Kotlin Multiplatform. But, I am a "build your own lightsaber" sort of person, which is why I am building up my own skills and process.
My Workflow, Given Powers That Might Be Super
Ed Zitron points out that some people expect that you can fire up Claude Code (or equivalents), tell it to "build me a Salesforce clone", and some time later out pops the clone. Even the Superpowers developer states:
"It's not uncommon for Claude to be able to work autonomously for a couple hours at a time without deviating from the plan you put together."
That is not pedal-assist coding. It certainly is not within the scope of local models, even supplemented by something like Claude Pro. Multiple-hour stretches of autonomous work require at minimum a maximum-level subscription, and more likely API tokens. That gets expensive, and it is not what I am aiming for.
Instead, I do the following:
- Break a project down into a series of features or other modestly-sized changes. If you are used to Scrum/Agile-style development with story points and Jira tickets, think 1- to 2-point tickets. Your coding agents are in the range of "impressive junior developers" to "talented senior developers with untreated ADHD", so IMHO you are better off with smaller pulses of work.
- Use my brainstorming and plan-writing skills to have an agent craft a milestone-based plan. Right now, that work requires a frontier model, but perhaps in time it could be handled by local models. This takes the agent several minutes at most.
- Have a sub-agent implement the plan, often one milestone at a time, pausing between milestones. If I have local models implement milestones, the milestone-at-a-time process helps me better determine when a sub-agent is struggling to complete the work. If I have a frontier model implementing the milestones, milestone-at-a-time helps me keep the work within the session limits of a subscription plan, as it is easy to just stop work between milestones and pick up when the next session window begins. This takes the sub-agent a minute or two per milestone.
- Make notes of where things went wrong, and if they might be addressed by improving my skills or AGENTS.md instructions, plan to make those changes.
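As an illustration of the kind of standing instructions I mean, here is a hedged sketch of what an AGENTS.md file might contain for a Kotlin Multiplatform project. The specific rules are hypothetical examples, not my actual instructions:

```markdown
## Build and test
- Run `./gradlew build` after completing each milestone; do not declare a
  milestone complete with a failing build
- Follow test-driven development: write failing tests before writing the
  implementation code

## When stuck
- If the same error appears three times in a row, stop and ask for
  direction rather than continuing to retry
```

Instructions like these are where lessons from failed milestones get captured, so the next agent run does not repeat the same mistake.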
As noted above, I plan on adding more review steps, especially leveraging local models, to accompany my own manual reviews. I should have more to talk about regarding that in a few weeks.
The "build your own lightsaber" approach gives you plenty of options for continuous improvement. If you are using some canned solution, like Superpowers, you might not know where to put additional instructions.
Am I as productive as those who employ agent swarms and try to get models to do bigger tickets with less oversight? Probably not. But for the current state of the art with agents, and my ethical concerns with frontier model use, I am fairly happy with the contours of my current approach.
Where Things Go Wrong
Not surprisingly, implementation goes well when things proceed according to the written plan. When unanticipated problems arise along the way, things can get messy.
This is where frontier models shine over local models. A local model is likely to just keep banging its (virtual) head against a (virtual) wall. A frontier model is more likely to be able to get past the problem, though that is far from assured.
From a permission standpoint, I tend not to give my agents too much room to work on their own. For me, reviewing permission requests is not only a security concern, but a "hey, what the heck is this agent up to?" concern. From time to time, I will use a permission request as an opportunity to stop work and provide better direction, do my own investigation, or sometimes just tackle that milestone manually. Much of the time, the problem stems from either my mistake or the plan-writing agent's mistake, and where possible I try to find ways to improve my standard instructions to help avoid this sort of problem in the future.
Later this week, I will get a bit more detailed on what specific models and harnesses I use with this process, including where I run the local models. Until then, go code.