What I'm Using: April 2026 Edition
In Which the Author Talks About Hardware, Models, and Agent Harnesses
In my last issue, I explained a bit about my overall process for using coding agents. In this issue, I will write up what specifically I am using to implement that process... with a caveat related to some news from this week.
My Hardware
For a couple of decades, I was a happy Ubuntu user. That changed when I started having to use macOS for work — toggling my mind back and forth between the two operating systems was annoying. So, I migrated my personal development setup to macOS as well.
For this AI work, that turned out to be a fortuitous decision.
Apple's "unified memory architecture", introduced with Apple Silicon, means that the entire system RAM can be used by either the CPU or the GPU. That turns out to be really useful for running local LLMs, which would otherwise need dedicated graphics cards with serious amounts of RAM.
My local model "brain" is a Mac Studio, with an M2 Ultra processor and 64GB of unified memory. If you are not familiar with the Studio, it is roughly two Mac Minis stacked atop each other. And, if you are not familiar with the Ultra CPU category, it is roughly two Max chips stacked atop each other: an M2 Ultra is two M2 Max processors. This is a reasonably powerful box, more than sufficient to run some of the better open weights models. Fortunately, I managed to snag one used on eBay before the market for such devices really blew up.
My development machine is a separate MacBook Pro, with an M4 Max CPU, also with 64GB of RAM. I only sporadically run local models on this machine right now, mostly for functional tests on an agent harness that I am building.
For a lot of people, the hardware that I am using is beyond their budgets. My hope is that, as open weights models continue to improve, not only will developers with hardware like mine be able to leverage those models for more work, but developers with lesser hardware will be able to start using those models for specific coding chores.
Local Model = Qwen 3.6
I play with many of the open weights models as they show up, if they show signs of being tuned for agentic coding. My current "go-to" open weights model is Qwen 3.6, specifically qwen3.6:35b-a3b-nvfp4 via Ollama. While writing this issue, I noticed that they recently released qwen3.6:35b-a3b-coding-nvfp4, which should be the same model with additional agentic coding fine tuning — I will be giving that a try in the coming days.
If you are unfamiliar with Ollama, it is the leading host of open weights models. Its core is a Web service that offers an API compatible with the ones used by OpenAI and Anthropic for their frontier models. By default, that Web service is only visible to localhost, but I have my Mac Studio set up to expose it on my home office network, and I install models like Qwen 3.6 there.
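For anyone wanting to replicate the "LLM box on the network" part of this setup, the sketch below shows one way to do it on macOS, using Ollama's documented OLLAMA_HOST variable. The hostname is an example, not my actual machine name:

```shell
# On the Mac Studio: tell Ollama to listen on all interfaces, not just
# localhost. launchctl setenv makes the variable visible to the Ollama
# app's background service; restart the app afterward so it takes effect.
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"

# Pull the model onto the Studio:
ollama pull qwen3.6:35b-a3b-nvfp4

# From another machine on the home office network, confirm it is visible
# (hostname is an example):
curl http://mac-studio.local:11434/api/tags
```

Exposing the service this way means anything on your LAN can use it, so it is only appropriate on a network you trust.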
These Qwen 3.6 models, specifically with the nvfp4 encoding, require 22GB of storage. Much of that gets loaded into RAM as the model is used. A dedicated 64GB machine has plenty of room for that, but bear in mind that other things will use memory too, notably the "context window" representing the working set of content that the LLM "knows".
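To get a feel for why the context window eats RAM on top of the weights, here is a back-of-the-envelope sketch of a transformer's KV cache size. Every parameter value below (layer count, KV heads, head size, cache precision) is an illustrative assumption, not Qwen 3.6's published architecture:

```python
def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_tokens: int, bytes_per_value: int = 2) -> float:
    """Rough KV cache size for one sequence, in GiB.

    Each layer stores a key and a value vector (hence the factor of 2)
    per KV head, per token currently in the context.
    """
    total = layers * 2 * kv_heads * head_dim * context_tokens * bytes_per_value
    return total / 2**30

# Illustrative numbers only -- NOT Qwen 3.6's real architecture.
print(kv_cache_gib(layers=48, kv_heads=8, head_dim=128,
                   context_tokens=32768))  # -> 6.0
```

With those made-up but plausible numbers, a 32K-token context costs about 6 GiB beyond the 22GB of weights, which is why a "64GB is plenty" claim still needs the caveat.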
Frontier Model = Claude Sonnet and Haiku... For Now
While I continue to build out stuff to help me use local models for as much as possible, my primary coding models at the moment are Claude Sonnet and Haiku. I use Sonnet for "brainstorming" and writing the feature development plan, and I use Haiku most of the time for implementing the plan. Sometimes I try a local model for implementation, especially if the work should be pretty simple. Sometimes I use Sonnet for implementation, if some of the work seems like it is likely to "go sideways" and require more effort to get things right.
I do not use Claude Opus much. I can definitely see how, if you are using agent swarms and trying to have Claude build larger features autonomously, you might need Opus to develop the plans. For "pedal assist coding", where the scope of any feature is smaller, Sonnet has generally proved to be sufficient. Sometimes I wonder if I would have had better luck with Opus on certain features, where the plan developed by Sonnet had issues that bogged down the implementation phase. But Opus is more expensive, both in terms of usage costs and in terms of ethical impact, so I try to minimize my use of it.
One of the "gates" that I have to control how much I use frontier models is the subscription level. Right now, I am signed up for Claude Pro, and for the amount that I use these models, that is a great fit. As we will see toward the end of this issue, though, I might need to change my frontier model soon.
Agent Harness = Claude Code, Primarily
An agent harness is the software that lets a model (local or hosted) interact with the software that you are building and therefore serve as a coding agent.
To use Sonnet and Haiku with Claude Pro for coding, the best option is to use Claude Code as the agent harness. I am unaware of any other agent harness that is authorized by Anthropic for use with the Claude Pro subscription (though there are hacks...).
I even use Claude Code with Qwen 3.6 and other local models that I test. If you set environment variables before running the claude command, you can "point" Claude Code to an Ollama instance and have it use a model served by that instance. Overall, I have had problems with local models and harnesses, where the models screw up using tools too often. That said, I have had better luck using Claude Code than open source agent harnesses like OpenCode or Goose.
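As a sketch of the "point Claude Code at Ollama" trick: Claude Code honors ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN, and ANTHROPIC_MODEL environment variables. The hostname and token value below are examples, and whether your Ollama version speaks the Anthropic-style API that Claude Code expects may depend on the versions of both tools:

```shell
# Point Claude Code at a local Ollama instance instead of Anthropic's API.
# Hostname and model tag are examples from my setup.
export ANTHROPIC_BASE_URL="http://mac-studio.local:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"   # placeholder; the local server ignores it
export ANTHROPIC_MODEL="qwen3.6:35b-a3b-nvfp4"
claude
```

Setting these in a small wrapper script makes it easy to flip between the local model and the real Claude backend per project.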
However, the limitations of local models are a chunk of the reason why I am building up my own agent harness for specific tasks. I will be writing a lot about that in the coming months, once I have the first version of it published as open source.
A Preview of Coming Retractions
It is very possible that in the weeks that follow, I will wind up switching off Claude's models. The reason: Anthropic may remove Claude Code access from the Claude Pro subscription. While it appears that Anthropic has backed down from this plan for now, I remain wary.
Should Anthropic make a change that removes Claude Code support from Claude Pro subscriptions, I will then give Mistral and Mistral Vibe a try, before perhaps going to OpenAI's GPT and Codex.
I will write a bit more about Anthropic's possible move here, and related maneuvers by others, next week.