The Screws Tighten
In Which the Author Examines Recent Frontier Model Subscription Pricing News
Last week, Anthropic tried removing Claude Code access from the 20 USD/month Claude Pro subscription. This did not go over well, and the change was later put on hold.
Around the same time, Microsoft-owned GitHub paused signups for individual Copilot plans and made other tactical changes to reduce their costs. Longer term, it appears as though they will go with straight token-based billing.
Overall, hosted AI compute may be in trouble:
This means that the startups that are using AI agents to scale their operations are doing so at a time when AI costs are unsustainably low and may wake up one day to find that their compute costs suddenly double, 10x, or that they simply aren’t able to access compute anymore.
About six weeks ago, I blogged about this, and so I am not terribly surprised that these sorts of changes are happening. My prediction is that it will get worse before it gets better.
Even pricing "silver linings" have their accompanying clouds. DeepSeek released two new hosted models, though DeepSeek-V4-Flash might be able to run on commodity hardware after quantizing. Those models, when hosted, are cheap compared to major frontier models from others. However, just as we do not really know how much investor dollars are subsidizing the prices for OpenAI and Anthropic models, we do not know how much DeepSeek V4 models are subsidized by various parties (e.g., the Chinese government). Maybe those prices are sustainable, maybe they are not.
If the subscription pricing model collapses, and OpenAI and Anthropic have everyone use their standard token pricing, the cost of running their models jumps. If they have to start raising those prices further (e.g., they can no longer afford the subsidy), the cost of running their models jumps a lot.
My guess is that there will be increased interest in blended approaches, leveraging frontier models for high-level reasoning and delegating "nuts and bolts" work to lower-cost or local models. For example, rather than agent harnesses giving tools to the frontier models, they can give tools to the local models, and the frontier models can delegate questions and work to the local models. That sort of split holds more promise in the short term than does using local models exclusively, as local models' reasoning capability is modest and slow.
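One way to picture that split is a thin router in the agent harness that sends high-level reasoning to the frontier model and routine tool work to the local model. This is a minimal sketch, not any particular product's design; `call_frontier` and `call_local` are hypothetical stand-ins for real API clients, stubbed out here so the routing logic itself is visible.

```python
# Sketch of a blended harness: the frontier model handles reasoning-heavy
# tasks, while "nuts and bolts" work goes to a cheaper local model.
# call_frontier/call_local are hypothetical stand-ins for real API clients
# (e.g., a hosted endpoint and a locally served model); here they are
# stubbed so the routing logic can run on its own.

FRONTIER_TASKS = {"plan", "review"}            # high-reasoning work
LOCAL_TASKS = {"run_tests", "grep", "format"}  # routine tool work

def call_frontier(prompt: str) -> str:
    return f"[frontier] {prompt}"

def call_local(prompt: str) -> str:
    return f"[local] {prompt}"

def route(task: str, prompt: str) -> str:
    """Send each task to the cheapest model tier that can handle it."""
    if task in FRONTIER_TASKS:
        return call_frontier(prompt)
    if task in LOCAL_TASKS:
        return call_local(prompt)
    # Unknown tasks default to the local model, to keep costs down.
    return call_local(prompt)

if __name__ == "__main__":
    print(route("plan", "outline the refactor"))   # goes to the frontier model
    print(route("run_tests", "pytest -q"))         # goes to the local model
```

The interesting design question is which direction the delegation runs: here the router decides up front, but the frontier model could equally be given a "delegate to local" tool and make that call itself.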
Another approach is to reduce the model's work to pure generation, rather than doing a lot of reasoning. I hope to experiment with model-driven mutation testing, for example, with a scaffold handling most of the work of running tests and having the model simply generate the mutated source file. This is work that could run overnight, preparing a report for the following morning with the testing gaps the mutation testing uncovered. I hope to write more about this in a month or so.
Regardless, more work needs to be done in these sorts of areas, as we have to assume that the AI compute crunch will continue.
By the way, if you happened to install Claude Desktop, you may want to poke around and see what else got installed.