Stop Pretending Local LLMs Replace Frontier Models
I run local LLMs. I like local LLMs. I want the open-source side of this race to win, badly. So understand the spirit of this post before you reach for the reply button: I am on your side. That is precisely why the current wave of “I run qwen2.5:32b on my Mac and never need Claude again” content makes me want to throw my laptop into the sea.
That claim is a joke at best. Sabotage at worst.
What The Demos Actually Show
Go watch any of these videos. Really pay attention to what is on the screen. The “I replaced Claude with a local model” demo is almost always one of the following:
- Autocompleting a function in a 40-line toy file
- Writing a regex
- Generating a one-off bash script
- Refactoring a single component with no surrounding context
- Asking a trivia question
That is the bar. That is what is being held up as evidence that you do not need frontier models anymore. A 32B model running at 8 tokens per second can autocomplete a function in isolation, therefore Anthropic and OpenAI are cooked. Come on.
The reason people pay for Claude is not “complete this function.” It is “here is a real codebase with twenty thousand lines across two hundred files, find the bug, understand the data flow, propose a fix that does not break the four other things that depend on it.” Local models cannot do that today. Not the 7Bs, not the 32Bs, not the 70Bs you can squeeze onto a pair of 3090s. It is not close.
The Things Local Models Are Genuinely Great At
Here is the thing that frustrates me. Local models are SO good at so many real problems right now. Use them for those.
Speech and translation. Genuinely incredible. Whisper-class speech-to-text runs on a potato, local TTS sounds natural, and none of it ever leaves your machine. Translation quality on local models is at a point where I do not bother with cloud APIs anymore. There is no reason to send your voice notes to a third party in 2026.
Home assistant stuff. Turning the lights off. Adjusting speaker volume. “Set the kitchen to 22 degrees.” A small local model wired into Home Assistant is faster and more reliable than any cloud assistant because it does not need to round-trip to a data center to flip a switch. And when your internet goes down your house still works.
Triage and routing in n8n. Classify an incoming email into a bucket. Decide if a support ticket is urgent. Tag a piece of content. These are bounded tasks where the model picks from a known set of outputs. A 7B model nails this all day. You do not need a frontier model and you definitely do not want to pay per token for it.
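To make "picks from a known set of outputs" concrete, here is a minimal sketch of that kind of bounded classifier. It is an illustration, not a recipe: the label list, the `qwen2.5:7b` model tag, and the helper names are all my assumptions, and it assumes an Ollama server is listening on its default local port. The model call is passed in as a plain function so the routing logic can be tested without any model running.

```python
import json
import urllib.request
from typing import Callable, Sequence

# Hypothetical buckets for illustration; use whatever your workflow routes on.
LABELS = ("billing", "bug_report", "sales", "other")


def ollama_generate(prompt: str, model: str = "qwen2.5:7b",
                    url: str = "http://localhost:11434/api/generate") -> str:
    """Send one non-streaming prompt to a local Ollama server (assumed running)."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]


def build_prompt(text: str, labels: Sequence[str]) -> str:
    """Ask for exactly one label from the closed set, and nothing else."""
    options = ", ".join(labels)
    return (
        f"Classify the message into exactly one of these labels: {options}.\n"
        "Reply with the label only.\n\n"
        f"Message:\n{text}"
    )


def classify(text: str, labels: Sequence[str] = LABELS,
             generate: Callable[[str], str] = ollama_generate,
             fallback: str = "other") -> str:
    """Return one of `labels`; anything off-script maps to `fallback`."""
    raw = generate(build_prompt(text, labels))
    # Tolerate stray whitespace, quotes, and a trailing period in the reply.
    answer = raw.strip().strip("\"'`.").lower()
    return answer if answer in labels else fallback
```

The fallback is the important part. Whatever the model says, the workflow downstream only ever sees one of the known buckets, which is why a small model is enough for this job.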
Summarization and classification. Pulling action items out of a meeting transcript. Categorizing receipts. Extracting structured data from unstructured text. Local models eat this for breakfast.
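Extraction works the same way in practice: ask for a fixed shape, then refuse to trust anything that does not match it. A rough sketch, assuming you already have some function that sends a prompt to your local model and returns its text reply. The `ActionItem` type, the prompt wording, and the function names are hypothetical, made up for this example.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ActionItem:
    owner: str
    task: str


# Hypothetical prompt wording; tune it for whatever model you run.
PROMPT = (
    "Extract the action items from the meeting transcript below. "
    'Reply with only a JSON array of objects with the keys "owner" and "task".\n\n'
    "Transcript:\n{transcript}"
)


def parse_action_items(raw: str) -> List[ActionItem]:
    """Validate the model's reply and drop anything that does not fit the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    items = []
    for entry in data:
        if not isinstance(entry, dict):
            continue
        owner, task = entry.get("owner"), entry.get("task")
        if isinstance(owner, str) and isinstance(task, str) and task.strip():
            items.append(ActionItem(owner.strip(), task.strip()))
    return items


def extract_action_items(transcript: str,
                         generate: Callable[[str], str]) -> List[ActionItem]:
    """`generate` is any callable that sends a prompt to a local model."""
    return parse_action_items(generate(PROMPT.format(transcript=transcript)))
```

A malformed reply yields an empty list instead of a crash, so a bad generation costs you one retry, not a broken pipeline.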
Anything privacy-sensitive. Medical notes, legal documents, internal company data, your therapy journal. There are entire categories of work that should never go anywhere near a cloud API. Local is not “almost as good” here. Local is the only acceptable answer.
This list is not small. It is not a consolation prize. These are real workflows that real people do every day, and local models solve them better than the cloud does on multiple axes: cost, latency, privacy, and reliability.
So Why The Lying
This is what I cannot figure out. The honest pitch for local LLMs is already incredible. “Run this on a machine you own, keep your data, never pay a subscription, automate half your house and most of your boring office work.” That is a winning pitch. It sells itself.
Instead the loudest voices in the space have decided the pitch needs to be “and also you can fire your senior engineer.” Why. Why are we doing this.
Every time someone watches one of these videos, fires up Ollama, points it at a real codebase, and watches it produce hot garbage, that is one more person who walks away thinking local models are a toy. They are not a toy. They are a serious piece of infrastructure that is great at a specific and growing set of jobs. The hype is poisoning the well for the actual use cases.
Honesty Is The Move
If you want local LLMs to win, and I do, the path is not pretending they already won. The path is being honest about what they can do today, shipping the workflows where they obliterate the cloud option, and letting the capability gap close on its own. It is closing. Every six months these models get meaningfully better. The trajectory is good.
But you do not get to skip ahead. You do not get to claim the destination because you saw it on the horizon. When a developer tries Qwen on a real engineering task in 2026 because some YouTuber told them it replaces Claude, and it fails, they are not coming back in 2027 when it actually does. You burned that user.
So here is my ask, as someone who genuinely wants the open-source side to win this thing: stop overselling. Local LLMs cannot do real engineering work on a real codebase today. That is okay. They do not have to do everything to be worth running. Pick the jobs they are great at, do those jobs, and let the demos speak for themselves.
The truth is more than enough.