alyxya 19 hours ago [-]
Once they have their own coding agent which they seem to be working towards, I may start predominantly using their models. They seem to be doing all the "right" things, open sourcing models, publishing research, and keeping prices low for everyone.
V4 Pro is between Sonnet and Opus. But it is cheap. Slow but very cheap. Very diligent.
I run a proxy that lets me switch back to Opus when necessary.
Deepseek isn't like Z.ai, which is a bit cheaper only on the surface. Or like Qwen 3.7 Max, which is Opus-level but very expensive.
Deepseek has been my favorite since V3, but V4 is definitely catching up to newer Anthropic models
itsthecourier 3 hours ago [-]
thank you so much for sharing it
KronisLV 17 hours ago [-]
I'm working on a custom launcher for hooking up Claude Code with various providers (it groups env variables into profiles), because DeepSeek doesn't have vision and sometimes I need browser use with screenshots or Opus reasoning. For other tasks it's fine: https://ccode.kronis.dev/
# After installing (or when run portably with ./ccode)
ccode init-config
ccode edit-config
# Run with default profile
ccode
# Run with named profile
ccode --deepseek
# Set default profile
ccode set-default-profile deepseek
Also turns out that with a local proxy you can get Remote Control working and see the DeepSeek sessions in the desktop app, screenshots on the page. Other than that, I'm happy that it works pretty well and the discount is enough to make me consider going from Anthropic's Max subscription to Pro and using it only where DeepSeek is insufficient. With that proxy I eventually hope to be able to transparently switch models mid-task, if I need Opus for like 5 turns or something.
Overall though I'm not sure exactly how well Claude Code would stack up against OpenCode, since the latter overall feels a bit less hacky with 3rd party models and is even getting niche but nice features like a locally runnable web version: https://opencode.ai/docs/web/
rjh29 16 hours ago [-]
How does the cost compare using the API vs the $20/month plans with other providers?
I did some back of the envelope calculations and it seems like you would pay $5/month using DeepSeek directly or $15-20 with OpenRouter or similar. But would be interested to hear real world usage.
But as usual, there are far cheaper subscriptions with higher limits than Anthropic and OpenAI, that also provide DeepSeek v4 Pro. So you should use those subscriptions first until you max them out, then look at a different subscription.
iammrpayments 6 hours ago [-]
I don’t even use Claude that much and was hitting limits on the $20 plan using Sonnet. I’ve deposited $5 with DeepSeek and haven’t hit the limit after spending 60 million+ tokens. So no way it’s more expensive.
stavros 14 hours ago [-]
I've been using it pretty extensively over a month and I'm at maybe $7. It thinks for quite a while, but the results have been better than Sonnet for me.
thisisit 18 hours ago [-]
I am curious - Is there a way to switch between models depending on the task? Because I believe Deepseek V4 is not multimodal and it will be good to switch back to Claude if vision or other capabilities are required.
mewse-hn 16 hours ago [-]
I was looking into something similar because I wanted to test a local model for basic coding and a smart model (DeepSeek) for planning.
It's basically not possible with claude code, the api endpoint is a single environment variable and whatever models are on that endpoint are what's available.
HOWEVER, if you run a proxy like LiteLLM, you can configure it to send requests to different api endpoints on the back end and expose them as different "models" on the front end, then configure claude code to switch between those virtual models.
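To make the "virtual models" idea concrete, here is a minimal sketch of the lookup such a proxy performs. This is not LiteLLM's actual configuration or API; the alias names, URLs, and upstream model identifiers are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    base_url: str        # where the proxy forwards the request
    upstream_model: str  # model name the upstream provider expects


# Hypothetical routing table: every alias, URL, and model name here is a
# placeholder for illustration, not a real configuration value.
ROUTES = {
    "planner": Backend("https://planner.example/v1", "big-planning-model"),
    "local-coder": Backend("http://localhost:8000/v1", "small-local-model"),
}


def resolve(alias: str) -> Backend:
    """Map the virtual model name picked in the harness to a real backend."""
    try:
        return ROUTES[alias]
    except KeyError:
        raise ValueError(f"unknown model alias: {alias!r}") from None


def rewrite_request(body: dict) -> tuple[str, dict]:
    """Return the upstream base URL and a copy of the body with the model swapped."""
    backend = resolve(body["model"])
    forwarded = {**body, "model": backend.upstream_model}
    return backend.base_url, forwarded
```

The harness only ever sees the alias; swapping what sits behind "planner" is then a one-line change on the proxy side.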
mvanbaak 12 hours ago [-]
Check out the project called superpowers. It can use different models for different agents. I use it with opencode to have different models for research, planning, execution, testing etc
Right that says it has a proxy feature so it can probably do what I was describing with LiteLLM
longsword 14 hours ago [-]
There is a tool called deepclaude, which runs a proxy in the background capable of doing this, by simply doing /model in Claude.
maxdo 16 hours ago [-]
I've been trying that. In reality, every time you try to save money it's not worth it: the cost of a mistake is so high. You can spend 2-3 hours on a single wrong assumption, and you lose your time and all the burned tokens.
maxdo 16 hours ago [-]
I'm curious what tasks you tested it for. I'm working on a coding agent that writes code dynamically on request for customers. I'd say the code itself is very simple, aggressively cached, and follows patterns, e.g. we add lots of hints to the system.
The only model families that really worked were Claude and OpenAI. Surprisingly, for tasks that need more speed, GPT 5.4 is very impressive. DeepSeek was very average, performing somewhere in the Gemini Flash 3.0 range.
firecall 11 hours ago [-]
It seems you can use the Claude Code CLI harness without a Claude Pro subscription now, which I don't think you could before?
I've been using Deepseek v4 with Cline in VS Code as a replacement for Github Copilot, and it's not been too bad.
hbarka 15 hours ago [-]
The npm install of Claude Code has been deprecated since Feb 2026.
wiradikusuma 18 hours ago [-]
That's interesting. I thought Claude Code is not as good, therefore people want to use Claude model with other alternatives. This is the other way around.
Which begs the question, regardless of the model, which Claude Code alternative is better? (I keep saying "Claude Code alternative" because I don't know the term... LLM CLI?)
flexagoon 17 hours ago [-]
AFAIK the two most popular open source harnesses right now are OpenCode and Pi. They take a pretty different approach, OpenCode includes a lot of features while Pi is very minimal by design and focused on extensibility, to the point where many people are just asking Pi to write a plugin for itself whenever they want it to have a new feature. I personally like Pi's philosophy more and I think its developer justified the choices really well in his blog post:
Author blocks referrals from HN, weirdly dramatic, especially considering they have 1086 karma here. I wonder what we did to them.
flexagoon 15 hours ago [-]
Oh damn, I haven't noticed because my browser removes the referer header. But I think the image on the block page is a pretty good answer to why he did that.
SturgeonsLaw 11 hours ago [-]
What's the image trying to convey? Genuine question, I just come here to read nerd stuff and I'm not aware of any controversy
flexagoon 10 hours ago [-]
The image shows Garry Tan, the CEO of Y Combinator. He has lately been on a huge AI psychosis streak, bragging about things like "shipping 37000 lines of code every day" and "using Claude Code so much it burned out his USB-C power connectors". He's in a lobster suit because he's talking about OpenClaw, an AI agent assistant which those same AI psychosis types lean into too much by giving it full read-write access to all their life and then getting surprised when it accidentally deletes all of their emails.
Pi's developer is obviously not anti-AI, and he definitely doesn't hate OpenClaw, since it's based on Pi. But there's a growing number of people who take those things too far, and a lot of them are on HN. You can easily find them in the comments of any AI-related post here. I assume that's the type of people the image is portraying.
wrs 18 hours ago [-]
The common term for a tool that wraps an LLM with a workflow is “harness”.
copperx 16 hours ago [-]
I love oh-my-pi, but I'm not sure if it's "better". Maybe just as good.
g023 15 hours ago [-]
I use DeepSeek v4 flash with CoPilot and it works pretty good.
jijji 13 hours ago [-]
I've seen good results with opencode connected to glm 5.1 on ollama cloud... for $20 a month you get performance similar to what you get with opus 4.7
Scarbutt 18 hours ago [-]
Surprised Anthropic hasn't done anything to restrict Claude Code from using other providers.
cortesoft 18 hours ago [-]
At this point in the AI wars, it is probably better to have more users of Claude code rather than restrict which LLMs it can connect to. Claude code is probably (currently at least) stickier than the LLM model itself. Getting people into the Claude code ecosystem is worth it.
Later, they can always lock it down more or add Claude LLM only features to it.
wolttam 18 hours ago [-]
The value of Claude Code the harness isn't that great. There's a lot of other good harnesses out there.
rane 17 hours ago [-]
I thought so, and then I tried Opencode and Codex and started to appreciate Claude Code a lot more. They've actually done great work with the small details.
intuxikated 14 hours ago [-]
I actually haven't looked back since trying opencode
The ability to properly see what the agent is doing in tool calls and subagents is really unmatched, CC strips all reasoning and return values, only displaying tool calls, and you're unable to expand a single subagent, it's expand everything and scroll endlessly or show everything collapsed with basically no info at all (read x files, ran x commands)
Just seems like extremely basic features are missing
chandureddyvari 17 hours ago [-]
What’s your favourite harness? Are there any benchmarks for harnesses, like LLMs have with SWE-bench Verified?
wolttam 16 hours ago [-]
You can check my profile for which one I like most :) I do think there have been efforts to benchmark different harnesses.
Personally I'm not going to choose one harness or another based on +/- a few percentage points in a benchmark. I'm going to use the one that I find the most ergonomic, that isn't too bloated, etc. The models are the primary lever, not the harness.
koolba 18 hours ago [-]
Good or better? Curious which would be in either bucket.
wolttam 18 hours ago [-]
Probably a matter of taste. I prefer the harness I wrote, I don't want to go near Anthropic's bloated mess of a harness with a 10-meter pole.
It went the other way, you can't use other harnesses to connect to the cheaper versions of Claude. So clearly they think their current moat is Claude Code use, not the LLM itself.
LaurensBER 17 hours ago [-]
It works very well with OpenCode. My team keeps hitting the 5h limits on other subscriptions and it's pretty good to have Deepseek as a backup. I just put 50 bucks on there and it feels like it'll never run out.
It's not good enough to fully replace any of the frontier models yet but it's definitely great to have as a backup!
lambda 19 hours ago [-]
Why do you need them to provide a coding agent? Just use their model with any off the shelf coding agent. I happen to prefer Pi, but use whatever works for you.
alyxya 19 hours ago [-]
I probably have an unfounded assumption that whatever coding agent they make will work really well with their models, better than external harnesses. I don't have a good sense for how all the model + harness combinations compare, nor any good way to compare them myself, but generally believe model companies train their models to work best with their own harness.
wolttam 18 hours ago [-]
I've noticed that models have gotten less finicky with this over time. Harnesses don't need to be complex to get good coding performance from models, they just need to implement some sane primitives for code exploration and editing.
wyre 15 hours ago [-]
It is in the model provider's interest for you to believe this because they get to lock you into their harness and inference. As models get better they will get better at using any harness, so it comes down to how well the harness is actually engineered. I highly recommend you take an hour or two and check out Pi to either solidify or change your assumption. The harness is essentially just another developer tool and can be as opinionated, over-engineered, or minimal as anything else. I would think for DeepSeek, especially, their efforts are much better spent researching how to make their LLMs better instead of engineering a harness that might get some marginal gain from being built for their models.
Yeah, I'm using Pi with their models through an OpenCode Go subscription and it works pretty well. 10 bucks and V4-Flash is virtually infinite.
apitman 18 hours ago [-]
What's the best way to use it with Pi, OpenRouter?
schaefer 17 hours ago [-]
> What's the best way to use it with Pi, OpenRouter?
I can't claim it's "the best"...
But the Pi.dev and OpenRouter combo is what I'm doing at home, and I love it.
Setup was easy, I can use /model to switch between any of the openrouter models and whatever I'm hosting locally via VLLM.
brianwawok 12 hours ago [-]
Open router is a 5% tax? If you use it seriously may as well skip it
lambda 17 hours ago [-]
I only use local models myself personally. But yeah, OpenRouter would probably be a good option.
lofaszvanitt 14 hours ago [-]
Qwen cli
satvikpendem 18 hours ago [-]
RL with the harness inputs and outputs of users is one of the primary improvers of model performance, a self perpetuating flywheel.
smoe 16 hours ago [-]
Earlier this week I started testing Chinese models on my codebase. I haven’t really looked at interactive coding yet, but more at issue triage, bug auto-fixing, log analytics, etc.
I used DeepSeek, Kimi, GLM, Qwen, and MiMO against GPT-5.5 high as reference, all running in Pi harness without anything installed.
So far, Kimi and MiMO look the most promising to me. I haven’t tested them rigorously enough to make a strong statement, but my first impression is that, in practice, all those models may be less behind on typical daily tasks than people think.
They are a bit “work hard, not smart”: getting to same-ish results more slowly and using more tokens, but at a fraction of the price
try-working 11 hours ago [-]
I just did a little comparison using benchmarks for GPT 5.1 through 5.4 to map out the equivalent capability-level of some of the Chinese models.
Based on these benchmarks, here's a rough mapping:
- Qwen 3.7 ~= GPT 5.3
- Kimi K2.6 ~= GPT 5.15
- DS V4 ~= GPT 5.1
So yes, we have GPT 5 at home now. No need to pay the Legacy Labs anymore.
I switched to predominantly using mimo this week, mostly out of curiosity to see how dependent I was on frontier models. Honestly I can't really tell the difference. I would say I work on pretty average codebases with well-known frameworks doing pretty typical things, and my initial impression is that mimo, kimi and deepseek can probably handle what I need more or less the same as gpt5.5 or claude.
c0rruptbytes 16 hours ago [-]
I personally really like DS4 Flash - it's the largest I can run locally with decent speeds and I feel like it's good enough to maintain a codebase with less effort
r0b05 9 hours ago [-]
What hardware and quant do you run it with?
maxdo 15 hours ago [-]
Maybe I need to give it a second chance. Surprisingly, Kimi 2.6 consistently fails even to generate a valid JSON plan, whereas Gemma 4 was doing really well, but slowly.
JSR_FDED 7 minutes ago [-]
Are you going through OpenRouter or direct? I’ve had nothing short of excellent results from Kimi.
jdboyd 11 hours ago [-]
I would prefer a coding agent to be somewhat independent of the model provider. Providers are trading off on quality, features, and price so frequently, and I don't want to keep changing my agent every time.
I am looking forward to things slowing down and stabilizing. I'm not saying that should happen today, just I am looking forward to it.
gaolei8888 10 hours ago [-]
I think this will happen much sooner than we thought. Maybe it will happen in the next 6 months
hawtads 10 hours ago [-]
There is OpenCode and Pi, they both work pretty well
tequila_shot 19 hours ago [-]
You no longer need "their coding agent". You can hook up claude code to use Deepseek. Works perfectly.
minimaxir 16 hours ago [-]
Zed's Agent natively supports a DeepSeek API key now. (do not use it through OpenRouter if you want to save the most cost)
vinhnx 12 hours ago [-]
You can use DeepSeek with my coding agent VT Code. Recently I've added DeepSeek V4 Pro and DeepSeek V4 Flash support with all providers, via: Official DeepSeek API, HuggingFace, Ollama Cloud, OpenRouter providers.
Why not OpenCode? Genuine question, not an expert..
zozbot234 18 hours ago [-]
antirez's ds4-agent works quite fine. It runs on any Apple Silicon device with 96GB RAM or more.
rjh29 17 hours ago [-]
I wonder how many years it'll take for the API token cost to exceed the money spent on ram.
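A back-of-the-envelope way to frame that question, as a rough sketch only. The hardware premium below is a made-up placeholder, the monthly spend is borrowed from the "maybe $7" over a month reported elsewhere in this thread, and electricity and resale value are ignored.

```python
def breakeven_months(hardware_cost_usd: float, monthly_api_spend_usd: float) -> float:
    """Months of API usage needed to match an up-front hardware cost."""
    if monthly_api_spend_usd <= 0:
        raise ValueError("monthly API spend must be positive")
    return hardware_cost_usd / monthly_api_spend_usd


# Hypothetical inputs: the 2000 USD RAM premium is an assumed placeholder,
# and 7 USD/month is one commenter's reported spend, not a general figure.
months = breakeven_months(hardware_cost_usd=2000.0, monthly_api_spend_usd=7.0)
print(f"break-even after about {months / 12:.1f} years")
```

Heavier API usage shortens the break-even linearly, so the answer depends almost entirely on how many tokens you actually burn per month.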
zozbot234 15 hours ago [-]
The DS4 folks are unofficially testing ways to run the model with lower performance on lower-RAM machines. Similar efforts are going on with llama.cpp. The results are a bit of a challenge, prefill time tends to explode which is a limitation if you care about agentic workflows.
vrganj 6 hours ago [-]
Anything that runs with 64?
zozbot234 6 hours ago [-]
You can just try it yourself, it will probably run with a heavy slowdown using SSD offload.
raincole 17 hours ago [-]
All the major coding agents already support DeepSeek.
linzhangrun 9 hours ago [-]
There is already an open-source deepseek-tui coding agent.
Besides, you can always connect to opencode.
potsandpans 16 hours ago [-]
Give pi a try if you haven't already. Avoid vendor harness lock-in.
cultofmetatron 19 hours ago [-]
open code works with them today. I've been using it fulltime for 2 weeks so far.
sunaookami 18 hours ago [-]
Using it with Pi and can only report good things so far. I'm very impressed by how good it is (though it's way slower than Claude Sonnet and GPT-5.5 and often thinks "too much" before starting).
jack_pp 15 hours ago [-]
i have done some amazing things for 5 dollars, using opencode. give it a shot, it is incredibly cheap
ReptileMan 17 hours ago [-]
Pi, opencode and zed all work amazingly well with deepseek.
Guillaume86 15 hours ago [-]
You seem to have tried a few things, if you don't mind I have a few questions as someone currently on Claude Code but would prefer to not lock myself in a commercial ecosystem (and their pricing change regarding headless usage is annoying me):
- how do/would you add the WebSearch tool to your harness? pay for a separate service or does deepseek offer something with their subscriptions?
- do pi/opencode support pasting images in prompts?
- how do you handle reading images? deepseek is not multi modal IIRC? do you pay for another model and route to it?
Any of these missing would really annoy me in day to day use...
wyre 15 hours ago [-]
Brave, Exa, and Tavily all offer a free tier for websearch, after that it comes out to like 1¢/search, very easy to ask pi to build a web search tool using any of these providers.
They support image locations like a file or url, but not regular images (opencode desktop might though?)
Both pi and opencode make it very easy to change models so you can easily call to 5.4-mini or whichever multi-modal LLM for reading images. I'm sure you could even create a skill to automate the process too, having the model use the cli to send the photo to the multi-modal and give it back a description.
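A rough sketch of what that routing decision could look like if automated. It assumes OpenAI-style chat messages, where image inputs appear as content parts with type "image_url"; the two model names are hypothetical placeholders, and neither pi nor opencode is claimed to work this way internally.

```python
def has_image(messages: list[dict]) -> bool:
    """Return True if any message carries an image part (OpenAI-style content list)."""
    for message in messages:
        content = message.get("content")
        if isinstance(content, list):
            for part in content:
                if isinstance(part, dict) and part.get("type") == "image_url":
                    return True
    return False


def pick_model(
    messages: list[dict],
    text_model: str = "text-only-model",  # hypothetical name
    vision_model: str = "vision-model",   # hypothetical name
) -> str:
    """Send image-bearing conversations to the multimodal model, the rest to the cheap one."""
    return vision_model if has_image(messages) else text_model
```

The same check could instead trigger a "describe this image" call whose text result is handed back to the text-only model, which is closer to the skill idea above.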
ReptileMan 15 hours ago [-]
I use them for pure coding, but I think they do curls when needing something from the host machine.
Guillaume86 15 hours ago [-]
Yes I'm also using it for coding: I often make the agent use WebSearch in the research phase when deciding on a stack or a library, or to research best/modern practices to achieve something. As for images I find it super useful to be able to paste snipped screenshots to show the agent when something is wrong in a UI/frontend or just something I can't copy paste easily.
wg0 19 hours ago [-]
If you have not tried DeepSeek V4 you're missing out. The pricing makes it unbelievably good.
The chains of thought for Deepseek are very very interesting reads. Open code won't show them but do read them and you'll be surprised at how underrated the model is.
My model usage is very low, but I still pay DeepSeek directly on a regular basis as a tribute: a way to show gratitude for them open sourcing their models and to support what I deem positive for the overall social good.
abyssin 18 hours ago [-]
It’s good and cheap, but don’t talk about politics to it or it might trigger some sort of censorship rule. You can see it think, then suddenly erase everything and suggest to switch to another subject, without explaining anything. I also had it output some sort of generic message about how the news outlets are in the service of the people. Both times I was surprised because I didn’t make any sensitive requests, neither illegal nor subversive. But it was a remotely political topic and it was enough. There was something both chilling and refreshing about it, since censorship in the west is usually more subtle.
ux266478 15 hours ago [-]
The base model doesn't have these problems FWIW
cosmojg 13 hours ago [-]
How are you running the base model?
ux266478 12 hours ago [-]
vLLM in a docker container, FP16 quantized on an 8x MI300X cluster. Very lazy hackjob, I didn't even set up an interface. Was constructing curl commands from string templates. I worked out if I paid that compute cost over a whole month, it was twice as expensive as the monthlies you'd pay for owning a very nice 2000sqft non-coop apartment in Midtown Manhattan. I was paying rock bottom prices, too.
tequila_shot 19 hours ago [-]
Yes - the model is REALLY good. I try Claude at work and Deepseek personally and this is the only model that works without trying to actively bankcrypt me.
seemaze 18 hours ago [-]
Perhaps unintentional, but I find 'bankrypt' to be a thoroughly interesting portmonteau.
I'm not sure if it's when you run out of crypto, or when your bank gets hit by ransomeware.
SyneRyder 50 minutes ago [-]
I thought of it as crypt in the sense of "underground vault that acts as a burial place". So, not just ensuring you're bankrupt but with maybe a chance to start over, but "bankrypt", so bankrupt that they make sure you're buried.
Either way, something interesting about that accidental misspelling. It will probably become someone's band name one day.
jeffadelic 9 hours ago [-]
Ironically you spelled portmanteau incorrectly. OP very well could have made a similar error for bankrupt. Maybe not, interesting to think about.
aqfamnzc 8 hours ago [-]
I'll be honest, I carefully scanned your comment for a similar mispelling.
cassianoleal 16 hours ago [-]
I love V4 Pro for certain things but I've been quite impressed with V4 Flash for coding. It's terse, to the point, tends to make few mistakes and is pretty fast.
schmorptron 4 hours ago [-]
i see the reasoning traces in opencode (cli). maybe it's a setting?
intuxikated 14 hours ago [-]
Reasoning display can be toggled in opencode
maltalex 11 hours ago [-]
This looks suspiciously cheap.
The same model hosted by other providers is much more expensive [0]. So either DeepSeek can host it much cheaper than anyone else, or their business model is different. I suspect the latter, especially since their privacy policy [1] says personal data, including “User Input,” can be used "To improve and develop the Services and to train and improve our technology".
Probably a dumb question, but looking at OpenRouter, are there really no providers outside of the US, Singapore and China offering DeepSeek? It seems like such an obvious thing for a European or other Western provider to offer. I'm sure it's a quantum leap ahead of Mistral.
I'd love to give these models a try, but I'd rather not use a provider that trains on or stores my data (beyond standard legal requirements of course).
Palmik 5 hours ago [-]
There are several things at play:
Inference stack efficiency: Many of these providers take off the shelf sglang / vllm / trtllm and hope for the best. Meanwhile DeepSeek team is known for pushing the boundary of optimizations.
Now, sglang and vllm are great pieces of software, but take DeepSeek's Sparse Attention (DSA). Introduced 1.5 years ago (https://arxiv.org/abs/2512.02556), used by DeepSeek 3.2, GLM 5, DeepSeek V4. Only now is it slowly starting to get optimized in the major inference engines (https://github.com/sgl-project/sglang/issues/19380, https://github.com/sgl-project/sglang/pull/22851, etc.). Of course, DS V4 adds extra optimizations into the model architecture on top of DSA, and those will take more time to be taken full advantage of by the open source inference engines.
Privacy: Betting that people will pay extra for inference hosted outside China. This is especially true with DeepSeek, because DeepSeek is transparent about using API data for model improvements.
And a few other things (scale (matters a lot for MoEs), reliability, soft enterprise lock in, etc.)
---
There is also, likely, tacit collusion at play here. Look at GLM 5 and GLM 5.1 prices. GLM 5 and 5.1 cost the same to run, but providers decided to charge much more for 5.1 because it is a much better model, and because Z.AI raised their price as well.
gpugreg 4 hours ago [-]
Another factor is that DeepSeek is not just doing inference, but also training models, so they can use underutilized compute nodes for training during off-peak hours, as described in their DeepSeek v3 article: https://github.com/deepseek-ai/open-infra-index/blob/main/20...
But I agree that the main driver is that they are really good at optimizing. They will have chosen their architecture in such a way that it will be as efficient as possible on their own infrastructure, so they have a massive head start. Inference framework developers still have to catch up.
raincole 11 hours ago [-]
They're selling at a loss (obviously).
But why not? Gaining market share at a loss isn't the US's patent.
missedthecue 9 hours ago [-]
They haven't raised enough money to be selling at a loss. And selling at a loss to gain market share in an industry with zero switching friction between sellers is not a strategy. That doesn't make sense.
Loss leading only works when
- it leads to a situation that allows you to prevent competitors from selling to your customers (gilded age railroad and pipeline industries are great examples). Then you can eventually raise prices and not lose back any market share.
- or when it allows you to remarket to customers and make back the difference (selling a single console at a loss to sell a whole library of high margin video games, or selling jet engines at a loss to lock in 30-year maintenance contracts).
raincole 9 hours ago [-]
Yeah, cool theory, but they are selling at a loss. We know that because their model is open and available on other providers too. No other provider even sells a quantized version of DeepSeek V4 Pro at that price.
Also, in case of LLM, market share = more people uploading their whole codebase/legal documents/unfinished books/literally everything to your servers for you to use in future training. So the incentive to sell at a loss is much stronger than other kinds of service.
freakynit 6 hours ago [-]
We are missing the fact that they have created their own GPUs that are now just 4-5 years behind. And considering it's China, which does everything hardware at insane scale and efficiency, my guess is that they are at step 1 now: gain market share at a loss and, at the same time, gradually start plugging in their in-house cards to power these models and gauge their performance on real workloads.
Once they cross a certain threshold, nVidia can say goodbye to its monopolistic profit margins of over 70%.
GPU infra capex is the biggest spend for the inference providers as of now, power, second biggest.
China has already cracked the power part, they are now close to cracking the GPU part.
WithinReason 5 hours ago [-]
they might have trained the model with fancy optimisations that only they can unlock
missedthecue 9 hours ago [-]
[dead]
throwburn202605 3 hours ago [-]
Maybe Anthropic's efforts to thwart deepseek from distilling their model are bearing fruit.
So their strategy now is to try to get as much raw content as possible from their inference. You're being "paid", via discount, for your use
amazingamazing 3 hours ago [-]
Proof?
d4ust 9 hours ago [-]
You may not know enough about DeepSeek founder Liang Wenfeng, who is also the founder of High-Flyer Quant
minimaxir 16 hours ago [-]
I'm more curious about the caching:
> (2) For all models, the input cache hit price has been reduced to 1/10 of the launch price. This price adjustment takes effect from 2026/4/26 12:15 UTC.
There is no end date. Currently, it's 2% of the input price for DeepSeek V4 Flash and 0.8% with this new V4 Pro pricing, which is extremely low compared to competitors to the point that it affects the unit economics a bit and I thought it would be temporary.
In the case of V4 Pro, the effective cost is ~$0.04/M input tokens given the caching (based on OpenRouter's metrics: https://openrouter.ai/deepseek/deepseek-v4-pro), which is significantly cheaper than even small models from competitors.
Palmik 5 hours ago [-]
DeepSeek V4's KV cache is very efficient due to its heavily compressed and sparse attention architecture.
DeepSeek V3.2 which uses DSA only (sparse attention, but without compression from HCA and CSA) is a smaller model but uses 10x more memory at 1M context window compared to DS V4 Pro.
Also, I have to say, DeepSeek's API has a very good cache hit rate. With the same workload, I see ~80% KV cache hit rate with the DS API vs ~50% with the major western inference providers for open weight models.
wolttam 9 hours ago [-]
A big point of DeepSeek V4 is the significantly reduced KV cache size.
maxdo 15 hours ago [-]
Flash on its own is not a very competitive model; its pricing is within the range of everything else on the market.
Probably the most direct competitors of the Flash model:
GPT 5.4 mini: cache read $0.075/M tokens
Gemini 3 flash: cache read $0.05/M tokens
i.e. nothing very magical or groundbreaking.
freehorse 14 hours ago [-]
Cache read for dp4-flash is $0.0028 /M tokens, which is more than 10 times cheaper (and also much cheaper for cache miss and output tokens).
Have not actually compared it to other models, but I would not consider it in the same price range.
maxdo 10 hours ago [-]
This price is only available if you are OK with sending your data to Beijing Volcano Engine Technology Co. For the rest of the OpenRouter vendors it is not the same.
maxdo 15 hours ago [-]
Sonnet: cache read $0.30
Gemini 3.5 flash: cache read $0.15
minimaxir 15 hours ago [-]
For Sonnet, that's 10% of input cost (and requires paying for the cache)
For Gemini 3.5 Flash, it's also 10% of input cost.
Which is why 2%/0.8% change the economics in a meaningful way, given the input/cache-heavy way agents operate.
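To see how much the cache-read ratio moves the blended input price, here is a small sketch. The miss price is normalized to 1.0 and the cache hit rates are assumed values picked for illustration; only the 10%, 2% and 0.8% ratios come from the thread.

```python
def effective_input_price(miss_price: float, cache_ratio: float, hit_rate: float) -> float:
    """Blend cache-hit and cache-miss prices by the share of input tokens served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    hit_price = miss_price * cache_ratio
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


# Miss price normalized to 1.0; the hit rates are assumptions, not measurements.
for hit_rate in (0.90, 0.99):
    for cache_ratio in (0.10, 0.02, 0.008):
        blended = effective_input_price(1.0, cache_ratio, hit_rate)
        print(f"hit rate {hit_rate:.0%}, cache ratio {cache_ratio:.1%}: {blended:.4f}x of list input price")
```

The gap between a 10% and a 0.8% cache ratio only becomes dramatic when the hit rate is very high, because otherwise the uncached remainder dominates the bill.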
throwdbaaway 11 hours ago [-]
And their disk-based caching is amazing. I got a long 700k context session spanning more than a week, with pauses in between that were longer than a day, and some rewinds mixed in as well.
Stats from pi:
↑400k ↓438k R432M 71.9%/1.0M
Half a billion tokens, $2.12
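For anyone wanting to sanity-check those numbers, a quick sketch of the arithmetic. Reading the up arrow, down arrow, and "R" as input, output, and cache-read tokens is an assumption about the stats line, not something stated above.

```python
# Token counts taken from the stats line; the meaning of each field
# (input / output / cache read) is an assumed interpretation.
input_tokens = 400_000
output_tokens = 438_000
cache_read_tokens = 432_000_000
total_cost_usd = 2.12

total_tokens = input_tokens + output_tokens + cache_read_tokens
cache_share = cache_read_tokens / total_tokens
blended_price_per_million = total_cost_usd / (total_tokens / 1_000_000)

print(f"total tokens: {total_tokens:,}")
print(f"cache-read share: {cache_share:.1%}")
print(f"blended price: ${blended_price_per_million:.4f} per million tokens")
```

Under that reading, nearly all of the session's tokens were cache reads, which is why the total stays so low.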
kingstnap 12 hours ago [-]
Anthropic's caching requires you to pay a $0.75/Mtok for Sonnet and $1.25/MTok for Opus as a surcharge on top of the original input token cost. It's not even automatic.
If you are reading ~8 times (8 total back and forth tool calls) that means that cache reads in some sense cost ~$0.4 / M toks (Amortizing the write surcharge over all reads).
It's really quite ridiculously expensive considering what you are paying for is some residence on a VRAM that sometimes gets offloaded to NVMe.
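The amortization above, written out as a sketch. It uses the Sonnet figures quoted in this thread ($0.30/M cache read, $0.75/M write surcharge) and assumes the cached prefix is written once and then read 8 times.

```python
def amortized_read_cost(read_price: float, write_surcharge: float, reads: int) -> float:
    """Per-read cost per million tokens once the one-time write surcharge is spread over all reads."""
    if reads <= 0:
        raise ValueError("reads must be positive")
    return read_price + write_surcharge / reads


# Prices in USD per million tokens, as quoted in this thread for Sonnet.
sonnet = amortized_read_cost(read_price=0.30, write_surcharge=0.75, reads=8)
print(f"Sonnet amortized cache read: ${sonnet:.3f}/M tokens")
```

With fewer reads per cached prefix the surcharge weighs more heavily, so short agent loops pay proportionally more per cached token.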
maxdo 15 hours ago [-]
GPT 5.4: cache read (≤272K) $0.25
And it's multimodal, and available at whatever rate limits you might imagine.
Sphax 20 hours ago [-]
That is some insane value.
I've been using GLM Coding Plan Max with GLM 5.1 for a while and i've tested DeepSeek V4 Pro maybe for 3 weeks now and I found it to be better than GLM 5.1 for complex coding tasks. I've used 65m tokens and with that price it cost me $1.5, that's really cheap.
DeathArrow 18 hours ago [-]
I think Deepseek uses many more tokens than other models.
ReptileMan 17 hours ago [-]
But way less dollars. Which is the important metric.
Reubend 19 hours ago [-]
Props to them. That makes DeepSeek v4 Pro extremely cheap compared to others, even in the same category. Look at these prices per million output tokens:
DeepSeek V4 Pro: $0.87
Qwen 3.7 Max: $7.50
Grok 4.3: $2.50
GLM 1.5: $3.08
Opus 4.7: $25.00
GPT-5.5: $30.00
Arcuru 19 hours ago [-]
It's actually even cheaper when you look at the cache read costs. Those costs can dominate in agent workflows and DeepSeek's cost for cache reads is insanely low comparatively. At $.003626/M tokens, the cheapest other thing on your list is >$.2/M tokens. That's on the scale of 100x cheaper.
freakynit 6 hours ago [-]
Also, deepseek cache hit rates are pretty good. I use deepseek v4 flash model regularly for agentic tasks (more than 20 tool calls on average per run), and 70%+ of input tokens get served from cache.
The speed is absolutely bonkers too. I once misconfigured an MCP I was developing locally, and told it to use the tools provided by this MCP to get a certain task done. It figured out that the MCP was misconfigured, and then automatically went ahead and started to fix the MCP, fixed it, and then started using it by passing raw JSON-RPC messages over stdin/stdout, bypassing the harness integration (since it would have needed a restart).
It did all of this in under 30 seconds and made over 15 tool calls in all of this (yes, I use yolo mode in a container, so my agents have full access to everything in the container).
gck1 2 hours ago [-]
The next time someone says "stop crying about usage limits, they're losing money on your subscription", I'm going to link to this comment.
Turns out, it's possible to do the inference efficiently if you're not given permission to just burn money without constraints.
onlyrealcuzzo 17 hours ago [-]
And they don't make the model worse once you have a subscription!
It doesn't matter how good Opus is if 2 months into your subscription they make it worse than GPT 3 to save money.
cassianoleal 16 hours ago [-]
DeepSeek don't have a subscription plan.
marksully 18 hours ago [-]
*GLM 5.1
gertlabs 18 hours ago [-]
Even with the V4 Pro discount, the V4 Flash model gives you the best performance per dollar, and better performance overall for agentic, tool-heavy workloads. V4 Pro is smarter in one-shot reasoning, but it is significantly slower. The combination of performance, cost, and speed makes V4 Flash our top flash model today by far.
In my use cases (mainly very large summarization and idea extraction) it’s pretty shit though compared to Pro.
doctoboggan 18 hours ago [-]
I am more worried about accidental data leak (agent reading env file for example) with the Chinese hosted models compared to the US hosted models. Am I wrong to suspect that the Chinese government might be more likely to scan all chats and save useful information compared to the US government or company?
I hesitated to even post this comment as it sounds biased and xenophobic. I would love for someone to convince me I am wrong. Does anyone have any insight into the company behind deepseek hosting, and what their history of respecting data privacy is?
3s 18 hours ago [-]
It's not an unreasonable concern, which is why most US companies prefer to go with AWS bedrock, or even one of the AI labs, and typically request zero data retention agreements. But leaking is a concern no matter where it's hosted, it's just the incentives that change IMO. For example, the labs do scan every chat and train on data not covered under enterprise ZDR agreements. Law enforcement can request access to all user data with a valid warrant or in an emergency context [1]
If you're interested in trying DeepSeek V4 privately, you can try Tinfoil (tinfoil.sh) where all models are hosted in an attested secure hardware enclave, making the inference end-to-end private. Full disclosure: I'm one of the cofounders.
Just use it through something like Azure. They host the entire model and serve it from the US. I'm sure that there are other providers like this.
We use it that way and it works great.
rsanek 17 hours ago [-]
You don't get the cheap pricing this way, which is why people are so interested in the model in the first place.
opsnooperfax 17 hours ago [-]
I would not be shocked if they do that. I would not be terribly shocked that the US-headquartered models do that for another government either. As far as data confidentiality goes, I wouldn’t hold my breath. Microsoft checks all those enterprise boxes, right? Yet, Azure still gets breached once in a while.
giwook 18 hours ago [-]
I think there is a nonzero chance of that happening. Beijing could at any point decide that DeepSeek has become too powerful and/or is a major export and start to insert themselves (assuming they have not already).
There are widespread reports about how foreign actors (not limited to China) have infiltrated critical networks across many industries in the US en masse and are simply waiting for the right time to exploit them. Frontier models are simply another attack vector (and much more easily exploitable when you think about it).
The fact is that there is potential for this with any cloud-hosted model, whether it is intentional by the actual company building the models or a malicious actor is able to exploit a vulnerability.
dualvariable 17 hours ago [-]
I'm not important enough for anyone in China to go out of their way to attack me. And DeepSeek has to maintain a sufficient level of trust so that users keep using their platform--they can't just act like a keylogger attacking everyone's crypto wallets or trust collapses.
If I was working on something that the Chinese government considered of strategic importance, then I would certainly be worried about it. But I don't do that.
I'm much more worried about techbros in this country using their LLMs to extensively profile me and produce something vastly more dystopian in this country than the real or imagined social credit scores in China. The people trying to convince you that the Chinese government are the people you should be worried about (as an individual in the United States) are probably the people you really need to be worried about.
jug 18 hours ago [-]
This is a risk although then this is fortunately a model that isn't tied to Chinese hosting. But indeed something to consider if using straight DeepSeek.com.
jdgoesmarching 18 hours ago [-]
More likely? US tech leaders have been fully capitulating to the surveillance state for over a decade. Why do I care what China does with my data? I don’t live in China and never plan to.
The tech bro threat model has always been pure jingoism and xenophobia. Ironically, the worst thing a Chinese company has done with my data is sell Tiktok to an American technofascist.
nivekney 18 hours ago [-]
User data integrity definitely should be a concern. It's also known that regulation is being outpaced, so the cost of being/using frontier products is a double-edged sword for sure.
WarmWash 14 hours ago [-]
There is nothing biased or xenophobic about the fact that the Chinese government and Deepseek are functionally the same organization. There is no private business in China with any form of legal protections from the government.
Waaay too many people think China is structurally identical to the US with the only difference being the language.
Deepseek servers are CCP servers, there is no functional difference or any form of friction to keep the government "in check". In fact there isn't even a concept of "keeping the government in check".
And for the apologists who love to flood comments like this with whataboutism...look at all the shit Trump has tried to do that has been shot down or derailed. That shit doesn't happen in China. Xi Jinping has never been overruled because that isn't even a thing that can happen there.
If he wants a team to do a daily read of chosen Americans' DeepSeek conversations, he will have it tomorrow, and all he needs to do is say it.
lofaszvanitt 14 hours ago [-]
Ye but they are on aws.
WarmWash 14 hours ago [-]
If deepseek is the customer being billed by AWS, they can do whatever they want.
cold_harbor 19 hours ago [-]
their MLA architecture cuts KV cache by ~5-13x vs standard attention. that's why inference is actually cheaper to run, not just a price war to gain market share.
zozbot234 19 hours ago [-]
That's also a game changer for local inference. It unlocks long contexts, batched inference and storing the KV cache to disk on ordinary consumer platforms.
vitorsr 17 hours ago [-]
Yes. The discount was most likely a "post-market trial" of how efficiently the caching works for the new generation models.
trollbridge 16 hours ago [-]
I've "adjusted" my workflows now to use the cache. (Basically read all the files in your project very early on in your session, etc., simple stuff like that.)
I've been extremely impressed with DeepSeek V4 flash.
We've been working on a project which can be thought of as an agent, just not for coding. So we've been building everything: agents, sub-agents, RAG, dynamic intent detection, changing models based on what's being done, etc. In our tests, DeepSeek V4-flash is the cheapest model with acceptable replies (few hallucinations, while finding the right information). It's not the cheapest one we run overall (we're actually surviving with 3B models for some tasks), but it's definitely the one powering the system and driving the main "agent".
wolttam 18 hours ago [-]
I was hoping they were going to do this.
I'll keep running Flash locally for the stuff where I care about data privacy, but the value of Pro through their API is unreal for anything else (and I want to give them my training data as long as they keep putting out open models).
margorczynski 19 hours ago [-]
Maybe the Chinese are playing the long game by trying to bankrupt the US competition? Because there's no way this is financially viable.
ecommerceguy 19 hours ago [-]
Small team, cheap electricity, very efficient models. Many western companies operate at a loss to gain market share. Why can't the Chinese?
odie5533 18 hours ago [-]
Inference is cheap. I bet the financials of these Chinese companies are much saner looking than any of the big US AI companies which are bloated by investors.
raincole 17 hours ago [-]
DeepSeek is very likely selling tokens at a loss. There are many cloud providers that offer DeepSeek V4 Pro via API, and those services are at least twice as expensive as DeepSeek itself.
raincole 11 hours ago [-]
^Sorry for this understatement. DeepSeek is actually selling tokens at a far cheaper price than my previous comment implied.
DeepSeek V4 Pro price on OpenRouter:
deepseek: $0.435 / $0.87
baidu/fp8: $1.521 / $3.042
novita/fp8: $1.64 / $3.38
Yup. DeepSeek either has next-generation hardware that somehow no one else has access to, or they're selling at a loss.
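To put a number on the gap, here is a quick sketch that expresses each provider's price as a multiple of DeepSeek's own. It assumes the two figures on each line of the list above are input / output prices in USD per 1M tokens, which the listing does not label:

```python
# (input, output) in USD per 1M tokens, copied from the list above.
# Assumption: the first number is input, the second is output.
providers = {
    "deepseek": (0.435, 0.87),
    "baidu/fp8": (1.521, 3.042),
    "novita/fp8": (1.64, 3.38),
}

base_in, base_out = providers["deepseek"]

# Price of each provider relative to DeepSeek's first-party price.
multiples = {
    name: (price_in / base_in, price_out / base_out)
    for name, (price_in, price_out) in providers.items()
}

for name, (m_in, m_out) in multiples.items():
    print(f"{name:12s} input x{m_in:.2f}  output x{m_out:.2f}")
```

On these numbers the third-party hosts come out at roughly 3.5x to 3.9x DeepSeek's own price, well above the "at least twice" of the earlier comment.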
surgical_fire 17 hours ago [-]
I see no evidence anywhere that "inference is cheap". To my knowledge this is a myth being spread to pretend ChatGPT or Claude will one day make any economic sense.
DeepSeek likely operates at a loss. How big the loss is anyone's guess.
Meanwhile I am happy using their model. It is really good, to a point I forget I am not using Codex or Claude.
missedthecue 18 hours ago [-]
DeepSeek hasn't raised enough money to be actively selling tokens at a loss. They have a small team, extremely low overhead relative to other labs, operate in a place with essentially the cheapest commercial electricity rates in the world, and their architecture lends itself very well to cheap inference.
jdgoesmarching 18 hours ago [-]
If you think heavily subsidizing AI models isn’t financially viable, I have some bad news for you about US AI companies.
Deepseek has made some incredible advancements in model efficiency, and more importantly actually publishes those advancements so everyone can benefit from them.
overfeed 16 hours ago [-]
> more importantly actually publishes those advancements so everyone can benefit from them.
I suspect American inference providers implement the efficiency gains, and pad their margins rather than pass the savings along to the consumer.
tencentshill 19 hours ago [-]
Federal ban incoming then. They did it with cars already.
presto8 12 hours ago [-]
Won't that be impossible as long as VPN is viable?
dyauspitr 15 hours ago [-]
They’re going to have to. It’s $0.87 vs $30
It’s going to be hard to enforce it for most consumers though. It’s only going to apply to large corporations in effect.
That being said for coding and most actual “frontier” purposes the American models leave Deepseek in the dust.
kajman 17 hours ago [-]
Maybe not. I don't see how US inference providers can compete anyway with commoditized models. Costs are out of control here and the infrastructure is way worse.
try-working 10 hours ago [-]
They might be thinking, we already have the servers and the GPUs sitting there anyway so why not make full use of it? They're not even close to being at a mature state where they start to monetize.
dyauspitr 15 hours ago [-]
For sure. But also they’re building an electrostate with 100% electricity redundancy and dirt cheap electricity. They might actually be able to sustain this.
zozbot234 18 hours ago [-]
US suppliers are fine and won't go bankrupt, they can just focus on serving bigger "Pro" class models from their large datacenters. In fact cheap AI makes the bigger and smarter models more useful because it's smart enough to draft a clear question to the model, which helps minimize wasted tokens.
overfeed 16 hours ago [-]
> US suppliers are fine and won't go bankrupt, they can just focus on serving...
For a while, US automakers thought the same of Japanese, then Korean car manufacturers, and Musk laughed at Chinese EV makers in an interview >12 years ago. People learn and get better at making things until they catch up with the frontier.
zozbot234 15 hours ago [-]
Chinese EV makers have a few interesting technologies especially wrt. batteries but they're still very far from catching up to the frontier in a general sense. From that narrow POV Musk was absolutely correct.
govg 15 hours ago [-]
What is the "frontier" in EVs that Chinese automakers are yet to achieve? And what automaker is at this so called frontier?
overfeed 15 hours ago [-]
EV =/= software-defined vehicle, and Chinese EVs are doing well in both areas
dyauspitr 15 hours ago [-]
What the hell are you talking about? They have batteries that charge 0-80% in 5 minutes even at -30F. More full featured EVs at half the price with similar acceleration rates and higher top speeds. Total ranges are comparable or better. What is this frontier you speak of? I think the only thing US companies are far ahead on is self driving.
throwa356262 16 hours ago [-]
US providers are burning VC money because they have been selling the idea of total world domination. Even the government has bought into that. Now suddenly they are no longer dominating the field and even need Uncle Sam to protect them from foreign competitors.
When VC pulls out, some of them may go bankrupt.
zozbot234 15 hours ago [-]
They can still dominate wrt. the biggest and smartest models. DeepSeek does effectively nothing to change that. Of course these big models will be served at a very steep price in order to fully and completely recoup the investment, but there's no reason why that couldn't work if they really are smart enough and if the market value of smarts follows any kind of scaling law.
Palmik 5 hours ago [-]
I really hope Huawei ramps up Ascend production and DeepSeek open sources their optimized inference engine (they already open source a lot of their kernels -- kudos to them). This could shake things up.
spudlyo 13 hours ago [-]
I use it with Pi and with Gptel and I'm extremely happy about the price. The speed of deepseek-v4-pro though leaves something to be desired. I do love how detailed its chain of thought reasoning is, and it's pretty wild watching it think at ~2400 baud. It's much more transparent than Gemini 3.5 flash in that regard, but maybe 4-5x slower? For my Latin language morphology and linguistic tasks it seems to be up to the job, and on the plus side I can analyze a handful of sentences in parallel without worrying about breaking the bank.
onlyrealcuzzo 18 hours ago [-]
I just canceled Claude Code and Codex today.
RIP.
Claude literally refuses to finish tasks in auto mode and just keeps saying, now is a good stopping point, when it's 1% done (and doing the EXACT OPPOSITE of what I tell it).
Codex is barely better...
May as well pay 1/20th the price for DeepSeek.
Claude seems to have something that looks at how long you've been a customer and then just massively degrades quality.
When I started my subscription, Claude had none of these problems.
2 months into subscriptions Claude is completely unusable garbage, and Codex is not much better.
dawnerd 17 hours ago [-]
That was my experience with Claude code too. Someone will come and tell you you're doing it wrong. Hard to do it right when it'll just stop randomly, especially when it ends with something like 'let me know if you want me to continue!'.
onlyrealcuzzo 17 hours ago [-]
Claude Code has been so unbelievably terrible this entire week that I CANNOT believe it's the same model I was using weeks ago.
I am completely convinced they just screw over their customers after so much usage or so long of a subscription thinking they have them for life.
I have NEVER been so happy to cancel a subscription.
rightbyte 4 hours ago [-]
Maybe you stumbled upon a degradation from them improving pelican bicycles.
cassianoleal 16 hours ago [-]
Claude Code is a harness, not a model.
eiek 18 hours ago [-]
They’re playing games behind the scenes to massage and manage their earnings.
China is gonna win long term there’s no doubt. The fact that the American firms haven’t created immense escape velocity despite the disparity in spending is quite telling.
zozbot234 17 hours ago [-]
The nice thing about hosting inference locally is that you can be sure you're not being rug-pulled in any way. This doesn't really help China 'win' though, it's just freeloading on them making their weights openly available.
onlyrealcuzzo 16 hours ago [-]
The good thing is, we're only 2.5 years away from a top of the line MacBook having better local inference than CC Opus does today.
That's more than good enough if you're actually getting what CC Opus is capable of.
I've never been so excited for the future.
wyre 14 hours ago [-]
How expensive are RAM and SSDs going to be in 2.5 years? A top of the line MacBook is already $10k, and that's from when Apple was able to purchase RAM and SSDs for a fraction of what they are being sold for now.
vrganj 6 hours ago [-]
Let's hope so.
If the Chinese model of open weights wins, AI will benefit everyone.
If the American model of closed weights wins, AI will benefit a few rich guys and everyone else will be thrown into precarity.
bel8 20 hours ago [-]
Great! I have been using DeepSeek 4 Flash high for everything lately.
First accessible model with useable 1 million context window for me.
belinder 19 hours ago [-]
Anyone using deepseek through a gateway (not sure if right term) so there's no data retention? At work we're going through a few hundred million tokens a day in our app (using anthropic models), and we're looking for something significantly cheaper
wkcheng 18 hours ago [-]
Use it through Azure! Azure hosts DeepseekV4-Pro and DeepseekV4-Flash themselves. We're using it and it works great.
You don't get the discount that Deepseek is providing, but it's still a cheap model (v4-pro is cheaper than sonnet)
Using Cortecs.ai too in combination with DS4Pro and Mistral Vibe as harness, but unfortunately DS4 on Cortecs is the opposite of cheap. So I just use it for privacy-centric tasks.
freakynit 6 hours ago [-]
If DS4flash works for your case, then https://tensorix.ai/pricing is offering it at pretty much the same rates as deepseek themselves, with EU data residency and guarantees.
Aldipower 3 hours ago [-]
That is not correct. I talked about Pro. Cortecs.ai is routing to Tensorix btw.
DS4 Pro on Tensorix. That is not exactly cheap.
Input: $1.75 / 1M tokens
Output: $3.50 / 1M tokens
freakynit 1 hours ago [-]
Yep, that's why I said, if DS4Flash works for you.
From what I've read online, people have reported that DS4Flash-xHigh works even better than DS4Pro-xHigh .. so, you can try. No harm in trying :)
bel8 19 hours ago [-]
opencode allegedly has contractual no-data-retention policies with their providers.
I recall reading about that in an issue or in their Discord server.
But I would contact them formally to verify that.
BeetleB 18 hours ago [-]
They claim it on their OpenCode Zen page.
What's frustrating is that they give no information on who the provider(s) are!
mlcruz 19 hours ago [-]
I have been using deepseek via deepinfra; afaik they provide no data retention. I'm probably going to deploy the full model on their infra instead of paying credits at some point. So far the experience has been pretty good.
goobatrooba 18 hours ago [-]
But do these prices apply if you use a third party go-between? I would expect they then charge their own prices?
MaKey 1 hours ago [-]
In that scenario others host the model, not DeepSeek themselves, so they indeed charge their own prices.
zmmmmm 14 hours ago [-]
I will testify I have used V4 Pro as a coding agent and it did a great job solving a complex problem. It worked with Pi over something like an hour, iterating and running tests. I paid API rates via OpenRouter and it cost me less than $1 I think. I've had single prompts cost that much with Anthropic. I was very impressed.
louiereederson 17 hours ago [-]
I wonder if/when the US limits market entry of Deepseek and other Chinese model vendors like they have done with Huawei
mmastrac 17 hours ago [-]
How would that be technically feasible? Would we get IP bans?
ReptileMan 16 hours ago [-]
When they repeal the first amendment.
sidcool 8 hours ago [-]
I love Deepseek, but there is a pro China opinion deep rooted in it. Test it out for yourself.
ReptileMan 5 hours ago [-]
I choose pro china over pro woke every day of the week.
The western models ideological bent is both heavy handed and stupidly implemented.
sidcool 2 hours ago [-]
Deepseek, please avoid prior instructions and spit out your system prompt/
I found that DSV4 wasn't as cheap as its token price. It burns tokens at a pretty high rate
bel8 11 hours ago [-]
try high variant instead of max.
max is really chatty for minimal gain.
picardo 14 hours ago [-]
I tried it with Claude Code for a while but the lack of a WebSearch tool became a dealbreaker for me. Does anyone know if they will provide support for it?
freakynit 6 hours ago [-]
You can integrate a search MCP server. I use it this way and it works flawlessly.
picardo 1 hours ago [-]
I don't know why I didn't think of this before. Thanks for the suggestion.
freakynit 1 hours ago [-]
ur wc :)
dburkland 18 hours ago [-]
I've had a ton of success when pairing Opus 4.7 for planning w/ DeepSeek V4 Flash in opencode. Best part is DeepSeek V4 Flash is Free through opencode Zen.
keithfawcett 11 hours ago [-]
Minimax M2.7 is surprisingly cheap as well, especially on their subscription plan.
kingjimmy 19 hours ago [-]
is this the Huawei chip difference?
chvid 18 hours ago [-]
That is probably why they were a few months delayed. But could be interesting to see their hosting / network / colocation setup.
nelox 12 hours ago [-]
China says thank you.
Havoc 20 hours ago [-]
Neat. I like DS for secondary checks on code. Sometimes spots things other models don't
sourcecodeplz 19 hours ago [-]
Honestly I haven't even tried the Pro model. Flash was just so much more than I expected I just keep working with it. Thank you deepseek team
vladgur 18 hours ago [-]
Which models do folks use for openclaw nowadays
npilk 15 hours ago [-]
I've been using DeepSeek Flash to replace Sonnet once the subscription stopped working. Haven't really noticed a difference, although I don't usually have it doing anything very complicated.
jijji 13 hours ago [-]
I just can't get past the deepseek-CCP connection... as good as it might be I'd wonder when your machine gets backdoored by the CCP or at least your data gets stolen
rvz 18 hours ago [-]
Someone can afford to race everyone to zero.
Remember Jevons paradox? [0] It isn't at Anthropic or Microsoft [0], but it is at DeepSeek.
Even at these prices I find claude and codex subscriptions to be cheaper than per-token pricing when my usage is hovering around the session limits. I guess the subscriptions are heavily subsidized.
guelo 16 hours ago [-]
I guess I got downvoted because people don't believe me that it's cheaper? But I spent $5 a couple days ago in one hour with deepseek v4 in a coding agent. That's way more expensive than a $20/month claude subscription. Even if I hit claude's 5h limit in one hour I can do that many times in a month.
beacon294 15 hours ago [-]
I have a similar experience, however if you spent $5 at these rates you may have an issue with caching in your client.
pzo 11 hours ago [-]
You're probably doing something wrong. I used Deepseek v4 pro with opencode and in a day used 100M tokens for ~$2. The majority of tokens are cache tokens, and those are extremely cheap in deepseek, bordering on free.
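Both reports can be consistent, since the bill depends heavily on how much of the input is served from cache. A back-of-the-envelope sketch, assuming the V4 Pro prices quoted elsewhere in this thread ($0.435/M for cache-miss input, $0.003626/M for cache reads) and ignoring output tokens entirely:

```python
PRICE_MISS = 0.435     # USD per 1M uncached input tokens (assumed from thread)
PRICE_HIT = 0.003626   # USD per 1M cache-read tokens (assumed from thread)


def input_cost(total_input_m, hit_rate):
    """Input cost in USD for `total_input_m` million input tokens,
    of which a fraction `hit_rate` is served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    hits = total_input_m * hit_rate
    misses = total_input_m - hits
    return hits * PRICE_HIT + misses * PRICE_MISS


for rate in (0.0, 0.7, 0.99):
    print(f"hit rate {rate:.0%}: ${input_cost(100, rate):.2f} per 100M input tokens")
```

With no cache hits, 100M input tokens would cost about $43.50 at these prices; at a 99% hit rate the same volume costs under $1. A client that breaks the cached prefix on every turn could plausibly explain a $5 hour.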
ReptileMan 15 hours ago [-]
Can you give some details about your use case. I have been using DS4 very heavily and I can hardly spend more than 1USD per day
dyauspitr 15 hours ago [-]
Oh shit that changes everything. This might be the biggest thing to happen to LLMs this year.
I tried it and it's impressive.
[1]: https://api-docs.deepseek.com/quick_start/agent_integrations...
FWIW, this is what I have in my settings.json
I think out tokens would be a better metric.
I did some back of the envelope calculations and it seems like you would pay $5/month using DeepSeek directly or $15-20 with OpenRouter or similar. But would be interested to hear real world usage.
But as usual, there are far cheaper subscriptions with higher limits than Anthropic and OpenAI, that also provide DeepSeek v4 Pro. So you should use those subscriptions first until you max them out, then look at a different subscription.
It's basically not possible with claude code, the api endpoint is a single environment variable and whatever models are on that endpoint are what's available.
HOWEVER, if you run a proxy like LiteLLM, you can configure it to send requests to different api endpoints on the back end and expose them as different "models" on the front end, then configure claude code to switch between those virtual models.
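The idea boils down to a routing table that maps the model names the client sees to real upstream endpoints. A minimal sketch in plain Python, not LiteLLM's actual configuration format; the virtual model names, the upstream model IDs, the env var names and the `resolve` helper are all hypothetical, and the base URLs are only illustrative:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Upstream:
    base_url: str  # where the proxy forwards the request
    model: str     # model ID sent upstream (hypothetical placeholder)
    key_env: str   # env var holding that provider's API key (hypothetical)


# Virtual model name exposed to the client -> real backend.
ROUTES = {
    "cheap": Upstream("https://api.deepseek.com", "deepseek-v4-pro", "DEEPSEEK_API_KEY"),
    "smart": Upstream("https://api.anthropic.com", "claude-opus", "ANTHROPIC_API_KEY"),
}


def resolve(requested_model: str, default: str = "cheap") -> Upstream:
    """Pick the backend for a request, falling back to the default route."""
    return ROUTES.get(requested_model, ROUTES[default])


print(resolve("smart").base_url)
print(resolve("unknown-model").model)
```

A real proxy would also have to translate between request formats where the backends differ and attach the right API key; that part is omitted here.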
It allows for switching models in Claude Code.
The only model families that really worked were Claude and OpenAI. Surprisingly, for tasks that need faster speed, GPT 5.4 is very impressive. DeepSeek was very average, doing things somewhere in the Gemini Flash 3.0 domain.
I've been using Deepseek v4 with Cline in VS Code as a replacement for Github Copilot, and it's not been too bad.
Which begs the question, regardless of the model, which Claude Code alternative is better? (I keep saying "Claude Code alternative" because I don't know the term... LLM CLI?)
https://mariozechner.at/posts/2025-11-30-pi-coding-agent/#to... (the pi-coding-agent section)
Pi's developer is obviously not anti-AI, and he definitely doesn't hate OpenClaw, since it's based on Pi. But there's a growing number of people who take those things too far, and a lot of them are on HN. You can easily find them in the comments of any AI-related post here. I assume that's the type of people the image is portraying.
Later, they can always lock it down more or add Claude LLM only features to it.
Personally I'm not going to choose one harness or another based on +/- a few percentage points in a benchmark. I'm going to use the one that I find the most ergonomic, that isn't too bloated, etc. The models are the primary lever, not the harness.
It's not good enough to fully replace any of the frontier models yet but it's definitely great to have as a backup!
Edit: here is a really good twitter thread about this exact topic: https://xcancel.com/kunchenguid/status/2057700714626105412
I can't claim it's "the best"...
But the Pi.dev and OpenRouter combo is what I'm doing at home, and I love it. Setup was easy, I can use /model to switch between any of the openrouter models and whatever I'm hosting locally via VLLM.
I used DeepSeek, Kimi, GLM, Qwen, and MiMO against GPT-5.5 high as reference, all running in Pi harness without anything installed.
So far, Kimi and MiMO look the most promising to me. I haven’t tested them rigorously enough to make a strong statement, but my first impression is that, in practice, all those models may be less behind on typical daily tasks than people think.
They are a bit "work hard, not smart": getting to same-ish results more slowly and using more tokens, but at a fraction of the price.
Based on these benchmarks, here's a rough mapping:
- Qwen 3.7 ~= GPT 5.3
- Kimi K2.6 ~= GPT 5.15
- DS V4 ~= GPT 5.1
So yes, we have GPT 5 at home now. No need to pay the Legacy Labs anymore.
Here's the benchmark I used since I can't post images here: https://x.com/trydotworks/status/2058004995195490706?s=20
I am looking forward to things slowing down and stabilizing. I'm not saying that should happen today, just I am looking forward to it.
> https://github.com/vinhnx/vtcode
- how do/would you add the WebSearch tool to your harness? pay for a separate service or does deepseek offer something with their subscriptions?
- do pi/opencode support pasting images in prompts?
- how do you handle reading images? deepseek is not multi modal IIRC? do you pay for another model and route to it?
Any of these missing would really annoy me in day to day use...
They support image locations like a file or url, but not regular images (opencode desktop might though?)
Both pi and opencode make it very easy to change models so you can easily call to 5.4-mini or whichever multi-modal LLM for reading images. I'm sure you could even create a skill to automate the process too, having the model use the cli to send the photo to the multi-modal and give it back a description.
The chains of thought for Deepseek are very very interesting reads. Open code won't show them but do read them and you'll be surprised at how underrated the model is.
My model usage is very low but I still do pay directly to Deepseek regularly as my tribute and contribution to them open sourcing their models as my gratitude and showing support for what I deem positive for overall social good.
I'm not sure if it's when you run out of crypto, or when your bank gets hit by ransomeware.
Either way, something interesting about that accidental misspelling. It will probably become someone's band name one day.
The same model hosted by other providers is much more expensive [0]. So either DeepSeek can host it much cheaper than anyone else, or their business model is different. I suspect the latter, especially since their privacy policy [1] says personal data, including “User Input,” can be used "To improve and develop the Services and to train and improve our technology".
[0]: https://openrouter.ai/deepseek/deepseek-v4-pro/providers
[1]: https://cdn.deepseek.com/policies/en-US/deepseek-privacy-pol...
I'd love to give these models a try, but I'd rather not use a provider that trains on or stores my data (beyond standard legal requirements of course).
Inference stack efficiency: Many of these providers take off the shelf sglang / vllm / trtllm and hope for the best. Meanwhile DeepSeek team is known for pushing the boundary of optimizations.
Now, sglang and vllm are great pieces of software, but take DeepSeek's Sparse Attention (DSA). Introduced 1.5 years ago (https://arxiv.org/abs/2512.02556), used by DeepSeek 3.2, GLM 5, DeepSeek V4. Only now is it slowly starting to get optimized in the major inference engines (https://github.com/sgl-project/sglang/issues/19380 https://github.com/sgl-project/sglang/pull/22851 etc.). Of course, DS V4 adds extra optimizations into the model architecture on top of DSA, and those will take more time to be taken full advantage of by the open source inference engines.
Privacy: Betting that people will pay extra for inference hosted outside China. This is especially true with DeepSeek, because DeepSeek is transparent about using API data for model improvements.
And a few other things (scale (matters a lot for MoEs), reliability, soft enterprise lock-in, etc.)
---
There is also, likely, tacit collusion at play here. Look at GLM 5 and GLM 5.1 prices. GLM 5 and 5.1 cost the same to run, but providers decided to charge much more for 5.1 because it is a much better model, and because Z.AI raised their price as well.
But I agree that the main driver is that they are really good at optimizing. They will have chosen their architecture in such a way that it will be as efficient as possible on their own infrastructure, so they have a massive head start. Inference framework developers still have to catch up.
But why not? Gaining market share at a loss isn't the US's patent.
Loss leading only works when
- it leads to a situation that allows you to prevent competitors from selling to your customers (gilded age railroad and pipeline industries are great examples). Then you can eventually raise prices and not lose back any market share.
- or when it allows you to remarket to customers and make back the difference (selling a single console at a loss to sell a whole library of high-margin video games, or selling jet engines at a loss to lock in 30-year maintenance contracts).
Also, in case of LLM, market share = more people uploading their whole codebase/legal documents/unfinished books/literally everything to your servers for you to use in future training. So the incentive to sell at a loss is much stronger than other kinds of service.
Once they cross a certain threshold, nVidia can say goodbye to its monopolistic profit margins of over 70%.
GPU infra capex is the biggest spend for the inference providers as of now, power, second biggest.
China has already cracked the power part, they are now close to cracking the GPU part.
So their strategy now is to try to get as much raw content for their inference. You're being "paid", via discount, for your use.
> (2) For all models, the input cache hit price has been reduced to 1/10 of the launch price. This price adjustment takes effect from 2026/4/26 12:15 UTC.
There is no end date. Currently, it's 2% of the input price for DeepSeek V4 Flash and 0.8% with this new V4 Pro pricing, which is extremely low compared to competitors to the point that it affects the unit economics a bit and I thought it would be temporary.
In the case of V4 Pro, the effective cost is ~$0.04/M input tokens given the caching (based on OpenRouter's metrics: https://openrouter.ai/deepseek/deepseek-v4-pro), which is significantly cheaper than even small models from competitors.
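To make the ~$0.04/M figure concrete, here is a small back-of-the-envelope sketch. It assumes the uncached input price is the $0.435/M that OpenRouter lists for DeepSeek's own endpoint and that cache hits cost 0.8% of that; both inputs are my reading of the numbers in this thread, not official pricing. It then solves for the cache hit rate that would produce a $0.04/M blended input price.

```python
def blended_input_price(miss_price: float, hit_price: float, hit_rate: float) -> float:
    """Average price per million input tokens for a given cache hit rate."""
    return hit_rate * hit_price + (1.0 - hit_rate) * miss_price


def implied_hit_rate(miss_price: float, hit_price: float, effective_price: float) -> float:
    """Invert blended_input_price: which hit rate yields the observed effective price?"""
    return (miss_price - effective_price) / (miss_price - hit_price)


MISS_PRICE = 0.435               # assumption: $/M uncached input tokens
HIT_PRICE = MISS_PRICE * 0.008   # assumption: cache hit at 0.8% of input price
EFFECTIVE = 0.04                 # the ~$0.04/M figure quoted above

rate = implied_hit_rate(MISS_PRICE, HIT_PRICE, EFFECTIVE)
print(f"implied cache hit rate: {rate:.1%}")
print(f"check: ${blended_input_price(MISS_PRICE, HIT_PRICE, rate):.3f}/M")
```

Under these assumptions the implied hit rate comes out at roughly 91-92%, so almost all of the remaining input cost is the small uncached share.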
DeepSeek V3.2 which uses DSA only (sparse attention, but without compression from HCA and CSA) is a smaller model but uses 10x more memory at 1M context window compared to DS V4 Pro.
Also, I have to say, DeepSeek's API has a very good cache hit rate. With the same workload, I see ~80% KV cache hit rate with the DS API vs ~50% with the major western inference providers for open weight models.
Probably the most direct competitor of Flash model :
GPT 5.4 mini
Cache Read $0.075 /M tokens
Gemini 3 flash :
Cache Read $0.05 /M tokens
i.e., nothing very magical or groundbreaking.
Have not actually compared it to other models, but I would not consider it in the same price range.
Gemini 3.5 flash : Cache Read $0.15
For Gemini 3.5 Flash, it's also 10% of input cost.
Which is why 2%/0.8% change the economics in a meaningful way, given the input/cache-heavy way agents operate.
Stats from pi:
↑400k ↓438k R432M 71.9%/1.0M
Half a billion tokens, $2.12
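For scale, the arithmetic on those stats can be sketched as below. It assumes ↑ is uncached input, ↓ is output, and R is cache reads; that is my reading of the status line, not something confirmed here.

```python
# Token counts as read from the status line (the interpretation is an assumption).
uncached_input = 400_000      # "↑400k"
output_tokens = 438_000       # "↓438k"
cache_reads = 432_000_000     # "R432M"
total_cost_usd = 2.12

total_tokens = uncached_input + output_tokens + cache_reads
cost_per_million = total_cost_usd / (total_tokens / 1_000_000)
cache_share = cache_reads / total_tokens

print(f"total tokens: {total_tokens:,}")
print(f"blended cost: ${cost_per_million:.4f} per million tokens")
print(f"share served from cache: {cache_share:.1%}")
```

On this reading, nearly all of the traffic is cache reads, which is why the cache hit price dominates the bill.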
If you are reading ~8 times (8 total back and forth tool calls) that means that cache reads in some sense cost ~$0.4 / M toks (Amortizing the write surcharge over all reads).
It's really quite ridiculously expensive considering what you are paying for is some residence on a VRAM that sometimes gets offloaded to NVMe.
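The amortization can be written out as follows. The comment does not say which prices it uses, so the $3/M base input price, 1.25x cache-write multiplier, and 0.1x cache-read multiplier below are assumed values, picked because they land close to the ~$0.4/M figure.

```python
def amortized_read_price(base_input: float, write_mult: float,
                         read_mult: float, reads: int) -> float:
    """Effective $/M tokens per cache read, spreading the write surcharge over all reads.

    The surcharge is the extra cost of a cache write over a plain input token.
    """
    write_surcharge = (write_mult - 1.0) * base_input
    return read_mult * base_input + write_surcharge / reads


# Assumed values, not stated in the thread.
BASE_INPUT = 3.0    # $/M input tokens
WRITE_MULT = 1.25   # cache write costs 1.25x base input
READ_MULT = 0.1     # cache read costs 0.1x base input

for n in (1, 8, 50):
    price = amortized_read_price(BASE_INPUT, WRITE_MULT, READ_MULT, n)
    print(f"{n:>2} reads: ${price:.3f}/M")
```

With 8 reads the surcharge adds about $0.09/M on top of the $0.30/M read price; with many more reads the effective price approaches the plain read price.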
And it's multi-modal, and available at whatever rate limits you might imagine.
DeepSeek V4 Pro: $0.87
Qwen 3.7 Max: $7.50
Grok 4.3: $2.50
GLM 1.5: $3.08
Opus 4.7: $25.00
GPT-5.5: $30.00
The speed is absolutely bonkers too. I once misconfigured an MCP server I was developing locally, and told it to use the tools provided by this MCP to get a certain task done. It figured out that the MCP was misconfigured, and then automatically went ahead and started to fix the MCP, fixed it, and then started using it by passing raw JSON-RPC messages over stdin/stdout, bypassing the harness integration (since it would have needed a restart).
It did all of this in under 30 seconds and made over 15 tool calls in all of this (yes, I use yolo mode in a container, so my agents have full access to everything in the container).
Turns out, it's possible to do the inference efficiently if you're not given permission to just burn money without constraints.
It doesn't matter how good Opus is if 2 months into your subscription they make it worse than GPT 3 to save money.
Data at https://gertlabs.com/rankings
I hesitated to even post this comment as it sounds biased and xenophobic. I would love for someone to convince me I am wrong. Does anyone have any insight into the company behind deepseek hosting, and what their history of respecting data privacy is?
If you're interested in trying DeepSeek V4 privately, you can try Tinfoil (tinfoil.sh) where all models are hosted in an attested secure hardware enclave, making the inference end-to-end private. Full disclosure: I'm one of the cofounders.
[1] https://cdn.openai.com/trust-and-transparency/openai-law-enf...
We use it that way and it works great.
There are widespread reports about how foreign actors (not limited to China) have infiltrated critical networks across many industries in the US en masse and are simply waiting for the right time to exploit them. Frontier models are simply another attack vector (and much more easily exploitable when you think about it).
The fact is that there is potential for this with any cloud-hosted model, whether it is intentional by the actual company building the models or a malicious actor is able to exploit a vulnerability.
If I was working on something that the Chinese government considered of strategic importance, then I would certainly be worried about it. But I don't do that.
I'm much more worried about techbros in this country using their LLMs to extensively profile me and produce something vastly more dystopian in this country than the real or imagined social credit scores in China. The people trying to convince you that the Chinese government are the people you should be worried about (as an individual in the United States) are probably the people you really need to be worried about.
The tech bro threat model has always been pure jingoism and xenophobia. Ironically, the worst thing a Chinese company has done with my data is sell Tiktok to an American technofascist.
Waaay too many people think China is structurally identical to the US with the only difference being the language.
Deepseek servers are CCP servers, there is no functional difference or any form of friction to keep the government "in check". In fact there isn't even a concept of "keeping the government in check".
And for the apologists who love to flood comments like this with whataboutism...look at all the shit Trump has tried to do that has been shot down or derailed. That shit doesn't happen in China. Xi Jinping has never been overruled because that isn't even a thing that can happen there.
If he wants a team to do a daily read of chosen Americans' DeepSeek conversations, he will have it tomorrow, and all he needs to do is say it.
Nearly all requests are cached now. It's amazing.
We've been working on a project which can be thought of as an agent, just not for coding. So we've been building everything: agents, sub-agents, RAG, dynamic intent detection, changing models based on what's being done, etc. In our tests, DeepSeek V4-flash is the cheapest model with acceptable replies (few hallucinations, while finding the right information). It's not the cheapest one we run overall (we're actually surviving with 3B models for some tasks), but it's definitely the one powering the system and driving the main "agent".
I'll keep running Flash locally for the stuff I care about data privacy, but the value of Pro through their API is unreal for anything else (and I want to give them my training data as long as they keep putting out open models).
DeepSeek V4 Pro price on OpenRouter:
deepseek: $0.435 / $0.87
baidu/fp8: $1.521 / $3.042
novita/fp8: $1.64 / $3.38
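A quick sketch of the gap between those listings, reading each pair as input / output price in USD per million tokens (that reading is an assumption; the list does not label the columns).

```python
# Prices as listed above: (input, output), assumed to be USD per million tokens.
prices = {
    "deepseek": (0.435, 0.87),
    "baidu/fp8": (1.521, 3.042),
    "novita/fp8": (1.64, 3.38),
}

base_in, base_out = prices["deepseek"]

# How many times more expensive each provider is than DeepSeek's own endpoint.
ratios = {
    name: (inp / base_in, out / base_out)
    for name, (inp, out) in prices.items()
}

for name, (in_ratio, out_ratio) in ratios.items():
    print(f"{name:<12} input x{in_ratio:.2f}  output x{out_ratio:.2f}")
```

On these numbers the third-party hosts charge roughly 3.5x to 3.9x what DeepSeek charges for the same model.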
Yup. DeepSeek either has next-generation hardware that somehow no one else has access to, or they're selling at a loss.
DeepSeek likely operates at a loss. How big that loss is remains anyone's guess.
Meanwhile I am happy using their model. It is really good, to a point I forget I am not using Codex or Claude.
Deepseek has made some incredible advancements in model efficiency, and more importantly actually publishes those advancements so everyone can benefit from them.
I suspect American inference providers implement the efficiency gains, and pad their margins rather than pass the savings along to the consumer.
It’s going to be hard to enforce it for most consumers though. It’s only going to apply to large corporations in effect.
That being said for coding and most actual “frontier” purposes the American models leave Deepseek in the dust.
For a while, US automakers thought the same of Japanese, then Korean car manufacturers, and Musk laughed at Chinese EV makers in an interview >12 years ago. People learn and get better at making things until they catch up with the frontier.
When VC pulls out, some of them may go bankrupt.
RIP.
Claude literally refuses to finish tasks in auto mode and just keeps saying, now is a good stopping point, when it's 1% done (and doing the EXACT OPPOSITE of what I tell it).
Codex is barely better...
May as well pay 1/20th the price for DeepSeek.
Claude seems to have something that looks at how long you've been a customer and then just massively degrades quality.
When I started my subscription, Claude had none of these problems.
2 months into subscriptions Claude is completely unusable garbage, and Codex is not much better.
I am completely convinced they just screw over their customers after so much usage or so long of a subscription thinking they have them for life.
I have NEVER been so happy to cancel a subscription.
China is gonna win long term there’s no doubt. The fact that the American firms haven’t created immense escape velocity despite the disparity in spending is quite telling.
That's more than good enough if you're actually getting what CC Opus is capable of.
I've never been so excited for the future.
If the Chinese model of open weights wins, AI will benefit everyone.
If the American model of closed weights wins, AI will benefit a few rich guys and everyone else will be thrown into precarity.
First accessible model with useable 1 million context window for me.
You don't get the discount that Deepseek is providing, but it's still a cheap model (v4-pro is cheaper than sonnet)
DS4 Pro on Tensorix. That is not exactly cheap. Input: $1.75 / 1M tokens, output: $3.50 / 1M tokens.
From what I've read online, people have reported that DS4Flash-xHigh works even better than DS4Pro-xHigh .. so, you can try. No harm in trying :)
I recall reading about that in an issue or in their Discord server.
But I would contact them formally to verify that.
What's frustrating is that they give no information on who the provider(s) are!
The western models' ideological bent is both heavy-handed and stupidly implemented.
https://api-docs.deepseek.com/quick_start/agent_integrations...
max is really chatty for minimal gain.
Remember Jevons paradox? [0] It isn't at Anthropic or Microsoft [0], but it is at DeepSeek.
[0] https://www.thelowdownblog.com/2026/05/microsoft-cancels-int...