AI costs spike as subscriptions hit pricing wall — firms turn towards Chinese LLMs, open-source models to extend budget

BigMacHole@thelemmy.club · 3 hours

If ONLY there was a Cheaper way to get MORE Work done with LESS Mistakes! OH WELL! Don’t forget to BUY our Products with your ~~Unemployment Checks!~~ ~~Savings!~~ WHY aren’t people BUYING things?

UnderpantsWeevil@lemmy.world · 1 hour

I see your problem. You’re using the old Fordist mode of economic growth, wherein you employ people at an income level such that they can buy the products they produce. Then you grow your capital stock by reallocating the labor surplus into new undeveloped areas of the country. That model of growth went out of fashion in the 80s.

Post Volcker-Shock, we adopted a strategy of indebting unaligned countries in exchange for modern technological improvements, then harvesting their natural resources at a functional loss to finance the USD denominated debts they struggled to pay back. Then we could effectively export dollars for cheap imports and allow Americans to buy them with money laundered through government contracts, grants, and Fed credit expansion.

But after COVID, we’re on to an even more revolutionary approach to profitability. We don’t need consumers. We don’t need to develop capital in foreign territories. We just do a direct 1:1 exchange of B2B SaaS services and American Dollars with the corporate oligarchs. Now people don’t need to buy anything, because governments buy everything. And if you’re friendly with the government, you get a slice. And if you’re not, you get shoved off a cliff.

melfie@lemmy.zip · 3 hours

Legal won’t allow Chinese models where I work: not just in production, but on employee machines or any company-owned device. I believe the rationale is they don’t want any legal problems if federal or state bans are enacted for “national security”, which certainly isn’t an unfounded fear. I have a feeling a lot of larger companies will be implementing similar policies, and I do also worry that any individual using Chinese models for personal use will be arrested and charged as a terrorist or something. Chinese open weight models like Qwen are fantastic, but it does feel like a lot of eggs in one basket.

eleitl@lemmy.zip · 8 hours

If you’re burning 20 kUSD/month on Claude, and way more if you’re using agentic AI, it better be worth it.

Final Remix@lemmy.world · 3 hours

Narrator: “it’s not.”

douglasg14b@lemmy.world · 16 hours

Lol did they create the image with AI as well?

A chart with a line going downwards to the left hand side as the chart rises to the right is completely wrong.

criss_cross@lemmy.world · 2 hours

Also the random percentages are just jarring.

Honestly I hate the “every article needs an image” rule that seems to exist. 99% of the time this slop detracts from the actual article.

gravitas_deficiency@sh.itjust.works · 12 hours

Maybe the imagegen model was feeling anti-fascistic? I must admit, that does seem to be a rarity.

paraphrand@lemmy.world · 19 hours

China isn’t allowed to use advanced expensive US models. And US nationals can’t afford advanced expensive US models so they want to use Chinese models.

What a weird situation. Huh?

Eager Eagle@lemmy.world · 6 hours

This administration can’t see a week ahead. Does anyone doubt that this will only consolidate China’s autonomy and eventual dominance in the sector?

Jax@sh.itjust.works · 5 hours

China owns Putin and Putin owns Trump, so this is all part of the plan.

dan@upvote.au · 11 hours

An interesting side effect is that the models coming out of China are very efficient. They don’t have access to all the high-end hardware the US has, so they have to make do with what they’ve got.

StillAlive@piefed.world · 5 hours

Just like Japanese motors and the oil crisis in the 70s.

Joe@discuss.tchncs.de · 13 hours

$14000 in API pricing is not $14000 in costs, though. Costs are hard to calculate because of the huge capital outlays and unknowns about hardware lifecycles, various business deals, and limited public knowledge.

It’s likely that inference costs for good-enough models will go down over time. China’s API pricing tells us the direction already. Energy costs will be a driving factor in the west, I guess.

So… they are almost certainly subsidizing plans right now, but on average, it won’t be by sooo much. Your average ChatGPT user will hardly use Codex, for example. Your average developer is not token-maxxing either.

Why are they subsidizing plans? To build a sticky customer base … which means they want you to stick to their tools - their coding agents/harnesses, their integrations, etc. Models are/will be increasingly interchangeable, so they are building sticky ecosystems instead.

eleitl@lemmy.zip · 8 hours

I’ve seen a datapoint that an 8-hour business day with Claude is about 1 kUSD, so a month of 20 business days is some 20 kUSD. More with agentic AI.

Joe@discuss.tchncs.de · 2 hours

I have no doubt some people can do that with a large project, a /goal loop, and (probably) poorly defined requirements.

My experience (using Claude models, but not Claude Code) is about $20-40/day worth of API costs in a collaborative mode, picking the right model for the task. Plan, implement, review & test features or bugs.

I get where I’m going faster, but not 10x faster nor 100x the cost. :-P

Arrandee@lemmy.world · 17 hours

So it begins.

tal@lemmy.today · 19 hours

The research firm purchased every subscription from the two AI providers and discovered that the approximate maximum possible spend (assuming API pricing) is far larger than what users pay every month. For example, Claude Max 20x costs $200 a month, but maximizing it would cost $8,000 a month in token spend, while ChatGPT Pro 20x, which is also $200 monthly, has a maximum possible spend of around $14,000.

Ehhh…yeah, but that alone isn’t necessarily an issue. There are plenty of services that exist that rely on consumers, in aggregate, not maximizing resource usage. Residential ISPs normally oversell their service. That works because the typical user only uses a tiny fraction of their sustained maximum rate of bandwidth consumption. In theory, if a lot of users started fully saturating their lines all the time, ISPs could shift everyone to metered service, but it works well enough and enough people value not having to worry about metering more than paying the minimum per-byte cost, so the system functions.
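
A rough sketch of that arithmetic, using only the two figures quoted above. The function names are made up for illustration, and API list pricing is treated as a stand-in for cost, which is an assumption rather than something the article establishes:

```python
# Figures as quoted from the article: monthly plan price and the estimated
# maximum possible spend at API pricing. API pricing is NOT the provider's
# actual cost; it is only used here as a yardstick (assumption).
PLANS = {
    "Claude Max 20x": {"price": 200, "max_api_spend": 8_000},
    "ChatGPT Pro 20x": {"price": 200, "max_api_spend": 14_000},
}


def oversell_multiple(price: float, max_api_spend: float) -> float:
    """How many times the plan price a fully maxed-out user would consume."""
    return max_api_spend / price


def break_even_utilization(price: float, max_api_spend: float) -> float:
    """Fraction of the maximum an average user can consume before the plan
    costs more (at API pricing) than it brings in."""
    return price / max_api_spend


for name, plan in PLANS.items():
    multiple = oversell_multiple(plan["price"], plan["max_api_spend"])
    fraction = break_even_utilization(plan["price"], plan["max_api_spend"])
    print(f"{name}: {multiple:.0f}x oversold, break-even at {fraction:.1%} average use")
```

So a maxed-out user would be 40x or 70x the plan price, and the plans only lose money (by this yardstick) if the average subscriber uses more than about 2.5% or 1.4% of the ceiling. That is the same bet an ISP makes.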

dan@upvote.au · 11 hours

Residential ISPs usually have a contention ratio somewhere around 30:1 to 50:1. That means that 30 to 50 customers that each have a 1Gbps connection all share 1Gbps of upstream bandwidth.

Business connections are closer to 10:1, and a leased line (dedicated circuit) is 1:1.
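
To put numbers on it, here is a minimal sketch of the worst case where every customer on a shared link saturates their line at once. The function name is hypothetical and the even split is a simplifying assumption; real networks shape traffic in more complicated ways:

```python
def worst_case_share_mbps(link_mbps: float, contention_ratio: float) -> float:
    """Bandwidth each customer gets if everyone sharing the upstream link
    saturates it simultaneously, assuming an even split (simplification)."""
    if contention_ratio < 1:
        raise ValueError("contention ratio must be at least 1 (1:1)")
    return link_mbps / contention_ratio


# Ratios mentioned above, each applied to a 1 Gbps (1000 Mbps) upstream link.
for label, ratio in [
    ("residential 30:1", 30),
    ("residential 50:1", 50),
    ("business 10:1", 10),
    ("leased line 1:1", 1),
]:
    print(f"{label}: {worst_case_share_mbps(1000, ratio):.1f} Mbps each")
```

At 30:1 that is roughly 33 Mbps each, at 50:1 it is 20 Mbps, and at 10:1 it is 100 Mbps. It works in practice because almost nobody saturates their line at the same moment.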

vrek@programming.dev · 17 hours

I may be wrong, but I thought airlines did something similar. They sell more tickets than there are seats, assuming some people will cancel. That’s why sometimes they offer cashback at the terminal for taking a different flight, but it still comes out net positive for them.

dan@upvote.au · 11 hours

I assume this is for basic economy only, where you can’t select a seat? If I choose a seat when booking, I can’t imagine the airline allows someone else to choose the same seat?

foo@feddit.uk · 9 hours

It might depend on the airline. I used to travel with Ryanair frequently, and special tickets (whatever they were called) were only available for 1/3 of the plane’s capacity on a first-come-first-served basis. Those upgrades let you choose your seat, skip the queue, and get guaranteed space for a carry-on bag. All of those things follow a similar pattern: if everyone did it the system would break, which is likely why they picked 1/3 as a cap. It’s actually quite clever, although I still dislike the ongoing enshittification of air travel that the budget airlines have caused, despite benefiting from it for a couple of years.

vrek@programming.dev · 10 hours

I don’t really know the details, sorry.

tal@lemmy.today · 16 hours

Whether they do that or not, I know that they have (or have had in the past) deals where they explicitly provide discounted tickets where you basically have “bottom priority” to get a seat on a flight, and you only get notified whether there’s space for you with a limited number of hours’ notice. IIRC it’s targeted at retirees, who have a flexible schedule and may favor inexpensive travel.

adarza@piefed.ca · 19 hours

so now they’re gonna be buying more hardware, too, so they can run their own models instead.

great.

Eager Eagle@lemmy.world · 6 hours

at this rate they’ll be buying Chinese hardware in a couple years

Oisteink@lemmy.world · 13 hours

Two different products. One is bundled with other users and you can only get so much within a timeframe without optimising.

The other is what you want when you want it.

I’d expect the costs for these products to differ

Ilovethebomb@sh.itjust.works · 18 hours

They don’t explain what you’d need to do to actually maximise one of these plans. Would you be hammering it with prompts 24/7 or something?

grumpy_cat@thelemmy.club · 10 hours

I use Cursor for work, and boy, I can easily burn $100-300 a day

setsubyou@lemmy.world · 14 hours

Nowadays agents like Claude Code can run autonomously for hours just given a goal description. It doesn’t take a lot of human effort at all to set up a bunch of sessions, and these companies don’t limit how many instances you run in parallel. Agents can also spawn sub-agents that run in parallel if a task calls for parallelization. Whether all this produces good results is a different story, especially if you don’t put enough effort into the goal description. But burning tokens as such is not difficult.

Even workflows where you’re just chatting with an agent can burn a lot of tokens. When you’re chatting with an LLM, the entire history becomes part of the input each time you send something. This also applies to tool calls, so if the agent decides to read 20 files before it can work on your request that’s 20 times a file gets added to the history and 20 times that entire growing history is then sent back as input to drive the agent’s next step.

Coding is more affected by this than many other applications because even a new conversation tends to start with the agent gathering a bunch of source code files, and then the response to a task is not just a bunch of text once, but a sequence of tool calls to make edits across files, build, run tests, react to test failures, and so on, all for one actual human prompt - but in reality a back-and-forth between the LLM and the harness with a quickly growing history.
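
A minimal sketch of why that adds up so fast. The token counts are invented for illustration, and it ignores any prompt caching a provider may offer, so treat it as the uncached worst case rather than what a real bill looks like:

```python
from typing import Iterable


def cumulative_input_tokens(initial_tokens: int, step_tokens: Iterable[int]) -> int:
    """Total input tokens sent over an agent run in which every step
    re-sends the whole history, then appends that step's result to it.

    initial_tokens: size of the starting prompt (hypothetical number).
    step_tokens: tokens each step adds to the history, e.g. a file read.
    """
    history = initial_tokens
    total_input = 0
    for added in step_tokens:
        total_input += history  # the entire history goes in as input
        history += added        # the tool result joins the history
    return total_input


# Hypothetical run: a 500-token request, then 20 file reads of 2,000 tokens each.
steps = [2_000] * 20
sent = cumulative_input_tokens(500, steps)
unique = 500 + sum(steps)
print(f"unique content: {unique} tokens, total input sent: {sent} tokens")
```

With these made-up numbers, 40,500 tokens of actual content turn into 390,000 input tokens, because the total grows roughly with the square of the number of steps.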

tal@lemmy.today · 16 hours

I assume that you’d have some sort of massive workload that you spread across multiple plans. You just have software to switch you from one plan to the next once you saturate a plan.

Probably not all that hard to write some kind of software that tries to make massive use of LLMs. Like, oh, I don’t know. Getting all abstract here, any problem in computer science where you have a problem that you don’t know how to solve directly, but you can easily check whether an answer is correct. Then you just keep trying to solve it, and repeatedly check whether the generated answer is correct or not.
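
Something like this, as a toy sketch. Here `propose` stands in for an LLM call and `is_correct` for the cheap checker; both names are hypothetical, and the example problem (guessing a factor of 91) is only there so the loop has something to run on:

```python
import random
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def generate_and_check(
    propose: Callable[[], T],
    is_correct: Callable[[T], bool],
    max_attempts: int,
) -> Tuple[Optional[T], int]:
    """Keep asking for candidates until one passes the checker.

    Returns (answer, attempts_used), or (None, max_attempts) on failure.
    Every attempt would be a paid model call in the real setup.
    """
    for attempt in range(1, max_attempts + 1):
        candidate = propose()
        if is_correct(candidate):
            return candidate, attempt
    return None, max_attempts


# Toy stand-in: hard to guess, trivial to verify.
N = 91
rng = random.Random(0)
factor, attempts = generate_and_check(
    propose=lambda: rng.randint(2, N - 1),   # pretend this is the LLM
    is_correct=lambda c: N % c == 0,         # cheap verification
    max_attempts=1_000,
)
print(f"found {factor} after {attempts} attempts")
```

The compute bill scales with the number of attempts, which is exactly how you would saturate a plan without much human effort.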

Another possibility is that you have a problem where you can quickly check the quality of a given solution (either via human assistance or software, even though you don’t know how to solve the problem yourself), and want to generate a number of solutions and pick the best.

I’ve certainly seen that with image-generating diffusion models, rather than LLMs — stuff like “batch-generate me N images using this prompt, and I’ll pick the best”. It’s an algorithmically-simple, brute-force way of improving quality, by just throwing more compute time at the problem. The human “quality evaluation” is cheap to do compared to the human time required to generate an image. Burns a lot of compute time, but the alternative to improve quality is improving the model, and if we don’t know how to do that yet…shrugs
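
The "generate N, keep the best" version is about as simple as it sounds. A minimal sketch, where `generate` stands in for the model and `score` for whatever does the quality evaluation (a human or software); both names and the toy scoring are assumptions for illustration:

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def best_of_n(generate: Callable[[], T], score: Callable[[T], float], n: int) -> T:
    """Generate n candidates and return the one with the highest score.

    Compute cost grows linearly with n; quality only improves as far as
    the scorer can tell candidates apart.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)


# Toy stand-in: "outputs" are random numbers, and better means closer to 0.5.
rng = random.Random(0)
best = best_of_n(
    generate=lambda: rng.random(),
    score=lambda x: -abs(x - 0.5),
    n=16,
)
print(f"best of 16: {best:.3f}")
```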

altkey (he\him)@lemmy.dbzer0.com · 15 hours

Not even that. A business can “implement” an AI agent on their website by forwarding clients’ inputs to someone else’s API, adding a prompt pointing back at them.