The Information reports that OpenAI has cut inference costs by more than half on some existing models, while logged-out ChatGPT traffic ran on only a couple hundred Nvidia GPUs.
The obvious guesses include quantization, KV-cache changes, batching, speculative decoding, and routing easy queries to cheaper models.
If true, this is a core competitive lever: lower costs can raise margins, expand usage limits, or reduce pressure on API pricing.
For some context, OpenAI’s adjusted gross margin fell to 33% in 2025 from 40% in 2024, after inference costs quadrupled.
Some reporting now puts Q1-2026 at 39%, with a 52% target by year-end.
Anthropic looks similar at roughly 44%, so frontier labs remain far below mature software economics.
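
To get a feel for how much a cost cut could move those margins, here is a rough back-of-the-envelope sketch. It assumes revenue stays fixed and that the cut applies to all inference spend, which is more generous than the report (it only mentions some existing models). The `inference_share` values are hypothetical, since the share of cost of revenue that is inference is not stated in the reporting.

```python
def margin_after_cut(gross_margin, inference_share, cost_cut):
    """Gross margin after cutting inference cost, holding revenue fixed.

    gross_margin:    current gross margin as a fraction of revenue (0.33 = 33%)
    inference_share: fraction of cost of revenue that is inference (hypothetical)
    cost_cut:        fraction by which inference cost falls (0.5 = halved)
    """
    cost_of_revenue = 1.0 - gross_margin              # as a fraction of revenue
    inference_cost = cost_of_revenue * inference_share
    new_cost = cost_of_revenue - inference_cost * cost_cut
    return 1.0 - new_cost


if __name__ == "__main__":
    # Start from the reported 33% margin and halve inference cost
    # under a few assumed inference shares.
    for share in (0.5, 0.75, 1.0):
        new_margin = margin_after_cut(0.33, share, 0.5)
        print(f"inference share {share:.0%}: margin {new_margin:.1%}")
```

Under these assumptions, halving inference cost from a 33% starting margin lands somewhere between roughly 50% and 66.5%, depending on how much of cost of revenue is actually inference. Treat that as an upper bound, not a forecast.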
---
theinformation.com/newsletters/ai-agenda/openai-discovers-new-way-cut-inference-costs-half