💾
Claude's caching is dramatically better
Claude offers up to 90% off cached input tokens vs OpenAI's 50%. For RAG apps with long system prompts, this can flip which provider is actually cheaper.
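A quick sketch of how the cache discount can flip the comparison. The base rates below are placeholders for illustration, not current list prices; check each provider's pricing page before relying on them.

```python
# Sketch: blended input cost per 1M tokens when most of a long RAG
# prompt is a cache hit. Base rates here are assumptions, not quotes.
def effective_input_cost(base_per_m, cache_discount, cached_fraction):
    """Blend cached and uncached token rates for one request."""
    cached_rate = base_per_m * (1 - cache_discount)
    return cached_fraction * cached_rate + (1 - cached_fraction) * base_per_m

# Suppose 80% of the prompt (system prompt + retrieved docs) is cached:
claude = effective_input_cost(3.00, 0.90, 0.8)  # assumed $3.00/M base, 90% off cached
openai = effective_input_cost(2.50, 0.50, 0.8)  # assumed $2.50/M base, 50% off cached
print(f"Claude: ${claude:.2f}/M, OpenAI: ${openai:.2f}/M")
# Claude: $0.84/M, OpenAI: $1.50/M -- the deeper discount wins despite
# a higher base rate.
```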
⚡
Stack batch + caching for maximum savings
Both discounts apply simultaneously. With Claude Sonnet 4.6, the effective cached + batch input rate drops to ~$0.15/M, on par with GPT-4o mini's base input rate.
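The stacking arithmetic, assuming a $3.00/M base input rate, a 90% cache-read discount, and a 50% batch discount (verify each figure against the provider's pricing docs):

```python
# Sketch: cache-read and batch discounts multiply, not add.
base = 3.00                    # $/M input tokens (assumed)
cached = base * (1 - 0.90)     # cache-read rate: $0.30/M
stacked = cached * (1 - 0.50)  # batch discount on top: $0.15/M
print(f"effective input rate: ${stacked:.2f}/M")  # -> $0.15/M
```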
📈
Output tokens dominate cost at scale
Output is 5–10× more expensive than input. Keep responses concise with clear instructions. Adding "be brief" to your system prompt can cut bills significantly.
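To see why output dominates, compare a call with equal input and output token counts under an assumed 5× price ratio ($3/M in, $15/M out, illustrative):

```python
# Sketch: output's share of the bill at an assumed 5x out/in price ratio.
in_rate, out_rate = 3.00, 15.00      # $/M tokens (assumed rates)
prompt_tokens, reply_tokens = 500, 500
cost = prompt_tokens / 1e6 * in_rate + reply_tokens / 1e6 * out_rate
output_share = (reply_tokens / 1e6 * out_rate) / cost
print(f"output share: {output_share:.0%}")  # -> 83%
```

Even with a reply no longer than the prompt, output is over 80% of the cost, so trimming replies pays off faster than trimming prompts.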
⚠️
Long conversations compound fast
Each message in a thread re-sends the full conversation history as input, so total input grows quadratically with turn count. A 100-turn conversation can cost ~50× more than 100 independent calls.
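The ~50× figure falls out of the quadratic growth. A minimal sketch, assuming each turn adds roughly the same number of new tokens:

```python
# Sketch: re-sent history makes total input quadratic in turn count.
def total_input_tokens(turns, tokens_per_turn=1):
    total, history = 0, 0
    for _ in range(turns):
        history += tokens_per_turn  # new message appended to the thread
        total += history            # full history re-sent as input
    return total

threaded = total_input_tokens(100)  # 1 + 2 + ... + 100 = 5050 units
independent = 100 * 1               # 100 one-off calls: 100 units
print(threaded / independent)       # -> 50.5, the ~50x in the tip
```

Summarizing or truncating older turns breaks the quadratic curve back toward linear.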