AI API pricing guide

AI API pricing looks simple from the outside, but the total bill depends on several interacting variables. This guide explains the pieces that matter most so teams can compare providers without relying on marketing pages alone.

Input tokens

Input tokens cover the text you send into the model. That includes user prompts, system instructions, retrieval context, tool traces, and any additional formatting overhead created by your application.

Output tokens

Output tokens cover the model's response. In many real products, output becomes the main cost driver: output tokens are typically priced higher than input tokens, and answers tend to grow longer as teams tune for quality over time.
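The two sections above can be sketched as a simple per-request cost function. The prices below are hypothetical placeholders for illustration, not any provider's actual rate card.

```python
# Hypothetical per-million-token prices (assumptions, not real rates).
PRICE_IN_PER_M = 3.00    # USD per 1M input tokens
PRICE_OUT_PER_M = 15.00  # USD per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the assumed prices."""
    return (input_tokens * PRICE_IN_PER_M
            + output_tokens * PRICE_OUT_PER_M) / 1_000_000

# A 2,000-token prompt with an 800-token answer costs $0.018 here.
print(round(request_cost(2_000, 800), 4))
```

Note how the output side dominates even though the answer is far shorter than the prompt, which is why letting answers drift longer quietly raises the bill.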

Cached prompts

Some platforms offer lower pricing for repeated prompt components such as large system prompts or reused context. Caching can materially reduce spend when your application structure is stable.
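As a rough sketch of the savings, suppose cached input tokens are billed at a fraction of the normal input price. The discount multiplier and prices here are assumptions; check your provider's actual caching terms.

```python
PRICE_IN_PER_M = 3.00     # USD per 1M input tokens (assumed)
CACHED_MULTIPLIER = 0.10  # cached tokens billed at 10% of full price (assumed)

def input_cost(cached_tokens: int, fresh_tokens: int) -> float:
    """Input cost when part of the prompt is served from a prompt cache."""
    billed = cached_tokens * CACHED_MULTIPLIER + fresh_tokens
    return billed * PRICE_IN_PER_M / 1_000_000

# A 5,000-token reused system prompt plus 500 fresh tokens per request:
with_cache = input_cost(5_000, 500)    # 1,000 billed-equivalent tokens
without_cache = input_cost(0, 5_500)   # 5,500 billed tokens
```

Under these assumptions the cached request costs less than a fifth of the uncached one, which is why caching pays off most when the large, stable part of the prompt dwarfs the per-request portion.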

Context windows and hidden prompt growth

As products mature, prompts tend to grow. Teams add retrieval context, safety rules, routing hints, memory, and tool instructions. Even if rate-card pricing stays flat, larger prompts can push budgets up quickly.
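A quick projection makes the effect concrete. The request volume and price below are illustrative assumptions; only the arithmetic carries over.

```python
def monthly_input_cost(prompt_tokens: int, requests: int,
                       price_per_m: float = 3.00) -> float:
    """Monthly input spend for a fixed prompt size (price is a placeholder)."""
    return prompt_tokens * requests * price_per_m / 1_000_000

# Same product, same rate card, same 500k requests/month --
# only the prompt grew from 1,500 to 6,000 tokens:
launch = monthly_input_cost(1_500, 500_000)  # $2,250
mature = monthly_input_cost(6_000, 500_000)  # $9,000
```

A 4x prompt means a 4x input bill with no pricing change at all, which is why prompt size deserves a line in the budget alongside the rate card.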

Rate limits

Rate limits affect cost planning indirectly. Tight limits can force batching, queuing, or fallback models, which changes both user experience and the final economics of the system.
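One common pattern is spilling overflow traffic to a cheaper fallback model once the primary model's limit is hit. A minimal sketch of the blended economics, with all numbers assumed:

```python
def fallback_split(demand_per_min: float, limit_per_min: float):
    """Requests served by the primary model vs. spilled to a fallback."""
    primary = min(demand_per_min, limit_per_min)
    overflow = max(0.0, demand_per_min - limit_per_min)
    return primary, overflow

def blended_cost(demand: float, limit: float,
                 primary_cost: float, fallback_cost: float) -> float:
    """Average per-request cost once overflow routes to the fallback."""
    primary, overflow = fallback_split(demand, limit)
    return (primary * primary_cost + overflow * fallback_cost) / demand

# 900 req/min of demand against a 600 req/min limit, with a cheaper fallback:
avg = blended_cost(900, 600, primary_cost=0.02, fallback_cost=0.005)
```

The blended average can land well below the primary model's sticker price, but the fallback may also change answer quality, so the comparison belongs in cost per successful task, not cost per request.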

Practical budgeting checklist

  • Model input and output separately.
  • Create best-case, expected-case, and high-growth estimates.
  • Measure the cost of retries, moderation, and observability.
  • Compare cost per successful task, not only cost per request.
  • Revisit assumptions after every major product change.
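The "cost per successful task" item in the checklist above can be written down directly. The figures in the example are assumptions; the point is that retries and failures inflate the real unit cost.

```python
def cost_per_successful_task(cost_per_request: float,
                             avg_attempts: float,
                             success_rate: float) -> float:
    """Spend per task that actually succeeds, counting retries and failed tasks."""
    return cost_per_request * avg_attempts / success_rate

# $0.02 per request, 1.5 attempts per task on average, 90% of tasks succeed:
unit_cost = cost_per_successful_task(0.02, 1.5, 0.9)
```

Here the true unit cost is roughly $0.033, two thirds higher than the $0.02 per-request figure a rate-card comparison would show.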

Use this guide with the rest of the site

This page works best as a reference hub. Readers can move from here into the calculator, the comparison hub, or the glossary depending on how much detail they need.