Long-tail SEO page

AI cloud pricing glossary

Glossary pages support search visibility, resolve buyer confusion, and create dense internal-linking opportunities across the site.

Input tokens

The tokens sent into the model, including prompts, system instructions, and context.

Output tokens

The tokens generated by the model in response to a request.

Cached tokens

Reusable prompt segments that some providers price differently from fresh input tokens, typically at a discount.
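Because input, cached, and output tokens usually carry separate per-token rates, a request's cost is a weighted sum across the three types. A minimal sketch of that arithmetic, using illustrative per-million-token prices (the rates and the 1M-token unit are assumptions here, not any provider's actual price list):

```python
# Illustrative per-million-token rates (hypothetical, not real pricing).
PRICE_PER_M = {"input": 3.00, "cached": 0.30, "output": 15.00}

def request_cost(input_tokens, cached_tokens, output_tokens):
    """Dollar cost of one request, pricing each token type separately."""
    # Cached tokens are a subset of the input, billed at the cheaper rate.
    fresh_input = input_tokens - cached_tokens
    return (
        fresh_input * PRICE_PER_M["input"]
        + cached_tokens * PRICE_PER_M["cached"]
        + output_tokens * PRICE_PER_M["output"]
    ) / 1_000_000

# Example: 10k-token prompt with 8k cached, 500 output tokens.
print(round(request_cost(10_000, 8_000, 500), 4))
```

Note how output tokens dominate the bill at these (hypothetical) rates even though they are a small fraction of total tokens.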

Context window

The maximum number of tokens, input plus output combined, a model can handle in a single request.

Rate limit

A provider-imposed cap on requests, tokens, or throughput over a given time period.
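When a cap is hit, the standard client-side response is to retry with exponential backoff and jitter rather than hammering the API. A sketch under assumed conventions (the `RuntimeError("rate_limited")` error shape stands in for whatever exception a real SDK raises):

```python
import random
import time

def call_with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry a rate-limited call with exponential backoff and jitter.

    `call` is any zero-argument function; the "rate_limited" error message
    is a hypothetical stand-in for a real provider's 429-style exception.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError as exc:
            if "rate_limited" not in str(exc) or attempt == max_retries - 1:
                raise
            # Sleep 1s, 2s, 4s, ... plus jitter to avoid synchronized retries.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

Jitter matters in fleets: without it, many clients that were throttled together retry together and trip the limit again.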

Latency

The time it takes for a model response to be generated and returned to the application.

Throughput

The volume of requests or tokens a system can process per unit of time, often the binding constraint for high-volume apps.

Fallback model

A secondary model used when the primary model fails, is rate-limited, or is too expensive for a given task.
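The fallback pattern above is usually implemented as an ordered chain: try each model in priority order and return the first success. A minimal sketch, where the model names and the callable-per-model interface are hypothetical stand-ins for a real provider SDK:

```python
def generate(prompt, models, clients):
    """Try models in priority order, falling back on any failure.

    `models` is an ordered list of model names; `clients` maps each name to
    a callable taking the prompt. Both are illustrative, not a real API.
    """
    last_error = None
    for name in models:
        try:
            return name, clients[name](prompt)
        except Exception as exc:  # rate limit, timeout, provider outage, etc.
            last_error = exc
    raise RuntimeError("all models in the fallback chain failed") from last_error
```

Returning the model name alongside the response lets the caller log which tier actually served the request, which is useful when reconciling cost reports.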