Input tokens
The tokens sent into the model, including prompts, system instructions, and context.
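Exact token counts depend on the model's tokenizer, but a common rule of thumb for English text is roughly four characters per token. A minimal sketch of that estimate (the 4-characters-per-token ratio is an approximation, not a guarantee):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; real counts require the model's own tokenizer."""
    return max(1, round(len(text) / chars_per_token))

print(estimate_tokens("You are a helpful assistant."))  # 28 characters -> 7
```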
Output tokens
The tokens generated by the model in response to a request.
Cached tokens
Reusable prompt segments that some providers may price differently from fresh input tokens.
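Where a provider does discount cached input, the request cost splits into fresh input, cached input, and output components. A hypothetical calculation; the prices and the 50% cache discount below are illustrative placeholders, not any provider's actual rates:

```python
def request_cost(fresh_in: int, cached_in: int, out: int,
                 in_price: float, out_price: float,
                 cache_discount: float = 0.5) -> float:
    """Cost in dollars; prices are per million tokens. All rates are illustrative."""
    fresh = fresh_in * in_price / 1e6
    cached = cached_in * in_price * cache_discount / 1e6
    output = out * out_price / 1e6
    return fresh + cached + output

# 2k fresh + 8k cached input, 500 output, at $3/M in and $15/M out (placeholders)
print(round(request_cost(2000, 8000, 500, 3.0, 15.0), 6))  # 0.0255
```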
Context window
The maximum number of tokens, input plus output, that a model can handle in a single request.
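Before sending a request, an application can check that the prompt plus the output budget it reserves will fit. A minimal sketch, assuming token counts are already known:

```python
def fits_context(prompt_tokens: int, max_output_tokens: int,
                 context_window: int) -> bool:
    """Input and reserved output must both fit inside the window."""
    return prompt_tokens + max_output_tokens <= context_window

print(fits_context(120_000, 4_096, 128_000))  # True
print(fits_context(126_000, 4_096, 128_000))  # False
```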
Rate limit
A provider-imposed cap on requests, tokens, or throughput over a given time period.
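The usual response to hitting a rate limit is to retry with exponential backoff and jitter. A sketch of that pattern; `RuntimeError` here is a stand-in for whatever rate-limit exception a given provider's client raises:

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a rate-limited call, doubling the delay each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:  # stand-in for a provider's rate-limit error
            if attempt == max_retries - 1:
                raise
            delay = base_delay * 2 ** attempt + random.uniform(0, 0.1)
            time.sleep(delay)
```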
Latency
The time it takes for a model response to be generated and returned to the application.
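Latency is straightforward to measure around a single call. A minimal sketch; the `client.generate` in the usage comment is a hypothetical API, not a real one:

```python
import time

def timed(call):
    """Return (result, latency_seconds) for one call."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start

# resp, latency = timed(lambda: client.generate(prompt))  # hypothetical client
```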
Throughput
The amount of model work a system can process over time, often measured in tokens or requests per second; most relevant for high-volume applications.
Fallback model
A secondary model used when the primary model fails, is rate-limited, or is too expensive for a given task.
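A fallback chain can be expressed as an ordered list of models tried in turn. A minimal sketch under the assumption that each model is wrapped as a callable; the model names and error handling are illustrative:

```python
def generate_with_fallback(prompt: str, models) -> str:
    """Try each (name, call) pair in order; raise only if every model fails."""
    errors = []
    for name, call in models:
        try:
            return call(prompt)
        except Exception as exc:  # e.g. rate limit, timeout, server error
            errors.append((name, exc))
    raise RuntimeError(f"all models failed: {errors}")
```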