{"slug": "higher-rate-limits-on-the-claude-api", "title": "Higher rate limits on the Claude API", "summary": "Anthropic announced higher rate limits on the Claude API, including increased monthly spend caps across usage tiers—Start ($500), Build ($1,000), and Scale ($200,000)—and clarified that only uncached input tokens count toward input tokens per minute (ITPM) limits, effectively boosting throughput for cached prompts.", "body_md": "**Claude Platform on AWS:** The rate limits on this page apply. Billing and spend limits differ: spend limits are not available, and billing is through AWS Marketplace (not Anthropic credit purchases). Organizations on Claude Platform on AWS are placed on the Start tier and do not move between usage tiers automatically. To request higher limits, contact your Anthropic account representative. Per-workspace rate limit configuration and\n\nThere are two types of limits:\n\nThe API enforces service-configured limits at the organization level, but you may also set user-configurable limits for your organization's workspaces.\n\nThese limits apply to both Standard and Priority Tier usage. For more information about Priority Tier, see [Service Tiers](/docs/en/api/service-tiers).\n\nEach of the Start, Build, and Scale tiers carries a monthly spend cap, which is the maximum your organization can spend on the API each calendar month. Once you reach your tier's spend cap, API usage pauses until the next month unless you request a higher limit. 
You can view your organization's monthly spend cap on the [Limits](/settings/limits) page.\n\n| Usage tier | Monthly spend cap |\n|---|---|\n| Start | $500 |\n| Build | $1,000 |\n| Scale | $200,000 |\n\nOrganizations on the Custom tier have no monthly spend cap; limits are arranged with their account team.\n\nYou can also set your own spend limit below your tier's cap to control costs:\n\n1. **Navigate to the Limits page.** Go to [Settings > Limits](/settings/limits) in the Claude Console.\n2. **Open the spend limit editor.** In the **Spend limits** section, click **Change Limit** (or **Set spend limit** if no limit is currently set).\n3. **Adjust your spend limit.** Enter a new value. Your spend limit cannot exceed your current tier's cap.\n\nThe rate limits for the Messages API are measured in requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM) for each model class. If you exceed any of the rate limits, you will get a [429 error](/docs/en/api/errors) describing which rate limit was exceeded, along with a `retry-after` header indicating how long to wait.\n\nYou might also encounter 429 errors because of acceleration limits on the API if your organization has a sharp increase in usage. To avoid hitting acceleration limits, ramp up your traffic gradually and maintain consistent usage patterns.\n\nMany API providers use a combined \"tokens per minute\" (TPM) limit that may include all tokens, both cached and uncached, input and output. 
**For most Claude models, only uncached input tokens count towards your ITPM rate limits.** This is a key advantage that makes the rate limits effectively higher than they might initially appear.\n\nITPM rate limits are estimated at the beginning of each request, and the estimate is adjusted during the request to reflect the actual number of input tokens used.\n\nHere's what counts towards ITPM:\n\n- `input_tokens` (tokens after the last cache breakpoint) ✓\n- `cache_creation_input_tokens` (tokens being written to cache) ✓\n- `cache_read_input_tokens` (tokens read from cache) ✗\n\nThe `input_tokens` field only represents tokens that appear **after your last cache breakpoint**, not all input tokens in your request. To calculate total input tokens:\n\n```\ntotal_input_tokens = cache_read_input_tokens + cache_creation_input_tokens + input_tokens\n```\n\nThis means when you have cached content, `input_tokens` will typically be much smaller than your total input. For example, with a 200k-token cached document and a 50-token user question, you'd see `input_tokens: 50` even though the total input is 200,050 tokens.\n\nFor rate limit purposes on most models, only `input_tokens` + `cache_creation_input_tokens` count toward your ITPM limit, making [prompt caching](/docs/en/build-with-claude/prompt-caching) an effective way to increase your throughput.\n\n**Example**: With a 2,000,000 ITPM limit and an 80% cache hit rate, you could effectively process 10,000,000 total input tokens per minute (2M uncached + 8M cached), because cached tokens don't count towards your rate limit.\n\nClaude Haiku 3.5 (marked with † in the following rate limit tables) also counts `cache_read_input_tokens` toward ITPM rate limits.\n\nFor all models without the † marker, cached input tokens do not count towards rate limits and are billed at a reduced rate (10% of base input token price). 
This means you can achieve significantly higher effective throughput by using [prompt caching](/docs/en/build-with-claude/prompt-caching).\n\n**Maximize your rate limits with prompt caching**\n\nTo get the most out of your rate limits, use [prompt caching](/docs/en/build-with-claude/prompt-caching) for repeated content like:\n\nWith effective caching, you can dramatically increase your actual throughput without increasing your rate limits. Monitor your cache hit rate on the [Usage page](/usage) to optimize your caching strategy.\n\nOTPM rate limits are evaluated in real time as output tokens are produced, counting only the actual tokens generated. The `max_tokens` parameter does not factor into OTPM rate limit calculations, so there is no rate limit downside to setting a higher `max_tokens` value.\n\nRate limits are applied separately for each model; therefore, you can use different models up to their respective limits simultaneously. You can check your current rate limits and behavior in the [Claude Console](/settings/limits), or read the configured limits programmatically with the [Rate Limits API](/docs/en/manage-claude/rate-limits-api).\n\nRate limits are currently shared across all `inference_geo` values. Requests with `inference_geo: \"us\"` and `inference_geo: \"global\"` draw from the same rate limit pool.\n\n* - Opus rate limit is a total limit that applies to combined traffic across Claude Opus 4.8, Opus 4.7, Opus 4.6, and Opus 4.5.\n\n** - Sonnet 4.x rate limit is a total limit that applies to combined traffic across Sonnet 4.6 and Sonnet 4.5.\n\n† - Limit counts `cache_read_input_tokens` towards ITPM usage.\n\nThe Message Batches API has its own set of rate limits, which are shared across all models. These include a requests per minute (RPM) limit across all API endpoints and a limit on the number of batch requests that can be in the processing queue at the same time. A \"batch request\" here refers to part of a Message Batch. 
You may create a Message Batch containing thousands of batch requests, each of which counts towards this limit. A batch request is considered part of the processing queue when it has yet to be successfully processed by the model.\n\n[Claude Managed Agents](/docs/en/managed-agents/overview) endpoints are rate-limited per organization. These limits are separate from the Messages API rate limits above.\n\n| Operation | Limit |\n|---|---|\n| Create endpoints (for example, agents, sessions, and environments) | 300 requests per minute |\n| Read endpoints (for example, retrieve, list, and stream) | 1,200 requests per minute |\n\nWhen using [fast mode](/docs/en/build-with-claude/fast-mode) (research preview) with `speed: \"fast\"` on Claude Opus 4.8, Opus 4.7, or Opus 4.6, dedicated rate limits apply that are separate from standard Opus rate limits. When fast mode rate limits are exceeded, the API returns a `429` error with a `retry-after` header.\n\nThe response includes `anthropic-fast-*` headers that indicate your fast mode rate limit status. See [Fast mode](/docs/en/build-with-claude/fast-mode#rate-limits) for details on these headers.\n\nYou can monitor your rate limit usage on the [Usage](/usage) page of the [Claude Console](/).\n\nIn addition to providing token and request charts, the Usage page provides two separate rate limit charts. Use these charts to see how much headroom you have to grow, when you may be hitting peak use, which rate limits to request, and how you can improve your caching rates. The charts visualize a number of metrics for a given rate limit (for example, per model):\n\nTo request higher rate limits or a higher monthly spend cap, use **Request rate limit increase** on the [Limits](/settings/limits) page.\n\nSupport can also raise limits. 
For urgent needs, contact [support](https://support.anthropic.com).\n\nFor more about workspaces, see [Workspaces](/docs/en/manage-claude/workspaces).\n\nTo protect Workspaces in your Organization from potential overuse, you can set custom spend and rate limits per Workspace.\n\nExample: If your Organization's limit is 40,000 input tokens per minute and 8,000 output tokens per minute, you might limit one Workspace to 30,000 input tokens per minute. This protects other Workspaces from potential overuse and ensures a more equitable distribution of resources across your Organization. The remaining unused tokens per minute (or more, if that Workspace doesn't use the limit) are then available for other Workspaces to use.\n\nNote:\n\nTo read your current organization and workspace rate limits programmatically, use the [Rate Limits API](/docs/en/manage-claude/rate-limits-api).\n\nThe API response includes headers that show you the rate limit enforced, current usage, and when the limit will be reset.\n\nThe following headers are returned:\n\n| Header | Description |\n|---|---|\n| `retry-after` | The number of seconds to wait until you can retry the request. Earlier retries will fail. |\n| `anthropic-ratelimit-requests-limit` | The maximum number of requests allowed within any rate limit period. |\n| `anthropic-ratelimit-requests-remaining` | The number of requests remaining before being rate limited. |\n| `anthropic-ratelimit-requests-reset` | The time when the request rate limit will be fully replenished, provided in RFC 3339 format. |\n| `anthropic-ratelimit-tokens-limit` | The maximum number of tokens allowed within any rate limit period. |\n| `anthropic-ratelimit-tokens-remaining` | The number of tokens remaining (rounded to the nearest thousand) before being rate limited. |\n| `anthropic-ratelimit-tokens-reset` | The time when the token rate limit will be fully replenished, provided in RFC 3339 format. 
|\n| `anthropic-ratelimit-input-tokens-limit` | The maximum number of input tokens allowed within any rate limit period. |\n| `anthropic-ratelimit-input-tokens-remaining` | The number of input tokens remaining (rounded to the nearest thousand) before being rate limited. |\n| `anthropic-ratelimit-input-tokens-reset` | The time when the input token rate limit will be fully replenished, provided in RFC 3339 format. |\n| `anthropic-ratelimit-output-tokens-limit` | The maximum number of output tokens allowed within any rate limit period. |\n| `anthropic-ratelimit-output-tokens-remaining` | The number of output tokens remaining (rounded to the nearest thousand) before being rate limited. |\n| `anthropic-ratelimit-output-tokens-reset` | The time when the output token rate limit will be fully replenished, provided in RFC 3339 format. |\n| `anthropic-priority-input-tokens-limit` | The maximum number of Priority Tier input tokens allowed within any rate limit period. (Priority Tier only) |\n| `anthropic-priority-input-tokens-remaining` | The number of Priority Tier input tokens remaining (rounded to the nearest thousand) before being rate limited. (Priority Tier only) |\n| `anthropic-priority-input-tokens-reset` | The time when the Priority Tier input token rate limit will be fully replenished, provided in RFC 3339 format. (Priority Tier only) |\n| `anthropic-priority-output-tokens-limit` | The maximum number of Priority Tier output tokens allowed within any rate limit period. (Priority Tier only) |\n| `anthropic-priority-output-tokens-remaining` | The number of Priority Tier output tokens remaining (rounded to the nearest thousand) before being rate limited. (Priority Tier only) |\n| `anthropic-priority-output-tokens-reset` | The time when the Priority Tier output token rate limit will be fully replenished, provided in RFC 3339 format. (Priority Tier only) |\n\nThe `anthropic-ratelimit-tokens-*` headers display the values for the most restrictive limit currently in effect. 
For instance, if you have exceeded the Workspace per-minute token limit, the headers will contain the Workspace per-minute token rate limit values. If Workspace limits do not apply, the headers will return the total tokens remaining, where total is the sum of input and output tokens. This approach ensures that you have visibility into the most relevant constraint on your current API usage.", "url": "https://wpnews.pro/news/higher-rate-limits-on-the-claude-api", "canonical_source": "https://platform.claude.com/docs/en/api/rate-limits", "published_at": "2026-06-27 08:15:18+00:00", "updated_at": "2026-06-27 08:35:53.702251+00:00", "lang": "en", "topics": ["ai-products", "ai-tools", "large-language-models"], "entities": ["Anthropic", "Claude", "AWS", "Claude API", "Messages API"], "alternates": {"html": "https://wpnews.pro/news/higher-rate-limits-on-the-claude-api", "markdown": "https://wpnews.pro/news/higher-rate-limits-on-the-claude-api.md", "text": "https://wpnews.pro/news/higher-rate-limits-on-the-claude-api.txt", "jsonld": "https://wpnews.pro/news/higher-rate-limits-on-the-claude-api.jsonld"}}