Skip to content

[Bug/Question] Excessive Input Token Usage When Invoking Skills - Rate Limit Errors #324

@alfredcs

Description

@alfredcs

Affected Notebook/File

03_skills_custom_development.ipynb

Bug Description

When invoking skills/tools through the Claude agent, the input token consumption appears to be excessively high, causing frequent rate limit errors even for relatively simple operations.

RateLimitError: Error code: 429 - {
  'type': 'error', 
  'error': {
    'type': 'rate_limit_error', 
    'message': 'This request would exceed the rate limit for your organization of 50,000 input tokens per minute. Please reduce the prompt length or the maximum tokens requested, or try again later.'
  }, 
  'request_id': 'req_011CVwgfm56xSnTkeud2yzZX'
}

I'd like to understand and discuss potential strategies to reduce input token consumption:

  1. System Prompt Optimization
  • Is there a recommended way to minimize system prompt size when using agent skills?
  • Can skill definitions be loaded dynamically rather than included in every request?
  1. Conversation History Management
  • What's the recommended approach for truncating or summarizing conversation history?
    Is there a sliding window implementation available?
  1. Tool/Skill Definition Efficiency
  • Are there best practices for defining tools with minimal token overhead?
  • Can tool schemas be compressed or cached?
  1. Caching Mechanisms
  • Does the API support prompt caching to reduce repeated token charges?
  • Are there plans to implement context caching for agent interactions?
  1. Token Budgeting
  • Is there a way to set a maximum input token budget per request?
  • Can we get token count estimates before sending requests?

Steps to Reproduce

Error appear on : Test Brand Guidelines with Document Creation
Let's test the brand skill by creating a branded PowerPoint presentation:

Error Message

RateLimitError: Error code: 429 - {
  'type': 'error', 
  'error': {
    'type': 'rate_limit_error', 
    'message': 'This request would exceed the rate limit for your organization of 50,000 input tokens per minute. Please reduce the prompt length or the maximum tokens requested, or try again later.'
  }, 
  'request_id': 'req_011CVwgfm56xSnTkeud2yzZX'
}

Environment

No response

Would you be willing to submit a PR to fix this?

None

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions