Is it possible to overload an AI-as-a-Service with multiple requests?

A developer asks whether an AI-as-a-Service could be overloaded by many simultaneous requests that each generate a large amount of text, such as 10,000 tokens of Lorem Ipsum, potentially causing a service outage. The developer notes that many infrastructures lack any queuing for plain chat that involves no agent tasks or heavier features, raising concerns about cost and availability.

I was thinking about tests for a service that uses language models. There are several kinds, including prompt injection. A question came to mind: is it possible to make multiple requests asking for arbitrary text like Lorem Ipsum, generating many unnecessary tokens and running up costs? And if a test used multiple accounts to send the same request at the same time, each asking for 10,000 tokens of Lorem Ipsum, could that cause a service outage? I ask because most of the infrastructure I see doesn't use any queuing method when the chat involves no agent tasks or heavier functionality. I haven't actually generated anything; I just wanted to start a discussion on this topic.
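
To make the queuing point concrete, here is a minimal sketch of what a service could put in front of the model: a cap on output tokens per request plus a bounded queue that rejects new work once it is full. This is not how any particular provider works. `AdmissionController`, `GenerationRequest`, `MAX_OUTPUT_TOKENS`, and `QUEUE_CAPACITY` are hypothetical names and values chosen only for illustration, and the model call itself is left out.

```python
import queue
from dataclasses import dataclass

# Hypothetical limits, chosen only for illustration.
MAX_OUTPUT_TOKENS = 1_000  # per-request cap on generated tokens
QUEUE_CAPACITY = 3         # how many requests may wait at once


@dataclass
class GenerationRequest:
    account_id: str
    requested_tokens: int


class AdmissionController:
    """Caps output size and sheds load when the waiting queue is full."""

    def __init__(self, capacity=QUEUE_CAPACITY, max_tokens=MAX_OUTPUT_TOKENS):
        self._pending = queue.Queue(maxsize=capacity)
        self._max_tokens = max_tokens

    def submit(self, request):
        """Return the granted token budget, or None if the request is rejected."""
        granted = min(request.requested_tokens, self._max_tokens)
        try:
            # put_nowait raises queue.Full instead of blocking the caller.
            self._pending.put_nowait(GenerationRequest(request.account_id, granted))
        except queue.Full:
            return None
        return granted

    def next_request(self):
        """Hand the oldest waiting request to a worker, or None if idle."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None


if __name__ == "__main__":
    controller = AdmissionController()
    # Five accounts each ask for 10,000 tokens at the same moment.
    results = [
        controller.submit(GenerationRequest(f"acct-{i}", 10_000))
        for i in range(5)
    ]
    print(results)  # [1000, 1000, 1000, None, None]
```

With these assumed limits, the first three requests are admitted with their budget cut to 1,000 tokens and the other two are rejected immediately, so the cost and load of the burst are bounded no matter how many accounts join in. A real service would more likely return an HTTP 429 for the rejected requests and also track budgets per account.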