{"slug": "a-flat-per-call-endpoint-for-summarize-classify-extract-in-your-n8n-and-make", "title": "A flat per-call endpoint for summarize / classify / extract in your n8n and Make automations", "summary": "A developer introduced Modelis, an OpenAI-compatible gateway that charges a flat per-call price for bounded-output tasks like summarization, classification, and extraction, making it suitable for high-volume automations in n8n and Make. The service caps output at ~1024 tokens and auto-routes requests to a fitting model, ensuring predictable costs. The developer also released an open-source adapter for local use.", "body_md": "If you run automations that summarize, classify, or pull fields out of text at volume, the LLM step is where per-token pricing turns budgeting into a guessing game: one batch of long inputs and the bill spikes. For these **bounded-output jobs**, a flat price per call fits better than a per-token frontier model. Here is how I wire it into n8n / Make, and when not to.\n\nAutomation runs are repetitive and high-volume, and the outputs are short by nature: a summary, a label, a few extracted fields. I route them through [Modelis](https://rapidapi.com/chenxiao5580/api/modelis-auto-chat), an OpenAI-compatible gateway that auto-routes each request to a fitting model and charges a **flat price per call** with output capped at ~1024 tokens. Because the output is bounded, each run costs the same and your monthly total stays predictable no matter the input size.\n\nIt is a standard OpenAI-compatible `POST /v1/chat/completions`. 
Use an HTTP Request node with method `POST`, URL `https://modelis-auto-chat.p.rapidapi.com/v1/chat/completions`, and headers `x-rapidapi-host: modelis-auto-chat.p.rapidapi.com`, `x-rapidapi-key: YOUR_KEY`, `content-type: application/json`. The JSON body:\n\n```\n{\"model\":\"modelis-auto\",\"messages\":[{\"role\":\"user\",\"content\":\"Label sentiment (positive/negative/neutral): {{ $json.text }}\"}]}\n```\n\nThe curl equivalent:\n\n```\ncurl --request POST \\\n  --url https://modelis-auto-chat.p.rapidapi.com/v1/chat/completions \\\n  --header 'content-type: application/json' \\\n  --header 'x-rapidapi-host: modelis-auto-chat.p.rapidapi.com' \\\n  --header 'x-rapidapi-key: YOUR_KEY' \\\n  --data '{\"model\":\"modelis-auto\",\"messages\":[{\"role\":\"user\",\"content\":\"Summarize in 2 sentences: ...\"}]}'\n```\n\nIf you would rather use a built-in OpenAI node that expects an `Authorization: Bearer` key and a custom base URL, run the tiny open-source adapter next to your workflow runner:\n\n```\nnpx modelis-openai      # local proxy on 127.0.0.1:8787, MIT, ~120 lines\n```\n\nThen point the node at `http://127.0.0.1:8787/v1` with model `modelis-auto`.\n\nPrompts that fit this pattern:\n\n`Summarize in 2 sentences: ...`\n\n`Label sentiment (positive/negative/neutral): ...`\n\n`Return JSON with {name, email, company} from: ...`\n\nAll produce short outputs, so the flat per-call price keeps high-volume runs cheap to reason about.\n\nLong-form generation (articles, whole files, large code) will hit the ~1024-token cap and get truncated. Keep a high-output model for those. Use this for the short, structured outputs that automations actually need.\n\nI built the adapter. I am most curious which extraction and classification tasks the routing handles well versus badly. 
If you point an automation at it, I would love to hear how it routed.", "url": "https://wpnews.pro/news/a-flat-per-call-endpoint-for-summarize-classify-extract-in-your-n8n-and-make", "canonical_source": "https://dev.to/chenxiao5580cmd/a-flat-per-call-endpoint-for-summarize-classify-extract-in-your-n8n-and-make-automations-548h", "published_at": "2026-06-28 13:00:34+00:00", "updated_at": "2026-06-28 13:33:44.375148+00:00", "lang": "en", "topics": ["developer-tools", "artificial-intelligence", "large-language-models", "ai-products", "ai-infrastructure"], "entities": ["Modelis", "RapidAPI", "n8n", "Make", "OpenAI"], "alternates": {"html": "https://wpnews.pro/news/a-flat-per-call-endpoint-for-summarize-classify-extract-in-your-n8n-and-make", "markdown": "https://wpnews.pro/news/a-flat-per-call-endpoint-for-summarize-classify-extract-in-your-n8n-and-make.md", "text": "https://wpnews.pro/news/a-flat-per-call-endpoint-for-summarize-classify-extract-in-your-n8n-and-make.txt", "jsonld": "https://wpnews.pro/news/a-flat-per-call-endpoint-for-summarize-classify-extract-in-your-n8n-and-make.jsonld"}}