{"slug": "deepseek-v4-pro-now-available-on-together-ai", "title": "DeepSeek-V4 Pro now available on Together AI", "summary": "Together AI has launched DeepSeek-V4 Pro, a 1.6T-parameter Mixture-of-Experts model with a 512K-token context window, priced at $2.10 per million input tokens and $4.40 per million output tokens. The model supports three reasoning modes (Non-Think, Think High, and Think Max), enabling teams to match computational effort to task complexity for long-context workloads like code analysis and document review.", "body_md": "DeepSeek V4 Pro is now available on Together AI with a 512K-token context window for long-context reasoning workloads.\n\n**DeepSeek V4 Pro on Together AI:**\n\n- **Large-scale MoE architecture:** DeepSeek V4 Pro uses a 1.6T-parameter Mixture-of-Experts architecture with 49B activated parameters.\n- **Controllable reasoning modes:** Non-Think, Think High, and Think Max let teams choose between fast responses, deeper reasoning, and maximum reasoning effort.\n- **Transparent serverless pricing:** DeepSeek V4 Pro is available at **\\$2.10 per 1M input tokens, \\$0.20 per 1M cached input tokens, and \\$4.40 per 1M output tokens.**\n\nLong-context reasoning changes what teams can ask a model to do. Entire repositories, large document sets, long agent traces, and tool outputs can fit into the model’s working context instead of being compressed into brittle summaries. But the models that can use that much context are also the hardest to serve: a 1.6T-parameter MoE with million-token context is not something most teams want to deploy, tune, and operate themselves.\n\nDeepSeek-V4 Pro is now available on Together AI, the AI Native Cloud, so teams can start with Serverless Inference at 512K context and move to dedicated infrastructure for full 1M context, reserved capacity, and production control. 
DeepSeek-V4 Flash is coming soon, giving teams another V4 option for workloads where speed and cost matter more than maximum reasoning depth.\n\n**Built for long-context reasoning**\n\nDeepSeek V4 Pro is built for workloads where the model needs to reason over more than a short prompt: large repositories, long technical documents, dense retrieval bundles, tool-call histories, and research corpora.\n\nDeepSeek V4 Pro supports million-token context at the model level; on Together AI, it is currently available with a 512K-token context window. That distinction matters because model capability and deployed serving profile are not always the same thing. Together AI is launching DeepSeek V4 Pro with a context window designed for reliable production serving, while still giving teams enough room for serious long-context workloads.\n\nThe architecture also matters because long context is not only a product spec. As context grows, serving cost, memory pressure, KV cache usage, latency, and concurrency all become part of the system design. DeepSeek V4 Pro uses hybrid attention, combining Compressed Sparse Attention and Heavily Compressed Attention; DeepSeek reports that at million-token context it needs 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek V3.2.\n\n**Choose reasoning effort by workload**\n\nDeepSeek V4 Pro supports three reasoning modes, so teams can match reasoning depth to task difficulty instead of treating every request the same.\n\nA document assistant might use Non-Think for simple extraction, Think High for conflict analysis across policies, and Think Max only when the model needs to reason through a difficult decision. 
A code agent might use Think High for planning a migration and Think Max for debugging a subtle cross-service failure.\n\nDeepSeek reports benchmark results across coding, reasoning, long-context, and agentic tasks, including 93.5% on LiveCodeBench, 90.1% on GPQA Diamond, 80.6% on SWE-bench Verified, 83.5% on MRCR 1M, and 62.0% on CorpusQA 1M.\n\n**Make repeated long-context queries cheaper with cached input pricing**\n\nLong-context systems often reuse the same large context across multiple questions: a repository snapshot, a document bundle, a policy archive, a retrieval payload, or a long agent trace. Cached input pricing makes those repeated workloads more practical.\n\nDeepSeek V4 Pro is priced at \\$2.10 / 1M input tokens, with cached input at \\$0.20 / 1M tokens and output at \\$4.40 / 1M tokens. That is roughly a **90% cost reduction** for reused context, which matters when the expensive part of the request is a stable block of text that gets reused across follow-up analysis.\n\n**Example pattern:**\n\n- Load a large stable context, such as a 300K-token repo summary, contract set, or policy archive.\n- Ask several follow-up questions over that same context.\n- Use cached input pricing where applicable to reduce the cost of repeated analysis.\n\n**Workload patterns**\n\n**Code agents**\n\nUse DeepSeek V4 Pro when an agent needs to reason across repository slices, issue traces, internal documentation, prior tool calls, and proposed patches. Think High or Think Max is most useful for planning changes, debugging failures, or resolving cross-file dependencies.\n\n**Document intelligence**\n\nUse long context for contracts, policy sets, technical manuals, or research collections that need to be compared in one request. 
Non-Think can handle extraction and simple Q&A; Think High is better for conflict analysis, interpretation, and synthesis.\n\n**Long-context agent traces**\n\nUse DeepSeek V4 Pro to inspect long tool-call histories, intermediate results, and execution traces. Higher reasoning modes are most useful at decision points: when the agent needs to decide whether to continue, call another tool, revise a plan, or stop.\n\n**Research synthesis**\n\nUse DeepSeek V4 Pro for workflows that combine papers, notes, benchmark reports, retrieved documents, and prior analysis. Cached input pricing is especially useful when the same evidence set is reused across multiple questions.\n\n**Start serverless, move to reserved capacity**\n\nDeepSeek V4 Pro is available on Together AI Serverless Inference and Monthly Reserved infrastructure. Serverless is the right starting point for evaluation, development, and variable traffic. Monthly Reserved is better for steadier production demand where teams need more predictable capacity and cost control.\n\nFor long-context workloads, the deployment path matters. Teams are not only choosing a model; they are choosing how to manage throughput, concurrency, latency, KV cache pressure, and cost as context sizes grow. Together AI gives teams a path from evaluation to production without standing up the serving stack themselves.\n\n## Try it now\n\nDeepSeek-V4 Pro is available today on Together AI Serverless Inference and Dedicated Endpoints.\n\nStart with Serverless Inference for development and evaluation. 
For production workloads that require full 1M context, reserved capacity, workload isolation, or more predictable throughput, contact sales to deploy DeepSeek-V4 Pro on Together AI Dedicated Inference.\n\n## Get started\n\n→ Follow our [DeepSeek-V4 quickstart](https://docs.together.ai/docs/deepseek-v4-quickstart) to get up and running in minutes\n\n→ View the [DeepSeek-V4 Pro Model Page](https://www.together.ai/models/deepseek-v4-pro)\n\n→ Try [DeepSeek-V4 Pro in the Playground](https://api.together.ai/playground/v2/chat/deepseek-ai/DeepSeek-V4-Pro)\n\n→ [Contact Sales](https://www.together.ai/contact-sales) for Dedicated Inference deployment and volume pricing", "url": "https://wpnews.pro/news/deepseek-v4-pro-now-available-on-together-ai", "canonical_source": "https://www.together.ai/blog/deepseek-v4-pro-now-available-on-together-ai", "published_at": "2026-04-29 00:00:00+00:00", "updated_at": "2026-05-25 00:22:05.976512+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-products", "ai-infrastructure", "ai-tools"], "entities": ["DeepSeek", "DeepSeek-V4 Pro", "Together AI", "DeepSeek-V4 Flash"], "alternates": {"html": "https://wpnews.pro/news/deepseek-v4-pro-now-available-on-together-ai", "markdown": "https://wpnews.pro/news/deepseek-v4-pro-now-available-on-together-ai.md", "text": "https://wpnews.pro/news/deepseek-v4-pro-now-available-on-together-ai.txt", "jsonld": "https://wpnews.pro/news/deepseek-v4-pro-now-available-on-together-ai.jsonld"}}