cd /news/large-language-models/deepseek-s-response-api-isn-t-openai… · home topics large-language-models article
[ARTICLE · art-41528] src=dev.to ↗ pub= topic=large-language-models verified=true sentiment=· neutral

DeepSeek's Response API Isn't OpenAI Responses. That One Parser Mistake Drops the Reasoning.

A developer warns that DeepSeek's response API is not compatible with OpenAI's Responses API, and that a common parsing mistake can silently drop the `reasoning_content` field. DeepSeek V4 models use `/chat/completions` with `reasoning_content` in the message object, and developers should explicitly parse this field to avoid losing thinking output. The developer also notes that old model names like `deepseek-chat` and `deepseek-reasoner` are deprecated in favor of `deepseek-v4-flash` and `deepseek-v4-pro`.

read5 min views1 publishedJun 27, 2026

I keep seeing developers use "DeepSeek response API" and "OpenAI Responses API" as if they mean the same thing.

They do not.

That small naming mistake can make your integration look like it works while quietly dropping the most important field in the response: reasoning_content

.

I spent time checking the DeepSeek V4 docs and the live TokenMix model catalog. The practical answer is simple:

DeepSeek is OpenAI-compatible at the Chat Completions layer. It is not documented as OpenAI /responses

compatible.

/responses

API. It is /chat/completions

.choices[0].message.reasoning_content

.message.content

, you may lose DeepSeek's thinking output.deepseek-v4-flash

and deepseek-v4-pro

; old deepseek-chat

and deepseek-reasoner

names are scheduled for deprecation.DeepSeek V4 moved the model naming story forward.

The old mental model was:

Old model name What people assumed
deepseek-chat
normal chat
deepseek-reasoner
reasoning model

The newer V4 model IDs are:

New model Best read
deepseek-v4-flash
cheaper/high-throughput V4
deepseek-v4-pro
stronger reasoning/coding V4

DeepSeek's docs say the older deepseek-chat

and deepseek-reasoner

names are compatibility aliases heading toward deprecation on 2026-07-24 15:59 UTC.

That means I would not build new production code around the old names.

If you are used to OpenAI Chat Completions, this will look familiar:

{
  "choices": [
    {
      "message": {
        "content": "final answer",
        "reasoning_content": "thinking output",
        "tool_calls": []
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "completion_tokens_details": {
      "reasoning_tokens": 300
    }
  }
}

The trap is that most basic wrappers only do this:

answer = response.choices[0].message.content

That gets the final answer.

It does not get the thinking output.

For some products, that is fine. For debugging, evals, agent traces, and tool workflows, it is not fine.

I would parse DeepSeek responses explicitly:

def parse_deepseek_response(response):
    choice = response.choices[0]
    message = choice.message

    return {
        "answer": getattr(message, "content", None),
        "reasoning": getattr(message, "reasoning_content", None),
        "tool_calls": getattr(message, "tool_calls", None),
        "finish_reason": choice.finish_reason,
        "usage": getattr(response, "usage", None),
    }

That is not fancy. It is the minimum safe parser.

The point is not to show chain of thought to users. The point is to avoid silently losing fields that affect debugging, evals, and tool-call continuation.

This is the part I would not ignore.

DeepSeek's thinking-mode docs distinguish normal multi-turn chat from tool-call workflows.

For ordinary multi-turn conversations, you do not need to pass prior chain-of-thought content back.

But when tool calls are involved, DeepSeek says the intermediate reasoning_content

after a tool call must be passed back in the following request.

That means a generic OpenAI wrapper can fail in a very boring way:

reasoning_content

.role

and content

.That is the kind of bug that does not always crash. It just makes the agent worse.

Here is how I would decide what to implement:

def deepseek_integration_plan(app):
    if app["uses_old_model_names"]:
        return "Migrate from deepseek-chat/deepseek-reasoner to deepseek-v4-flash or deepseek-v4-pro."

    if app["uses_tools"] and app["thinking_enabled"]:
        return "Preserve reasoning_content across tool-call turns. Do not use a content-only wrapper."

    if app["needs_json"]:
        return "Use response_format={\"type\":\"json_object\"} and still validate the result."

    if app["high_volume"]:
        return "Start with deepseek-v4-flash and track cache hit/miss tokens."

    if app["hard_reasoning"]:
        return "Benchmark deepseek-v4-pro with reasoning enabled."

    return "Use Chat Completions compatibility, but parse DeepSeek-specific fields explicitly."

I like this tree because it avoids the biggest false choice.

The question is not "Is DeepSeek OpenAI-compatible?"

The question is "Which compatibility layer are you depending on?"

TokenMix exposes DeepSeek through an OpenAI-compatible base URL:

https://api.tokenmix.ai/v1

The live catalog currently lists:

Model Reasoning JSON Tools Streaming Prompt cache
deepseek/deepseek-v4-flash
yes yes yes yes yes
deepseek/deepseek-v4-pro
yes yes yes yes yes

That is useful because you can route DeepSeek alongside OpenAI, Claude, Gemini, Qwen, GLM, and other models through one endpoint.

But the same caveat remains:

OpenAI-compatible routing gets the request through.

Correct parsing still belongs to you.

The cost story is also easy to misunderstand.

DeepSeek direct pricing separates cache-hit input, cache-miss input, and output tokens.

TokenMix publishes catalog rates for routing through its endpoint.

For example, using the live TokenMix catalog rates I checked:

Model Input / 1M Output / 1M
DeepSeek V4 Flash $0.132353 $0.264706
DeepSeek V4 Pro $0.419118 $0.838235

So a 10M input / 2M output workload is roughly:

Flash = 10 * 0.132353 + 2 * 0.264706 = $1.85
Pro   = 10 * 0.419118 + 2 * 0.838235 = $5.87

That makes Flash the obvious first route for high-volume tasks.

I would only pay for Pro where Flash fails on your actual evals.

If I were shipping DeepSeek V4 this week, I would:

content

, reasoning_content

, tool_calls

, finish_reason

, and usage

.reasoning_content

in thinking-mode tool workflows.That last point matters.

One endpoint does not remove the need for fallback.

It just makes fallback less painful.

If you want DeepSeek, OpenAI, Claude, Gemini, Qwen, GLM and other models behind one OpenAI-compatible endpoint, that is roughly what TokenMix does. Disclosure: I work on the research side. Full cited breakdown is on the original article.

DeepSeek response compatibility is real, but it is not the OpenAI Responses API.

Treat it as Chat Completions compatibility plus DeepSeek-specific fields. Parse reasoning_content

intentionally, migrate to V4 model IDs, and do not let a generic wrapper quietly erase the data you need for reasoning, tools, and evals.

Have you seen OpenAI-compatible wrappers drop provider-specific fields like reasoning_content

or cache usage? How did you handle it?

── more in #large-language-models 4 stories · sorted by recency
── more on @deepseek 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/deepseek-s-response-…] indexed:0 read:5min 2026-06-27 ·