DeepSeek V4 Pro

Input:$0.416/M
Output:$0.832/M
DeepSeek V4 Pro is a large-scale Mixture-of-Experts model from DeepSeek with 1.6T total parameters and 49B activated parameters, supporting a 1M-token context window. It is designed for advanced reasoning, coding, and long-horizon agent workflows, with strong performance across knowledge, math, and software engineering benchmarks.

Technical specifications

Item | DeepSeek-V4-Pro
Provider | DeepSeek
API model name | deepseek-v4-pro
Base URLs | https://api.deepseek.com and https://api.deepseek.com/anthropic
Input type | Text
Output type | Text, tool calls, reasoning output
Context length | 1,000,000 tokens
Max output | 384,000 tokens
Reasoning modes | Non-thinking, thinking (default)
Agent/coding defaults | reasoning_effort can be set to high; complex agent requests may use max
Supported features | JSON Output, Tool Calls, Chat Prefix Completion (beta), FIM Completion (beta in non-thinking mode)
Open-weights release | 1.6T total parameters, 49B activated parameters, FP4 + FP8 mixed precision
License (model card) | MIT
Reference model card | DeepSeek-V4-Pro preview on Hugging Face

What is DeepSeek-V4-Pro?

DeepSeek-V4-Pro is the stronger member of DeepSeek’s V4 preview family. The official model card describes it as a 1.6T-parameter MoE model with 49B activated parameters and a one-million-token context window, aimed at long-horizon knowledge work, code generation, and agent tasks. The API docs expose it through the standard DeepSeek chat-completions surface and support both OpenAI and Anthropic SDK styles.

Main features

  • Million-token context: DeepSeek documents a 1M-token context length, which makes the model suitable for very large document sets, repositories, and multi-step agent sessions.
  • Two reasoning modes: The API supports non-thinking and thinking modes; thinking is the default, and the docs note that complex agent requests such as Claude Code or OpenCode may automatically use max effort.
  • Tool-call capable: DeepSeek’s thinking mode supports tool calls, which is important for agents that need search, file operations, or external functions.
  • Long-context efficiency: The model card says V4 uses a hybrid attention design with Compressed Sparse Attention and Heavily Compressed Attention to reduce long-context compute and KV cache cost relative to V3.2.
  • Coding and reasoning focus: DeepSeek says the V4-Pro-Max reasoning mode advances coding benchmarks and closes much of the gap with leading closed-source models on reasoning and agentic tasks.
  • SDK flexibility: It can be accessed through standard OpenAI-compatible chat completions or via DeepSeek’s Anthropic-compatible endpoint for tool-oriented workflows.
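Because the model is exposed through an OpenAI-compatible chat-completions surface, tool calls arrive in the familiar function-calling shape. As a sketch of how an agent loop handles them, the snippet below defines a hypothetical search_docs tool and dispatches a simulated tool call of the kind that appears in response.choices[0].message.tool_calls; the tool name, its stub result, and the simulated call are all illustrative, not part of DeepSeek's API:

```python
import json

# Hypothetical tool definition in OpenAI function-calling form; the name
# "search_docs" and its behavior are illustrative only.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search project documentation for a query string.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def run_tool(tool_call: dict) -> str:
    """Dispatch one tool_call object from an assistant message."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    if name == "search_docs":
        # Stub; a real agent would query a search index here.
        return f"3 results for {args['query']!r}"
    raise ValueError(f"unknown tool: {name}")

# Simulated assistant tool call, shaped like an entry of
# response.choices[0].message.tool_calls in a chat-completions response.
simulated_call = {
    "id": "call_1",
    "type": "function",
    "function": {"name": "search_docs", "arguments": '{"query": "KV cache"}'},
}
print(run_tool(simulated_call))  # 3 results for 'KV cache'
```

In a real loop, TOOLS would be passed via the request's tools parameter, and the tool result would be appended as a role "tool" message before resending the conversation.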

Benchmark performance

The official DeepSeek model card reports the following evaluation results for the base model family and for the V4-Pro-Max comparison set. In the base-model table, V4-Pro scores higher than V3.2-Base on several knowledge and long-context benchmarks, including MMLU-Pro (73.5 vs. 65.5), FACTS Parametric (62.6 vs. 27.1), and LongBench-V2 (51.5 vs. 40.2).

Benchmark | V3.2-Base | V4-Flash-Base | V4-Pro-Base
MMLU-Pro (EM) | 65.5 | 68.3 | 73.5
FACTS Parametric (EM) | 27.1 | 33.9 | 62.6
HumanEval (Pass@1) | 62.8 | 69.5 | 76.8
LongBench-V2 (EM) | 40.2 | 44.7 | 51.5

The same model card also shows V4-Pro-Max remaining competitive with top frontier models on selected tasks. For example, it posts 87.5 on MMLU-Pro, 57.9 on SimpleQA-Verified, 90.1 on GPQA Diamond, and 67.9 on Terminal Bench 2.0 in the published comparison table.

DeepSeek-V4-Pro vs DeepSeek-V4-Flash vs DeepSeek-V3.2

Model | Best fit | Context | Notes
DeepSeek-V4-Pro | Heavy reasoning, coding, agents, large documents | 1M | Largest V4 model, 49B activated parameters, strongest overall capacity in the series.
DeepSeek-V4-Flash | Faster, lighter general use | 1M | Smaller 284B-total/13B-activated model; still supports thinking and tool calls.
DeepSeek-V3.2 | Previous-generation long-context baseline | 128K in earlier API docs; V4 uses a different 1M context design | Useful as a reference point for efficiency gains; V4-Pro's model card reports large reductions in long-context FLOPs and KV cache versus V3.2.

Best use cases

  • Repository-scale coding assistants and refactoring tools
  • Long-document analysis and synthesis
  • Tool-using agents that need multi-turn reasoning
  • Technical support workflows that benefit from long memory and structured outputs
  • Chinese and multilingual knowledge tasks where the model card shows strong benchmark performance

How to access and use the DeepSeek V4 Pro API

Step 1: Sign Up for API Key

Log in to cometapi.com, or register first if you do not yet have an account. In your CometAPI console, open the API token section of the personal center, click “Add Token”, and copy the generated key (sk-xxxxx).

Step 2: Send Requests to the DeepSeek V4 Pro API

Select the “deepseek-v4-pro” endpoint and set the request body. The request method and body format are documented in our website's API doc, which also provides an Apifox test page for convenience. Replace <YOUR_API_KEY> with your actual CometAPI key from your account. The model can be called in either the Anthropic Messages format or the Chat Completions format.

Put your question or request in the content field; this is what the model responds to.

Step 3: Retrieve and Verify Results

Parse the API response to get the generated answer; the response includes the task status and output data. Features such as streaming, prompt caching, and long-context handling can be enabled via standard parameters.
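As an illustration of that parsing step, the sketch below extracts the answer and finish reason from a chat-completions response body; the JSON here is a trimmed, illustrative example, not real API output:

```python
import json

# A trimmed chat-completions response in the shape the endpoint returns;
# all values here are illustrative.
raw = json.dumps({
    "choices": [{
        "message": {"role": "assistant", "content": "9.8 is greater than 9.11."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 31, "completion_tokens": 12, "total_tokens": 43},
})

def extract_answer(body: str) -> tuple[str, str]:
    """Return (content, finish_reason) from a chat-completions JSON body."""
    data = json.loads(body)
    choice = data["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]

content, reason = extract_answer(raw)
print(content)  # 9.8 is greater than 9.11.
print(reason)   # stop
```

With streaming enabled, the same fields arrive incrementally under choices[0].delta instead of choices[0].message.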

FAQ

Can DeepSeek-V4-Pro handle 1M-token documents in the API?

Yes. DeepSeek-V4-Pro offers a 1M-token context length and up to 384K output tokens, so it is built for very long documents and multi-file workflows.

Does DeepSeek-V4-Pro support thinking mode and tool calls?

Yes. DeepSeek-V4-Pro supports both thinking and non-thinking modes, plus JSON output and tool calls.

When should I use DeepSeek-V4-Pro instead of DeepSeek-V4-Flash?

Use DeepSeek-V4-Pro when accuracy and agentic coding matter more than speed. DeepSeek says V4-Flash is the faster, more economical option, while V4-Pro is stronger on coding and broader agent evaluations.

Is DeepSeek-V4-Pro good for coding agents like Claude Code or OpenCode?

Yes. DeepSeek-V4-Pro can be configured for Claude Code and OpenCode, with reasoning_effort set to max and thinking enabled.

How do I integrate DeepSeek-V4-Pro with OpenAI-compatible SDKs?

Point any OpenAI-compatible SDK at the CometAPI base URL https://api.cometapi.com/v1 and use the model name deepseek-v4-pro.
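As a minimal sketch, the payload below is what an OpenAI-compatible SDK ultimately sends to POST /v1/chat/completions; the base URL and model name follow this page, while the prompt and token limit are illustrative:

```python
BASE_URL = "https://api.cometapi.com/v1"  # CometAPI's OpenAI-compatible endpoint

def build_request(prompt: str, thinking: bool = True) -> dict:
    """Build a chat-completions payload for deepseek-v4-pro."""
    payload = {
        "model": "deepseek-v4-pro",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    if thinking:
        # Enable thinking mode and raise reasoning effort, mirroring the
        # parameters used in the code samples on this page.
        payload["thinking"] = {"type": "enabled"}
        payload["reasoning_effort"] = "high"
    return payload

req = build_request("Summarize this repository.")
print(req["model"])  # deepseek-v4-pro
```

Because only the base URL and model name differ from a stock OpenAI setup, existing chat-completions code needs no structural changes.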

Is DeepSeek-V4-Pro suitable for search-heavy research workflows?

Yes. V4-Pro performs strongly on search and retrieval-style tasks, and it outperforms DeepSeek-V3.2 by a substantial margin in both objective and subjective Q&A categories.

Features for DeepSeek V4 Pro

Explore the key features of DeepSeek V4 Pro, designed to enhance performance and usability. Discover how these capabilities can benefit your projects and improve user experience.

Pricing for DeepSeek V4 Pro

Explore competitive pricing for DeepSeek V4 Pro, designed to fit various budgets and usage needs. Our flexible plans ensure you only pay for what you use, making it easy to scale as your requirements grow. Discover how DeepSeek V4 Pro can enhance your projects while keeping costs manageable.
Price | Input (USD / M Tokens) | Output (USD / M Tokens)
CometAPI | $0.416 | $0.832
Official | $0.52 | $1.04
Discount vs. official: -20%

Sample code and API for DeepSeek V4 Pro

Access comprehensive sample code and API resources for DeepSeek V4 Pro to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of DeepSeek V4 Pro in your projects.
POST
/v1/chat/completions

Python Code Example

from openai import OpenAI
import os

# Get your CometAPI key from https://www.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

stream = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Which number is greater, 9.11 or 9.8? Answer with one sentence."},
    ],
    stream=True,
    max_tokens=256,
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)

thinking = False
for chunk in stream:
    delta = chunk.choices[0].delta
    reasoning = (delta.model_extra or {}).get("reasoning_content") or ""
    content = delta.content or ""

    if reasoning:
        if not thinking:
            print("<reasoning>")
            thinking = True
        print(reasoning, end="", flush=True)

    if content:
        if thinking:
            print("\n</reasoning>\n\n<answer>")
            thinking = False
        print(content, end="", flush=True)

print()

JavaScript Code Example

import OpenAI from "openai";

// Get your CometAPI key from https://www.cometapi.com/console/token, and paste it here
const api_key = process.env.COMETAPI_KEY || "<YOUR_COMETAPI_KEY>";
const base_url = "https://api.cometapi.com/v1";

const client = new OpenAI({
  apiKey: api_key,
  baseURL: base_url,
});

const stream = await client.chat.completions.create({
  model: "deepseek-v4-pro",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Which number is greater, 9.11 or 9.8? Answer with one sentence." },
  ],
  thinking: { type: "enabled" },
  reasoning_effort: "high",
  max_tokens: 256,
  stream: true,
});

let thinking = false;
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta ?? {};
  const reasoning = delta.reasoning_content ?? "";
  const content = delta.content ?? "";

  if (reasoning) {
    if (!thinking) {
      process.stdout.write("<reasoning>\n");
      thinking = true;
    }
    process.stdout.write(reasoning);
  }

  if (content) {
    if (thinking) {
      process.stdout.write("\n</reasoning>\n\n<answer>\n");
      thinking = false;
    }
    process.stdout.write(content);
  }
}

process.stdout.write("\n");

Curl Code Example

#!/usr/bin/env bash
# Get your CometAPI key from https://www.cometapi.com/console/token
# Export it as: export COMETAPI_KEY="your-key-here"

if ! command -v jq >/dev/null 2>&1; then
  echo "jq is required to parse streamed reasoning_content in this shell example." >&2
  exit 1
fi

thinking=false

curl --silent --no-buffer --location --request POST "https://api.cometapi.com/v1/chat/completions" \
  --header "Authorization: Bearer $COMETAPI_KEY" \
  --header "Content-Type: application/json" \
  --data-raw '{
    "model": "deepseek-v4-pro",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Which number is greater, 9.11 or 9.8? Answer with one sentence."}
    ],
    "thinking": {"type": "enabled"},
    "reasoning_effort": "high",
    "max_tokens": 256,
    "stream": true
  }' | while IFS= read -r line; do
    case "$line" in
      data:\ *) data=${line#data: } ;;
      *) continue ;;
    esac

    [ "$data" = "[DONE]" ] && break

    reasoning=$(printf '%s' "$data" | jq -r '.choices[0].delta.reasoning_content // empty')
    content=$(printf '%s' "$data" | jq -r '.choices[0].delta.content // empty')

    if [ -n "$reasoning" ]; then
      if [ "$thinking" = false ]; then
        printf '<reasoning>\n'
        thinking=true
      fi
      printf '%s' "$reasoning"
    fi

    if [ -n "$content" ]; then
      if [ "$thinking" = true ]; then
        printf '\n</reasoning>\n\n<answer>\n'
        thinking=false
      fi
      printf '%s' "$content"
    fi
  done

printf '\n'

Versions of DeepSeek V4 Pro

DeepSeek V4 Pro may ship multiple snapshots for several reasons: output can change after updates, so older snapshots preserve consistency; snapshots give developers a transition period for adaptation and migration; and different snapshots may correspond to global or regional endpoints to optimize user experience. For detailed differences between versions, please refer to the official documentation.

Version: deepseek-v4-pro

More Models

Claude Opus 4.7

Input:$3/M
Output:$15/M
Claude Opus 4.7 is a hybrid reasoning model designed specifically for frontier-level coding, AI agents, and complex multi-step professional work. Unlike lighter models (e.g., Sonnet or Haiku variants), Opus 4.7 prioritizes depth, consistency, and autonomy on the hardest tasks.
Claude Sonnet 4.6

Input:$2.4/M
Output:$12/M
Claude Sonnet 4.6 is our most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Sonnet 4.6 also features a 1M token context window in beta.
GPT 5.5 Pro

Input:$24/M
Output:$144/M
An advanced model engineered for extremely complex logic and professional demands, representing the highest standard of deep reasoning and precise analytical capabilities.
GPT 5.5

Input:$4/M
Output:$24/M
A next-generation multimodal flagship model balancing exceptional performance with efficient response, dedicated to providing comprehensive and stable general-purpose AI services.
GPT Image 2 ALL

Per Request:$0.04
GPT Image 2 is OpenAI's state-of-the-art image generation model for fast, high-quality image generation and editing. It supports flexible image sizes and high-fidelity image inputs.
GPT 5.5 ALL

Input:$4/M
Output:$24/M
GPT-5.5 excels in code writing, online research, data analysis, and cross-tool operations. The model not only improves its autonomy in handling complex multi-step tasks but also significantly boosts reasoning capabilities and execution efficiency while maintaining the same latency as its predecessor, marking an important step toward AI-driven office automation.

Related Blog

How to Run DeepSeek V4 Locally
Apr 30, 2026
deepseek-v4

How to Run DeepSeek V4 Locally

The practical way to run DeepSeek V4 locally is to use the official open-source weights with a high-performance serving stack such as vLLM, then expose the model through a local OpenAI-compatible endpoint. DeepSeek’s current public materials describe two models in the V4 family: DeepSeek-V4-Pro at 1.6T total parameters / 49B active, and DeepSeek-V4-Flash at 284B total parameters / 13B active, both with 1M-token context and three reasoning modes. vLLM’s current local deployment examples target 8× B200/B300 for Pro and 4× B200/B300 for Flash. If you do not have that kind of hardware, a hosted fallback like CometAPI is the more practical path.
How to Use Deepseek V4 API
Apr 24, 2026
deepseek-v4

How to Use Deepseek V4 API

For developers, that combination matters for one simple reason: it lowers migration friction while raising the ceiling on what you can build. You are not learning a brand-new API shape. You are updating the model name, keeping the base URL, and shipping against a larger context window with newer reasoning behavior. DeepSeek’s official docs explicitly say to keep the base URL and change the model parameter to deepseek-v4-pro or deepseek-v4-flash.
Deepseek v4 released: What is and How to Access
Apr 24, 2026
deepseek-v4

Deepseek v4 released: What is and How to Access

DeepSeek-V4 is DeepSeek’s new preview flagship model family, officially launched on April 24, 2026. It includes DeepSeek-V4-Pro and DeepSeek-V4-Flash, both of which support 1 million tokens of context, expose OpenAI-compatible and Anthropic-compatible APIs, and are available on DeepSeek’s app, mobile app, and CometAPI's API. In practical terms, Pro is the higher-capability choice for difficult reasoning and agentic coding, while Flash is the faster, more economical option for high-throughput workloads.
DeepSeek v4 is now available on the web: How to access and test it
Apr 9, 2026
deepseek-v4

DeepSeek v4 is now available on the web: How to access and test it

A DeepSeek V4 gray-scale test has leaked and is actively rolling out in limited form on the official web platform. Select users now see a redesigned interface with **Fast Mode** (default, high-speed daily use), **Expert Mode** (deep reasoning and complex problem-solving), and **Vision Mode** (multimodal image and video handling). This marks the most significant update since DeepSeek-V3.2, with rumored 1 million token context windows, updated knowledge bases, native multimodal capabilities, and a new underlying architecture optimized for speed, logic, and efficiency.
DeepSeek Update: what changed, what’s new, and why it matters
Feb 15, 2026
deepseek
deepseek-v4

DeepSeek Update: what changed, what’s new, and why it matters

In February 2026, Chinese AI startup DeepSeek rolled out a significant update to its online application and web interface, signaling momentum toward its next-generation model release, DeepSeek V4. While the update comes ahead of the full V4 model, it has already sparked conversation among users and industry watchers for its changes to interaction behavior, long-context capabilities, and preparatory testing for future potential.