Technical specifications
| Item | DeepSeek-V4-Pro |
|---|---|
| Provider | DeepSeek |
| API model name | deepseek-v4-pro |
| Base URLs | https://api.deepseek.com and https://api.deepseek.com/anthropic |
| Input type | Text |
| Output type | Text, tool calls, reasoning output |
| Context length | 1,000,000 tokens |
| Max output | 384,000 tokens |
| Reasoning modes | Non-thinking, thinking (default) |
| Agent/coding defaults | reasoning_effort can be set to high; complex agent requests may automatically use max |
| Supported features | JSON Output, Tool Calls, Chat Prefix Completion (beta), FIM Completion (beta in non-thinking mode) |
| Local/open-weights release | 1.6T total parameters, 49B activated parameters, FP4 + FP8 mixed precision |
| License (model card) | MIT |
| Reference model card | DeepSeek-V4-Pro preview on Hugging Face |
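As a brief illustration of the JSON Output feature listed above, here is a minimal sketch using the OpenAI-compatible response_format parameter with the base URL and model name from the table. The prompt text and expected fields are placeholders; JSON-mode APIs of this style generally require the prompt itself to mention JSON.

```python
from openai import OpenAI

client = OpenAI(api_key="<YOUR_DEEPSEEK_API_KEY>", base_url="https://api.deepseek.com")

# JSON Output mode: response_format asks for a well-formed JSON object. The
# prompt should explicitly mention JSON, per the usual JSON-mode convention.
response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{
        "role": "user",
        "content": "List three use cases for a 1M-token context window as a JSON "
                   'object with a single key "use_cases" holding an array of strings.',
    }],
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)  # a JSON string you can json.loads()
```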
What is DeepSeek-V4-Pro?
DeepSeek-V4-Pro is the stronger member of DeepSeek’s V4 preview family. The official model card describes it as a 1.6T-parameter MoE model with 49B activated parameters and a one-million-token context window, aimed at long-horizon knowledge work, code generation, and agent tasks. The API docs expose it through the standard DeepSeek chat-completions surface and support both OpenAI and Anthropic SDK styles.
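Since the docs mention both SDK styles, here is a hedged sketch of the Anthropic-style path using the anthropic Python SDK pointed at the Anthropic-compatible base URL from the table above; the prompt is a placeholder.

```python
import anthropic

# Anthropic-compatible base URL from the specification table above.
client = anthropic.Anthropic(
    api_key="<YOUR_DEEPSEEK_API_KEY>",
    base_url="https://api.deepseek.com/anthropic",
)

message = client.messages.create(
    model="deepseek-v4-pro",
    max_tokens=1024,  # required by the Messages API
    messages=[{"role": "user", "content": "Outline a refactoring plan for a legacy module."}],
)
print(message.content[0].text)
```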
Main features
- Million-token context: DeepSeek documents a 1M-token context length, which makes the model suitable for very large document sets, repositories, and multi-step agent sessions.
- Two reasoning modes: The API supports non-thinking and thinking modes; thinking is the default, and the docs note that complex agent requests such as Claude Code or OpenCode may automatically use max effort.
- Tool-call capable: DeepSeek’s thinking mode supports tool calls, which is important for agents that need search, file operations, or external functions (see the sketch after this list).
- Long-context efficiency: The model card says V4 uses a hybrid attention design with Compressed Sparse Attention and Heavily Compressed Attention to reduce long-context compute and KV cache cost relative to V3.2.
- Coding and reasoning focus: DeepSeek says the V4-Pro-Max reasoning mode advances coding benchmarks and closes much of the gap with leading closed-source models on reasoning and agentic tasks.
- SDK flexibility: It can be accessed through standard OpenAI-compatible chat completions or via DeepSeek’s Anthropic-compatible endpoint for tool-oriented workflows.
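To make the tool-call bullet concrete, here is a minimal sketch in the standard OpenAI tool-call schema. The search_docs function is a hypothetical example name, not a DeepSeek built-in, and the client setup assumes the base URL from the specification table.

```python
from openai import OpenAI

client = OpenAI(
    api_key="<YOUR_DEEPSEEK_API_KEY>",
    base_url="https://api.deepseek.com",
)

# Hypothetical tool definition: search_docs is an invented example, not a built-in.
tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search internal documentation for a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Find the deployment guide."}],
    tools=tools,
)

# If the model decided to call the tool, the call arrives as structured data
# instead of plain text; an agent loop would execute it and return the result.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```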
Benchmark performance
The official DeepSeek model card reports the following evaluation results for the base model family and for the V4-Pro-Max comparison set. In the base-model table, V4-Pro scores higher than V3.2-Base on several knowledge and long-context benchmarks, including MMLU-Pro (73.5 vs. 65.5), FACTS Parametric (62.6 vs. 27.1), and LongBench-V2 (51.5 vs. 40.2).
| Benchmark | V3.2-Base | V4-Flash-Base | V4-Pro-Base |
|---|---|---|---|
| MMLU-Pro (EM) | 65.5 | 68.3 | 73.5 |
| FACTS Parametric (EM) | 27.1 | 33.9 | 62.6 |
| HumanEval (Pass@1) | 62.8 | 69.5 | 76.8 |
| LongBench-V2 (EM) | 40.2 | 44.7 | 51.5 |
The same model card also shows V4-Pro-Max remaining competitive with top frontier models on selected tasks. For example, it posts 87.5 on MMLU-Pro, 57.9 on SimpleQA-Verified, 90.1 on GPQA Diamond, and 67.9 on Terminal Bench 2.0 in the published comparison table.
DeepSeek-V4-Pro vs DeepSeek-V4-Flash vs DeepSeek-V3.2
| Model | Best fit | Context | Notes |
|---|---|---|---|
| DeepSeek-V4-Pro | Heavy reasoning, coding, agents, large documents | 1M | Largest V4 model, 49B activated parameters, strongest overall capacity in the series. |
| DeepSeek-V4-Flash | Faster, lighter general use | 1M | Smaller 284B/13B model, still supports thinking and tool calls. |
| DeepSeek-V3.2 | Previous-generation long-context baseline | 128K (per earlier API docs) | Useful as a reference point for efficiency gains; V4-Pro’s model card reports large reductions in long-context FLOPs and KV cache versus V3.2. |
Best use cases
- Repository-scale coding assistants and refactoring tools
- Long-document analysis and synthesis
- Tool-using agents that need multi-turn reasoning
- Technical support workflows that benefit from long memory and structured outputs
- Chinese and multilingual knowledge tasks where the model card shows strong benchmark performance
How to access and use the DeepSeek-V4-Pro API
Step 1: Sign Up for an API Key
Log in to cometapi.com; if you do not have an account yet, register first. Then sign in to your CometAPI console to get an access credential for the interface. In the personal center, click “Add Token” under API tokens to generate a key in the form sk-xxxxx and submit.
Step 2: Send Requests to the DeepSeek-V4-Pro API
Select the “deepseek-v4-pro” endpoint, then build and send the request body. The request method and body schema are documented in our website’s API doc, which also provides an Apifox test environment for your convenience. Replace <YOUR_API_KEY> with your actual CometAPI key from your account. You can call the model in either the Anthropic Messages format or the Chat Completions format.
Insert your question or request into the content field; this is what the model will respond to.
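As a worked example of Step 2, the sketch below sends a chat-format request with Python’s requests library. The endpoint path is an assumption based on a typical OpenAI-compatible layout; confirm the exact URL and method in the API doc mentioned above.

```python
import requests

# Assumed OpenAI-compatible endpoint; verify the exact path in the CometAPI doc.
url = "https://api.cometapi.com/v1/chat/completions"
headers = {
    "Authorization": "Bearer <YOUR_API_KEY>",  # your CometAPI key (sk-xxxxx)
    "Content-Type": "application/json",
}
payload = {
    "model": "deepseek-v4-pro",
    "messages": [
        # The content field carries your question or request.
        {"role": "user", "content": "Explain the difference between thinking and non-thinking mode."}
    ],
}

resp = requests.post(url, headers=headers, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```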
Step 3: Retrieve and Verify Results
Process the API response to extract the generated answer; after processing, the API responds with the task status and output data. Enable features such as streaming, prompt caching, or long-context handling via standard request parameters, as in the streaming sketch below.
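For instance, streaming can be enabled with the standard stream parameter. This sketch assumes the same OpenAI-compatible endpoint as in Step 2 and prints tokens as they arrive.

```python
from openai import OpenAI

# Base URL is an assumption; use the endpoint from your CometAPI console.
client = OpenAI(api_key="<YOUR_API_KEY>", base_url="https://api.cometapi.com/v1")

stream = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Draft release notes for the latest update."}],
    stream=True,  # tokens arrive incrementally instead of in one final payload
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```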