DeepSeek V4 Flash

Input:$0.12/M
Output:$0.24/M
DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model from DeepSeek with 284B total parameters and 13B activated parameters, supporting a 1M-token context window. It is designed for fast inference and high-throughput workloads, while maintaining strong reasoning and coding performance.

Technical specifications of DeepSeek-V4-Flash

Model: DeepSeek-V4-Flash
Provider: DeepSeek
Family: DeepSeek-V4 preview series
Architecture: Mixture-of-Experts (MoE)
Total parameters: 284B
Activated parameters: 13B
Context length: 1,000,000 tokens
Precision: FP4 + FP8 mixed
Reasoning modes: Non-think, Think, Think Max
Release status: Preview model
License: MIT

What is DeepSeek-V4-Flash?

DeepSeek-V4-Flash is DeepSeek’s efficiency-focused preview model in the V4 series. It is built as a Mixture-of-Experts language model with a relatively small active footprint for its size, which helps it stay responsive while still supporting a very large 1M-token context window.

Main features of DeepSeek-V4-Flash

  • Million-token context: The model supports a 1,000,000-token context window, which makes it suitable for very long documents, large codebases, and multi-step agent sessions.
  • Efficiency-first MoE design: It uses 284B total parameters but only 13B activated parameters per request, a setup aimed at faster and more efficient inference.
  • Three reasoning modes: Non-think, Think, and Think Max let you trade speed for deeper reasoning when the task gets harder (a minimal sketch follows this list).
  • Strong long-context architecture: DeepSeek says the V4 series combines Compressed Sparse Attention and Heavily Compressed Attention to improve long-context efficiency.
  • Competitive coding and agent behavior: The model card reports strong results on coding and agentic benchmarks, including HumanEval, SWE Verified, Terminal Bench 2.0, and BrowseComp.
  • Open weights and local deployment: The release includes model weights, local inference guidance, and an MIT License, which makes self-hosting and experimentation practical.
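
To make the mode tradeoff concrete, here is a minimal sketch of switching between the three reasoning modes through an OpenAI-compatible client. The mapping of Non-think/Think/Think Max onto the thinking and reasoning_effort fields mirrors the sample code later on this page and should be treated as an assumption to verify against the API docs.

from openai import OpenAI
import os

client = OpenAI(base_url="https://api.cometapi.com/v1", api_key=os.environ["COMETAPI_KEY"])

def ask(prompt: str, mode: str = "think") -> str:
    # Assumed mapping of the three reasoning modes onto request fields;
    # "thinking" and "reasoning_effort" follow the samples later on this page.
    extra = {
        "non-think": {"thinking": {"type": "disabled"}},
        "think": {"thinking": {"type": "enabled"}},
        "think-max": {"thinking": {"type": "enabled"}, "reasoning_effort": "high"},
    }[mode]
    completion = client.chat.completions.create(
        model="deepseek-v4-flash",
        messages=[{"role": "user", "content": prompt}],
        extra_body=extra,
    )
    return completion.choices[0].message.content

print(ask("Plan a migration from DeepSeek-V3.2 in three steps.", mode="think-max"))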

Benchmark performance of DeepSeek-V4-Flash

Selected results from the official model card show that DeepSeek-V4-Flash improves over DeepSeek-V3.2-Base on several core benchmarks:

Benchmark | DeepSeek-V3.2-Base | DeepSeek-V4-Flash-Base | DeepSeek-V4-Pro-Base
AGIEval (EM) | 80.1 | 82.6 | 83.1
MMLU (EM) | 87.8 | 88.7 | 90.1
MMLU-Pro (EM) | 65.5 | 68.3 | 73.5
HumanEval (Pass@1) | 62.8 | 69.5 | 76.8
LongBench-V2 (EM) | 40.2 | 44.7 | 51.5

In the reasoning-and-agent table, the Flash variant also posts solid results on terminal and software tasks, with Flash Max reaching 56.9 on Terminal Bench 2.0 and 79.0 on SWE Verified, while still trailing the larger Pro model on the hardest knowledge-heavy and agentic tasks.

DeepSeek-V4-Flash vs DeepSeek-V4-Pro vs DeepSeek-V3.2

Model | Best fit | Tradeoff
DeepSeek-V4-Flash | Fast, long-context work, coding assistants, and high-throughput agent flows | Slightly behind Pro on pure knowledge and the most complex agentic tasks
DeepSeek-V4-Pro | Highest-capability tasks, deeper reasoning, and harder agent workflows | Heavier and less efficiency-oriented than Flash
DeepSeek-V3.2 | Older baseline for comparison and migration planning | Lower benchmark performance than V4-Flash on the official tables
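
As an illustration of the table above, a dispatcher might default to Flash and escalate only the hardest tasks to Pro. The heuristic below is invented for illustration; the model names match those used elsewhere on this page.

def pick_model(needs_deep_reasoning: bool, high_throughput: bool) -> str:
    # Invented routing heuristic: Flash is the fast, long-context default;
    # Pro is reserved for the hardest knowledge-heavy or agentic work.
    if needs_deep_reasoning and not high_throughput:
        return "deepseek-v4-pro"
    return "deepseek-v4-flash"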

Typical use cases for DeepSeek-V4-Flash

  1. Long-document analysis for contracts, research packs, support knowledge bases, and internal wikis.
  2. Coding assistants that need to inspect big repos, follow instructions across many files, and keep context alive.
  3. Agent workflows where the model needs to reason, call tools, and iterate without losing the thread.
  4. Enterprise chat systems that benefit from a very large context window and low-friction deployment.
  5. Prototype local deployments for teams that want to evaluate DeepSeek-V4 behavior before production hardening.

How to access and use the DeepSeek V4 Flash API

Step 1: Sign Up for an API Key

Log in to cometapi.com, or register first if you do not yet have an account. In your CometAPI console, open the API token page in the personal center, click “Add Token”, and copy the generated key (it looks like sk-xxxxx). This key authenticates all your requests.

Step 2: Send Requests to the DeepSeek V4 Flash API

Select the deepseek-v4-flash model, then build the request using the method and body documented in our API docs; an Apifox test workspace is also available for convenience. Replace <YOUR_API_KEY> with your actual CometAPI key from your account. The model can be called in two formats: the Chat Completions format and the Anthropic Messages format.

Put your question or request in the content field; this is the text the model responds to.
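
For example, a minimal request using only the requests library might look like the following; it assumes the standard OpenAI-compatible /v1/chat/completions route shown in the samples below.

import os
import requests

resp = requests.post(
    "https://api.cometapi.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['COMETAPI_KEY']}"},
    json={
        "model": "deepseek-v4-flash",
        "messages": [
            {"role": "user", "content": "Summarize this contract in five bullet points."},
        ],
    },
    timeout=600,  # long-context requests can take a while
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])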

Step 3: Retrieve and Verify Results

Parse the API response to extract the generated answer; the response also includes the task status and output data. Features such as streaming, prompt caching, and long-context handling are enabled through standard request parameters.
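
As a sketch of streaming under these assumptions (OpenAI-compatible server-sent events, as in the samples below), partial tokens can be printed as they arrive:

from openai import OpenAI
import os

client = OpenAI(base_url="https://api.cometapi.com/v1", api_key=os.environ["COMETAPI_KEY"])

stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Explain MoE routing in two sentences."}],
    stream=True,  # emit incremental chunks instead of one final message
)
for chunk in stream:
    # Each chunk carries a delta; guard against empty keep-alive chunks.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()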

FAQ

Can DeepSeek-V4-Flash API handle 1M-token prompts?

Yes. DeepSeek-V4-Flash ships with a 1M-token context length, so it is built for very long prompts, documents, and codebases.
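
A minimal long-document sketch under these assumptions follows; contract.txt is a placeholder file, and in production you should count tokens before sending to stay inside the 1M window.

from openai import OpenAI
import os

client = OpenAI(base_url="https://api.cometapi.com/v1", api_key=os.environ["COMETAPI_KEY"])

# Placeholder long document; verify its token count stays under the 1M window.
with open("contract.txt", encoding="utf-8") as f:
    document = f.read()

completion = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "Answer strictly from the provided document."},
        {"role": "user", "content": document + "\n\nQuestion: What is the termination clause?"},
    ],
)
print(completion.choices[0].message.content)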

Does DeepSeek-V4-Flash API support thinking mode and non-thinking mode?

Yes. DeepSeek-V4-Flash supports both non-thinking and thinking modes, with thinking enabled by default.

Does DeepSeek-V4-Flash API support JSON output and tool calls?

Yes. DeepSeek lists both JSON Output and Tool Calls as supported features for DeepSeek-V4-Flash.
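
Here is a hedged sketch of both features through the OpenAI-compatible surface; the get_weather tool is hypothetical, and the response_format/tools parameters are assumed to apply to deepseek-v4-flash as they do for other chat models.

from openai import OpenAI
import os

client = OpenAI(base_url="https://api.cometapi.com/v1", api_key=os.environ["COMETAPI_KEY"])

# JSON output: constrain the response to a JSON object.
json_reply = client.chat.completions.create(
    model="deepseek-v4-flash",
    response_format={"type": "json_object"},
    messages=[{"role": "user", "content": 'Return {"city": ..., "country": ...} as JSON for Paris.'}],
)
print(json_reply.choices[0].message.content)

# Tool call: get_weather is a hypothetical function used only for illustration.
tool_reply = client.chat.completions.create(
    model="deepseek-v4-flash",
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    messages=[{"role": "user", "content": "What's the weather in Paris right now?"}],
)
print(tool_reply.choices[0].message.tool_calls)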

When should I use DeepSeek-V4-Flash API instead of DeepSeek-V4-Pro?

Use V4-Flash when you want the V4-series context window and agent features but do not need the larger Pro model. The official report shows V4-Pro is stronger on several knowledge-heavy benchmarks, so Pro is the better fit for maximum capability.

How do I integrate DeepSeek-V4-Flash API with OpenAI SDKs via CometAPI?

Use the OpenAI-compatible base URL https://api.cometapi.com/v1 and set the model to deepseek-v4-flash, as in the samples below. DeepSeek also documents an Anthropic-compatible endpoint, so you can reuse common OpenAI/Anthropic SDK patterns against the same API surface.

Is DeepSeek-V4-Flash API suitable for coding agents like Claude Code or OpenCode?

Yes. The V4 family is designed for the agent-style API surface and reasoning controls that coding agents like these expect.

What are DeepSeek-V4-Flash API's known limitations?

It is smaller than DeepSeek-V4-Pro, so it trails Pro on some knowledge-heavy and complex agentic tasks. DeepSeek also labels the V4 series as a preview release, so teams should test it on their own workloads.

Features for DeepSeek V4 Flash

Explore the key features of DeepSeek V4 Flash summarized above: a 1,000,000-token context window, an efficiency-first MoE design with only 13B activated parameters, three reasoning modes, and MIT-licensed open weights.

Pricing for DeepSeek V4 Flash

DeepSeek V4 Flash is priced per million tokens, and CometAPI's rate is discounted 20% from the official price. The flexible pay-as-you-go plan means you only pay for what you use, making it easy to scale as your requirements grow.
Comet Price (USD / M Tokens) | Official Price (USD / M Tokens) | Discount
Input $0.12 / Output $0.24 | Input $0.15 / Output $0.30 | -20%

Sample code and API for DeepSeek V4 Flash

Access comprehensive sample code and API resources for DeepSeek V4 Flash to streamline your integration process. Our detailed documentation provides step-by-step guidance, helping you leverage the full potential of DeepSeek V4 Flash in your projects.
POST /v1/chat/completions

Python Code Example

from openai import OpenAI
import os

# Get your CometAPI key from https://www.cometapi.com/console/token, and paste it here
COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"
BASE_URL = "https://api.cometapi.com/v1"

client = OpenAI(base_url=BASE_URL, api_key=COMETAPI_KEY)

completion = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    stream=False,
    extra_body={
        "thinking": {"type": "enabled"},
        "reasoning_effort": "high",
    },
)

print(completion.choices[0].message.content)

JavaScript Code Example

import OpenAI from "openai";

// Get your CometAPI key from https://www.cometapi.com/console/token, and paste it here
const api_key = process.env.COMETAPI_KEY || "<YOUR_COMETAPI_KEY>";
const base_url = "https://api.cometapi.com/v1";

const client = new OpenAI({
  apiKey: api_key,
  baseURL: base_url,
});

const completion = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "Hello!" },
  ],
  thinking: { type: "enabled" },
  reasoning_effort: "high",
  stream: false,
});

console.log(completion.choices[0].message.content);

Curl Code Example

curl https://api.cometapi.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $COMETAPI_KEY" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ],
    "thinking": {
      "type": "enabled"
    },
    "reasoning_effort": "high",
    "stream": false
  }'

More Models


Claude Opus 4.7

Input:$3/M
Output:$15/M
Claude Opus 4.7 is a hybrid reasoning model designed specifically for frontier-level coding, AI agents, and complex multi-step professional work. Unlike lighter models (e.g., Sonnet or Haiku variants), Opus 4.7 prioritizes depth, consistency, and autonomy on the hardest tasks.

Claude Sonnet 4.6

Input:$2.4/M
Output:$12/M
Claude Sonnet 4.6 is our most capable Sonnet model yet. It’s a full upgrade of the model’s skills across coding, computer use, long-context reasoning, agent planning, knowledge work, and design. Sonnet 4.6 also features a 1M token context window in beta.

GPT 5.5 Pro

Input:$24/M
Output:$144/M
An advanced model engineered for extremely complex logic and professional demands, representing the highest standard of deep reasoning and precise analytical capabilities.

GPT 5.5

Input:$4/M
Output:$24/M
A next-generation multimodal flagship model balancing exceptional performance with efficient response, dedicated to providing comprehensive and stable general-purpose AI services.

GPT Image 2 ALL

Per Request:$0.04
GPT Image 2 is OpenAI's state-of-the-art image generation model for fast, high-quality image generation and editing. It supports flexible image sizes and high-fidelity image inputs.

GPT 5.5 ALL

Input:$4/M
Output:$24/M
GPT-5.5 excels at code writing, online research, data analysis, and cross-tool operations. The model handles complex multi-step tasks with greater autonomy and significantly improves reasoning and execution efficiency while maintaining the same latency as its predecessor, marking an important step toward AI-driven office automation.

Related Blog

How to Run DeepSeek V4 Locally
Apr 30, 2026 · deepseek-v4

The practical way to run DeepSeek V4 locally is to use the official open-source weights with a high-performance serving stack such as vLLM, then expose the model through a local OpenAI-compatible endpoint. DeepSeek’s current public materials describe two models in the V4 family: DeepSeek-V4-Pro at 1.6T total parameters / 49B active, and DeepSeek-V4-Flash at 284B total parameters / 13B active, both with 1M-token context and three reasoning modes. vLLM’s current local deployment examples target 8× B200/B300 for Pro and 4× B200/B300 for Flash. If you do not have that kind of hardware, a hosted fallback like CometAPI is the more practical path.
How to Use the DeepSeek V4 API
Apr 24, 2026 · deepseek-v4

For developers, that combination matters for one simple reason: it lowers migration friction while raising the ceiling on what you can build. You are not learning a brand-new API shape. You are updating the model name, keeping the base URL, and shipping against a larger context window with newer reasoning behavior. DeepSeek’s official docs explicitly say to keep the base URL and change the model parameter to deepseek-v4-pro or deepseek-v4-flash.
DeepSeek V4 Released: What It Is and How to Access It
Apr 24, 2026 · deepseek-v4

DeepSeek-V4 is DeepSeek’s new preview flagship model family, officially launched on April 24, 2026. It includes DeepSeek-V4-Pro and DeepSeek-V4-Flash, both of which support 1 million tokens of context, expose OpenAI-compatible and Anthropic-compatible APIs, and are available on DeepSeek’s app, mobile app, and CometAPI's API. In practical terms, Pro is the higher-capability choice for difficult reasoning and agentic coding, while Flash is the faster, more economical option for high-throughput workloads.
DeepSeek V4 is now available on the web: How to access and test it
Apr 9, 2026 · deepseek-v4

A gray-scale (limited-rollout) test of DeepSeek V4 has leaked and is actively rolling out on the official web platform. Select users now see a redesigned interface with Fast Mode (default, high-speed daily use), Expert Mode (deep reasoning and complex problem-solving), and Vision Mode (multimodal image and video handling). This marks the most significant update since DeepSeek-V3.2, with rumored 1-million-token context windows, updated knowledge bases, native multimodal capabilities, and a new underlying architecture optimized for speed, logic, and efficiency.