Models and Pricing

Key Models Offered

DeepSeek offers two flagship models through its API. Their key specifications are summarized below:

DeepSeek-Chat (DeepSeek-V3)

1. Context Window: 64K tokens
2. Max Output: 8K tokens (default 4K if unspecified)

DeepSeek-Reasoner (DeepSeek-R1)

1. Context Window: 64K tokens
2. Reasoning Capability: Supports 32K “Chain of Thought” (CoT) tokens
3. Max Output: 8K tokens (includes both CoT reasoning + final answer)
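These limits can be made concrete with a small illustrative check (hypothetical helper, not part of any DeepSeek SDK): for deepseek-reasoner, the Chain-of-Thought tokens and the final answer share the same 8K output budget.

```python
# Illustrative sketch: checking whether a requested response fits a model's
# output limit. The limits come from the specs above; names are hypothetical.

MODEL_LIMITS = {
    "deepseek-chat":     {"context": 64_000, "max_output": 8_000},
    "deepseek-reasoner": {"context": 64_000, "max_output": 8_000},  # CoT + answer
}

def fits_output_budget(model: str, cot_tokens: int, answer_tokens: int) -> bool:
    """For deepseek-reasoner, CoT reasoning and the final answer share the
    same 8K output budget; for deepseek-chat, cot_tokens is simply 0."""
    return cot_tokens + answer_tokens <= MODEL_LIMITS[model]["max_output"]

# A 6K-token chain of thought leaves at most 2K tokens for the answer:
print(fits_output_budget("deepseek-reasoner", 6_000, 2_000))  # True
print(fits_output_budget("deepseek-reasoner", 6_000, 2_500))  # False
```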

Pricing Structure

Tokens represent text units (words, numbers, punctuation). Billing applies to combined input + output tokens.

Model               Input (Cache Hit)   Input (Cache Miss)   Output Tokens
DeepSeek-Chat       $0.07/M             $0.14/M              $1.10/M
DeepSeek-Reasoner   $0.14/M             $0.14/M              $2.19/M
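A per-request cost estimate from this table can be sketched as follows (prices in USD per million tokens; function and variable names are illustrative, not part of the DeepSeek API):

```python
# Illustrative cost estimate from the standard pricing table above.
PRICES = {  # (cache_hit, cache_miss, output) in $/M tokens
    "deepseek-chat":     (0.07, 0.14, 1.10),
    "deepseek-reasoner": (0.14, 0.14, 2.19),
}

def estimate_cost(model, hit_tokens, miss_tokens, output_tokens):
    """Total charge in USD for one request, per the billing formula."""
    hit, miss, out = PRICES[model]
    return (hit_tokens * hit + miss_tokens * miss + output_tokens * out) / 1_000_000

# 10K cached input + 5K uncached input + 2K output on deepseek-chat:
print(f"${estimate_cost('deepseek-chat', 10_000, 5_000, 2_000):.6f}")  # $0.003600
```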


Off-Peak Discounts (16:30–00:30 UTC):

  • DeepSeek-Chat: 50% off all rates
  • DeepSeek-Reasoner: 75% off all rates (input and output)

  Examples:
  • DeepSeek-Chat input (cache hit): $0.07/M → $0.035/M off-peak
  • DeepSeek-Reasoner output: $2.19/M → $0.550/M off-peak
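A minimal sketch of the off-peak logic, assuming the request's completion time (UTC) determines the tier; the helper names are illustrative, not DeepSeek SDK code:

```python
from datetime import time

# The 16:30-00:30 UTC window wraps past midnight, so it is the union of
# [16:30, 24:00) and [00:00, 00:30).
OFF_PEAK_START = time(16, 30)  # 16:30 UTC
OFF_PEAK_END = time(0, 30)     # 00:30 UTC (next day)

def is_off_peak(t: time) -> bool:
    return t >= OFF_PEAK_START or t < OFF_PEAK_END

def effective_rate(base_rate: float, discount: float, t: time) -> float:
    """Apply a fractional discount (0.75 = 75% off) only during off-peak."""
    return base_rate * (1 - discount) if is_off_peak(t) else base_rate

# DeepSeek-Reasoner output at 17:00 UTC: 75% off $2.19/M, roughly $0.55/M
print(effective_rate(2.19, 0.75, time(17, 0)))
# The same request at 12:00 UTC pays the standard rate:
print(effective_rate(2.19, 0.75, time(12, 0)))  # 2.19
```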

Key Details

  1. Context Caching: Reduces input costs by reusing cached context (cache hit vs. miss pricing).
  2. CoT Tokens: Exclusive to DeepSeek-Reasoner; charges apply to all reasoning steps + final answers.
  3. Billing:
    • Costs = (Input Tokens × Input Price) + (Output Tokens × Output Price)
    • Granted balances (e.g., free-tier credits) are consumed before topped-up credits.
  4. Output Limits: Set the max_tokens parameter to request responses longer than the default 4K (up to the 8K maximum).
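The billing order in point 3 can be sketched as a small model (illustrative only, not DeepSeek's actual accounting code):

```python
# Illustrative model of the billing order: a charge draws down the granted
# (free-tier) balance first, then the topped-up balance.

def charge(granted: float, topped_up: float, cost: float) -> tuple:
    """Return (granted, topped_up) balances after deducting `cost` in USD."""
    from_granted = min(granted, cost)
    from_topped_up = cost - from_granted
    return granted - from_granted, topped_up - from_topped_up

# A $0.30 charge against $0.25 granted + $5.00 topped-up exhausts the
# granted balance first, then takes the remaining $0.05 from the top-up:
print(charge(0.25, 5.00, 0.30))
```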

Pro Tips for Users

  1. ✅ Optimize Costs: Schedule heavy workloads during the off-peak window (16:30–00:30 UTC) for 50–75% discounts.
  2. ✅ Cache Wisely: Leverage context caching for repetitive queries.
  3. ✅ Monitor Usage: API completion timestamp determines pricing tier.
  4. Note: Prices subject to change. Always verify latest rates on DeepSeek’s official pricing page before large-scale usage.