Models and Pricing
Key Models Offered
DeepSeek offers two flagship models through its API:
DeepSeek-Chat (DeepSeek-V3)
1. Context Window: 64K tokens
2. Max Output: 8K tokens (default 4K if unspecified)
DeepSeek-Reasoner (DeepSeek-R1)
1. Context Window: 64K tokens
2. Reasoning Capability: Supports 32K “Chain of Thought” (CoT) tokens
3. Max Output: 8K tokens (includes both CoT reasoning + final answer)
Pricing Structure
Tokens represent text units (words, numbers, punctuation). Billing applies to combined input + output tokens.
| Model | Input (Cache Hit) | Input (Cache Miss) | Output Tokens |
|---|---|---|---|
| DeepSeek-Chat | $0.07/M | $0.14/M | $1.10/M |
| DeepSeek-Reasoner | $0.14/M | $0.55/M | $2.19/M |
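The billing rule below the table (input tokens times input price, plus output tokens times output price) can be wrapped in a small helper. This is a sketch, not official tooling: the function name is an assumption, and rates are passed in explicitly so you can plug in whichever row of the table (and cache-hit vs. cache-miss rate) applies.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate request cost in USD from token counts and per-million rates.

    Cost = (input tokens x input price) + (output tokens x output price),
    with prices quoted per 1M tokens as in the pricing table.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Example: 100K cache-miss input + 20K output on DeepSeek-Chat.
cost = estimate_cost(100_000, 20_000, 0.14, 1.10)  # → 0.036 USD
```

For DeepSeek-Reasoner, remember that `output_tokens` must include the CoT reasoning tokens, since both are billed at the output rate.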
Off-Peak Discounts (16:30–00:30 UTC):
- DeepSeek-Chat: 50% off all rates
- DeepSeek-Reasoner: 75% off all rates
Examples:
- DeepSeek-Chat input (cache hit): $0.07/M → $0.035/M off-peak
- DeepSeek-Reasoner output: $2.19/M → $0.550/M off-peak
Key Details
- Context Caching: Reduces input costs by reusing cached context (cache hit vs. miss pricing).
- CoT Tokens: Exclusive to DeepSeek-Reasoner; charges apply to all reasoning steps + final answers.
- Billing:
- Costs = (Input Tokens × Input Price) + (Output Tokens × Output Price)
- Granted balances (e.g., free tiers) are consumed before topped-up credits.
- Output Limits: Adjust the `max_tokens` parameter for responses beyond the default 4K length.
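Raising the output cap means setting `max_tokens` explicitly in the request body. The sketch below builds such a payload; it assumes DeepSeek's OpenAI-compatible chat-completions request format, and the helper name and prompt are illustrative.

```python
def build_request(prompt: str, max_tokens: int = 8192) -> dict:
    """Build a chat-completion request body with an explicit output cap.

    Without max_tokens the response is capped at the default 4K; setting it
    to 8192 requests the documented 8K maximum.
    """
    return {
        "model": "deepseek-chat",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_request("Summarize context caching in one paragraph.")
```

The same field applies to `deepseek-reasoner`, where the 8K cap covers CoT reasoning plus the final answer combined.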
Pro Tips for Users
- ✅ Optimize Costs: Use off-peak hours (UTC evenings/nights) for 50-75% discounts.
- ✅ Cache Wisely: Leverage context caching for repetitive queries.
- ✅ Monitor Usage: API completion timestamp determines pricing tier.
- Note: Prices subject to change. Always verify latest rates on DeepSeek’s official pricing page before large-scale usage.