Models and Pricing

Key Models Offered

DeepSeek offers two flagship models through its API. Their key specifications are summarized below:

DeepSeek-Chat (DeepSeek-V3)

1. Context Window: 64K tokens
2. Max Output: 8K tokens (default 4K if unspecified)

DeepSeek-Reasoner (DeepSeek-R1)

1. Context Window: 64K tokens
2. Reasoning Capability: Supports 32K “Chain of Thought” (CoT) tokens
3. Max Output: 8K tokens (includes both CoT reasoning + final answer)
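These limits can be made concrete with a small illustrative check (hypothetical helper, not part of any DeepSeek SDK): for deepseek-reasoner, the Chain-of-Thought tokens and the final answer share the same 8K output budget.

```python
# Illustrative sketch: checking whether a requested response fits a model's
# output limit. The limits come from the specs above; names are hypothetical.

MODEL_LIMITS = {
    "deepseek-chat":     {"context": 64_000, "max_output": 8_000},
    "deepseek-reasoner": {"context": 64_000, "max_output": 8_000},  # CoT + answer
}

def fits_output_budget(model: str, cot_tokens: int, answer_tokens: int) -> bool:
    """For deepseek-reasoner, CoT reasoning and the final answer share the
    same 8K output budget; for deepseek-chat, cot_tokens is simply 0."""
    return cot_tokens + answer_tokens <= MODEL_LIMITS[model]["max_output"]

# A 6K-token chain of thought leaves at most 2K tokens for the answer:
print(fits_output_budget("deepseek-reasoner", 6_000, 2_000))  # True
print(fits_output_budget("deepseek-reasoner", 6_000, 2_500))  # False
```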

Pricing Structure

Tokens represent text units (words, numbers, punctuation). Billing applies to combined input + output tokens.

Model               Input (Cache Hit)   Input (Cache Miss)   Output Tokens
DeepSeek-Chat       $0.07/M             $0.14/M              $1.10/M
DeepSeek-Reasoner   $0.14/M             $0.14/M              $2.19/M
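A per-request cost estimate from this table can be sketched as follows (prices in USD per million tokens; function and variable names are illustrative, not part of the DeepSeek API):

```python
# Illustrative cost estimate from the standard pricing table above.
PRICES = {  # (cache_hit, cache_miss, output) in $/M tokens
    "deepseek-chat":     (0.07, 0.14, 1.10),
    "deepseek-reasoner": (0.14, 0.14, 2.19),
}

def estimate_cost(model, hit_tokens, miss_tokens, output_tokens):
    """Total charge in USD for one request, per the billing formula."""
    hit, miss, out = PRICES[model]
    return (hit_tokens * hit + miss_tokens * miss + output_tokens * out) / 1_000_000

# 10K cached input + 5K uncached input + 2K output on deepseek-chat:
print(f"${estimate_cost('deepseek-chat', 10_000, 5_000, 2_000):.6f}")  # $0.003600
```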


Off-Peak Discounts (16:30–00:30 UTC):

  • DeepSeek-Chat: 50% off all rates
  • DeepSeek-Reasoner: 75% off all rates (input and output)

  Examples:
  • DeepSeek-Chat input (cache hit): $0.07/M → $0.035/M off-peak
  • DeepSeek-Reasoner output: $2.19/M → $0.550/M off-peak
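A minimal sketch of the off-peak logic, assuming the request's completion time (UTC) determines the tier; the helper names are illustrative, not DeepSeek SDK code:

```python
from datetime import time

# The 16:30-00:30 UTC window wraps past midnight, so it is the union of
# [16:30, 24:00) and [00:00, 00:30).
OFF_PEAK_START = time(16, 30)  # 16:30 UTC
OFF_PEAK_END = time(0, 30)     # 00:30 UTC (next day)

def is_off_peak(t: time) -> bool:
    return t >= OFF_PEAK_START or t < OFF_PEAK_END

def effective_rate(base_rate: float, discount: float, t: time) -> float:
    """Apply a fractional discount (0.75 = 75% off) only during off-peak."""
    return base_rate * (1 - discount) if is_off_peak(t) else base_rate

# DeepSeek-Reasoner output at 17:00 UTC: 75% off $2.19/M, roughly $0.55/M
print(effective_rate(2.19, 0.75, time(17, 0)))
# The same request at 12:00 UTC pays the standard rate:
print(effective_rate(2.19, 0.75, time(12, 0)))  # 2.19
```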

Key Details

  1. Context Caching: Reduces input costs by reusing cached context (cache hit vs. miss pricing).
  2. CoT Tokens: Exclusive to DeepSeek-Reasoner; charges apply to all reasoning steps + final answers.
  3. Billing:
    • Costs = (Input Tokens × Input Price) + (Output Tokens × Output Price)
    • Granted balances (e.g., free-tier credits) are consumed before topped-up credits.
  4. Output Limits: Set the max_tokens parameter to request responses longer than the default 4K (up to the 8K maximum).
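The billing order in point 3 can be sketched as a small model (illustrative only, not DeepSeek's actual accounting code):

```python
# Illustrative model of the billing order: a charge draws down the granted
# (free-tier) balance first, then the topped-up balance.

def charge(granted: float, topped_up: float, cost: float) -> tuple:
    """Return (granted, topped_up) balances after deducting `cost` in USD."""
    from_granted = min(granted, cost)
    from_topped_up = cost - from_granted
    return granted - from_granted, topped_up - from_topped_up

# A $0.30 charge against $0.25 granted + $5.00 topped-up exhausts the
# granted balance first, then takes the remaining $0.05 from the top-up:
print(charge(0.25, 5.00, 0.30))
```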

Pro Tips for Users

  1. ✅ Optimize Costs: Schedule heavy workloads during the off-peak window (16:30–00:30 UTC) for 50–75% discounts.
  2. ✅ Cache Wisely: Leverage context caching for repetitive queries.
  3. ✅ Monitor Usage: API completion timestamp determines pricing tier.
  4. Note: Prices subject to change. Always verify latest rates on DeepSeek’s official pricing page before large-scale usage.