#1
DeepSeek: DeepSeek V4 Flash
DeepSeek V4 Flash takes the top slot on cost-per-quality by a wide margin. The math: $0.14 input and $0.28 output average to $0.21 per million tokens, which divided by an AA Intelligence Index of 46.5 gives roughly $0.0045 per Intelligence point, about 5.8x lower than the runner-up. The score itself is the part that surprises people: AA Intelligence 46.5 lands in the same band as mid-tier reasoning models, GPQA Diamond reaches 0.894, and Tau-2 tool use scores 0.95, close to the leaders. Output speed is 67 tokens per second, slower than Gemini Flash Lite but enough for streaming chat once the first token lands at 0.82 seconds. Context is 1M tokens with a 384K max output, which fits long answers and chain-of-thought traces without truncation. Cache reads at $0.028 per million keep multi-turn agent prompts in the floor band, and the effect compounds: a chat product with a stable 4K system prompt and ten turns per session frequently sees 60-80% of input billing flow through cache reads, and at that volume the V4 Flash cache rate translates directly into the lowest production bill in the ranking. The model is text-only, so workloads that need native image or audio input still need a multimodal companion; Gemini 3.1 Flash Lite Preview is the obvious pair when you need vision at a budget price. Reach for V4 Flash as the default budget model for retrieval, summarization, batch enrichment, and prototype workloads.
Strengths
- $0.0045 per Intelligence point, roughly 5.8x lower than the runner-up
- $0.14 / $0.28 per 1M with cache-read at $0.028
- AA Intelligence Index 46.5 — usable on real production tasks
- Tau-2 0.95 makes it viable inside agent loops
- 1M context + 384K max output
Weaknesses
- 67 tok/s output — slower than Gemini Flash Lite for streaming UX
- Text-only — no native image or audio input
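The cache-read arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming input cost is a simple linear blend of the standard input rate and the cache-read rate; the function name `effective_input_rate` and the `cached_fraction` parameter are hypothetical, and only the two prices and the 60-80% range come from the text.

```python
def effective_input_rate(base_rate: float, cache_rate: float, cached_fraction: float) -> float:
    """Blend the standard and cache-read input prices (USD per 1M tokens).

    cached_fraction is the share of input tokens billed as cache reads.
    Assumption: billing is a plain weighted average of the two rates.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    return base_rate * (1.0 - cached_fraction) + cache_rate * cached_fraction


# Prices quoted in the article for DeepSeek V4 Flash (USD per 1M tokens).
V4_FLASH_INPUT = 0.14
V4_FLASH_CACHE_READ = 0.028

for fraction in (0.0, 0.6, 0.8):
    rate = effective_input_rate(V4_FLASH_INPUT, V4_FLASH_CACHE_READ, fraction)
    print(f"{fraction:.0%} cached -> ${rate:.4f} per 1M input tokens")
```

Under this simplified model, moving from no caching to 60-80% cached input cuts the effective input rate from $0.14 to between about $0.073 and $0.050 per million tokens. Real invoices depend on how the provider meters cache hits, so treat the result as an estimate.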
#2
Gemini 3.1 Flash Lite Preview
Google's Gemini 3.1 Flash Lite Preview comes in second on dollars-per-Intelligence-point at $0.0261, roughly 5.8x higher than DeepSeek V4 Flash but still the second-best ratio in the ranking. The blended rate is $0.875 per million ($0.25 input, $1.50 output), divided by an AA Intelligence Index of 33.5. The Intelligence score is the lowest in the ranking and only modestly above the floor of 30, so the model is best suited to orchestration, summarization, classification, and structured extraction rather than graduate-level reasoning; Tau-2 tool use at 0.313 confirms that complex multi-step tool chains will fail more often than they succeed. What it offers in exchange shapes the use case: AA measures 321 output tokens per second, nearly five times V4 Flash's 67 and the fastest budget-tier throughput we cite. Time-to-first-token is a sluggish 5.1 seconds, however, so the model wins on streaming after the first chunk arrives, not on initial latency. Native multimodal input (text, image, audio, video, file) on a single endpoint is rare at this price band, and cache reads cost $0.025 per million, the lowest cache rate in the ranking. Reach for Flash Lite when the workload is multimodal, fan-out heavy (many short calls per query), or specifically bottlenecked on tokens-per-second after the first chunk lands.
Strengths
- $0.0261 per Intelligence point — second-best ratio on the list
- 321 tok/s output, fastest in the budget tier by nearly 5x
- Multimodal in: text + image + audio + video + file
- Cache read $0.025 — lowest cache rate in the ranking
- 1M context window
Weaknesses
- AA Intelligence Index 33.5 — only marginally above the quality floor
- Tau-2 0.313 means complex tool-use chains fail
- Preview status — vendor reserves the right to bump pricing
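The trade-off between a slow first token and fast streaming can be made concrete with a rough latency model. The sketch below assumes total response time is time-to-first-token plus output length divided by a constant throughput, which ignores network jitter, queueing, and prompt length; the function names are hypothetical, and the figures are the ones quoted above for Gemini 3.1 Flash Lite Preview and DeepSeek V4 Flash.

```python
def completion_seconds(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Estimated wall-clock time to receive a full response.

    Assumption: throughput is constant once the first token arrives.
    """
    return ttft_s + output_tokens / tokens_per_s


def breakeven_tokens(ttft_a: float, tps_a: float, ttft_b: float, tps_b: float) -> float:
    """Output length at which model A (slower start, faster stream)
    finishes at the same moment as model B (faster start, slower stream)."""
    return (ttft_a - ttft_b) / (1.0 / tps_b - 1.0 / tps_a)


# Figures quoted in the article: (TTFT in seconds, output tokens per second).
FLASH_LITE = (5.1, 321.0)
V4_FLASH = (0.82, 67.0)

crossover = breakeven_tokens(*FLASH_LITE, *V4_FLASH)
print(f"Flash Lite finishes first beyond about {crossover:.0f} output tokens")

for n in (100, 1000):
    lite = completion_seconds(*FLASH_LITE, n)
    flash = completion_seconds(*V4_FLASH, n)
    print(f"{n} tokens: Flash Lite {lite:.2f}s, V4 Flash {flash:.2f}s")
```

Under these assumptions the crossover lands at roughly 362 output tokens: shorter responses complete sooner on V4 Flash, longer ones on Flash Lite. The first visible token still arrives later on Flash Lite regardless of response length.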
#3
Qwen3.6 Plus
Qwen3.6 Plus lands at $0.0350 per Intelligence point, roughly 7.8x higher than DeepSeek V4 Flash, a premium earned by carrying meaningfully more reasoning capacity. The math: $0.50 input and $3 output average to a blended $1.75 per million, divided by an AA Intelligence Index of 50. The Intelligence score is the third highest in the ranking, GPQA Diamond reaches 0.882, and Tau-2 tool use scores 0.977, the highest tool-use number in the ranking and a key reason to choose Qwen3.6 Plus for agent loops where reliable function calling matters more than raw token throughput. Output speed is 52 tokens per second with a 1.5-second time-to-first-token, close enough to V4 Flash for streaming chat, though behind it on both numbers (67 tokens per second, 0.82 seconds). The model accepts text, image, and video input through the same endpoint, which extends the budget multimodal coverage from Gemini Flash Lite (image+audio+video+file) into a higher-quality option for image-grounded reasoning. Context is 1M tokens with a 65K max output cap, long enough for most chat and extraction tasks but shorter than V4 Flash on chain-of-thought traces or extended report generation. Cache pricing covers writes ($0.625 per million) but not reads, so the cache cost advantage of V4 Flash and Gemini Flash Lite does not carry over to Qwen3.6 Plus on stable-prefix traffic; the model competes on quality, multimodal breadth, and tool-use reliability rather than total cost on multi-turn chat workloads.
Strengths
- AA Intelligence Index 50, third highest in the ranking
- Tau-2 0.977 — highest tool-use reliability in the ranking
- Multimodal in: text + image + video
- 1M context window
Weaknesses
- No published cache-read price — multi-turn savings limited
- 65K max output — shorter than V4 Flash on long traces
- $0.0350 per Intelligence point, roughly 7.8x higher than V4 Flash
#4
Xiaomi: MiMo-V2.5-Pro
Xiaomi's MiMo-V2.5-Pro takes fourth at $0.0372 per Intelligence point: a blended $2 per million ($1 input, $3 output) divided by an AA Intelligence Index of 53.8. The Intelligence score is the highest in the ranking, GPQA Diamond is 0.866, HLE reaches 0.338, and Tau-2 tool use scores 0.942, close to Qwen3.6 Plus on tool-use reliability and slightly above it on raw Intelligence Index. Output is 58 tokens per second with a 2-second time-to-first-token, the second-slowest first-byte profile in the ranking after Gemini 3.1 Flash Lite Preview's 5.1 seconds and notable for any chat-style UX where the user is waiting on the first chunk. Context is 1M tokens with a 131K max output, comfortable for long traces, summary writing, and report generation that exceeds Qwen3.6 Plus's 65K cap. Cache reads are published at $0.20 per million, higher than V4 Flash and Gemini Flash Lite, meaning multi-turn pipelines that rely on stable-prefix cache traffic do not get the same cache-driven cost reduction here. The model is text-only, so workloads with image or audio input need to combine it with a multimodal partner. Choose MiMo-V2.5-Pro when reasoning quality matters most among the budget candidates, the latency profile is acceptable, and the cache rate sits inside the budget. The model essentially trades latency and cache rate for the highest Intelligence Index in the budget band.
Strengths
- AA Intelligence Index 53.8 — highest in the ranking
- GPQA Diamond 0.866 with HLE 0.338
- 131K max output — long-trace comfortable
- 1M context window
Weaknesses
- 2-second TTFT, second-slowest first-byte in the ranking after Gemini 3.1 Flash Lite Preview (5.1 seconds)
- Cache read $0.20 — higher than V4 Flash or Gemini Flash Lite
- Text-only — no native image or audio input
#5
DeepSeek: DeepSeek V4 Pro
DeepSeek V4 Pro closes the ranking at $0.0507 per Intelligence point: a blended $2.61 per million ($1.74 input, $3.48 output) divided by an AA Intelligence Index of 51.5. The Pro variant trades V4 Flash's throughput for a Tau-2 of 0.962 and a slightly higher Intelligence Index, with HLE at 0.359, the highest HLE score cited in the ranking, indicating the model holds up better on the kinds of problems where the other budget options crack. Output speed is 30 tokens per second, the slowest in the ranking and at the boundary of what AA's own snapshot calls 'batch territory'; with a 1-second first-byte latency, that is fine for non-interactive workloads but uncomfortable for real-time chat. Context is 1M tokens with a 384K max output, same as V4 Flash, which means long chain-of-thought traces and extended report generation both fit without truncation. Cache reads at $0.145 per million sit between V4 Flash ($0.028) and MiMo ($0.20), giving multi-turn pipelines a partial cache benefit but not the floor rate V4 Flash carries. Reach for V4 Pro when the workload genuinely needs the extra Tau-2 reliability or the HLE-grade reasoning depth, and the higher cost-per-point fits the budget envelope. For most pipelines V4 Flash gives more quality per dollar; V4 Pro is the upgrade slot for the subset of calls where reasoning depth matters more than throughput.
Strengths
- HLE 0.359 — highest difficulty score in the ranking
- Tau-2 0.962 with AA Intelligence 51.5
- 1M context + 384K max output — same as V4 Flash
- Cache read $0.145 — published and usable
Weaknesses
- $0.0507 per Intelligence point — highest cost-per-point in the ranking
- 30 tok/s output — batch territory per AA's own snapshot
- Text-only — no native image or audio input
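The cost-per-point figures used throughout the ranking can be reproduced from the quoted prices and Intelligence Index values. The sketch below follows the article's own method, a simple average of input and output price divided by the AA Intelligence Index; the `Model` class and its property names are hypothetical helpers introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_usd: float      # USD per 1M input tokens
    output_usd: float     # USD per 1M output tokens
    intelligence: float   # AA Intelligence Index

    @property
    def blended_usd(self) -> float:
        """Simple average of input and output price, as used in the article."""
        return (self.input_usd + self.output_usd) / 2

    @property
    def usd_per_point(self) -> float:
        """Blended price divided by the Intelligence Index."""
        return self.blended_usd / self.intelligence


# Prices and scores as quoted in the ranking.
MODELS = [
    Model("DeepSeek V4 Pro", 1.74, 3.48, 51.5),
    Model("Qwen3.6 Plus", 0.50, 3.00, 50.0),
    Model("DeepSeek V4 Flash", 0.14, 0.28, 46.5),
    Model("MiMo-V2.5-Pro", 1.00, 3.00, 53.8),
    Model("Gemini 3.1 Flash Lite Preview", 0.25, 1.50, 33.5),
]

ranked = sorted(MODELS, key=lambda m: m.usd_per_point)
for rank, m in enumerate(ranked, start=1):
    print(f"#{rank} {m.name}: blended ${m.blended_usd:.3f}, "
          f"${m.usd_per_point:.4f} per Intelligence point")
```

A 1:1 input-to-output average is a simplification: workloads that are input-heavy (retrieval, summarization) or output-heavy (long report generation) would weight the two prices differently, and the ordering could shift.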