14 models tested across 23 tasks, ranked by pass rate.

| # | Organization | Model | Pass Rate | Cost | Time | License | Released |
|---|---|---|---|---|---|---|---|
| 1 | Anthropic | anthropic claude-opus-4.5 | 29% | $84 | 12m | Proprietary | 2025-11-24 |
| 2 | OpenAI | openai gpt-5.2 | 26% | $48 | 18m | Proprietary | 2025-12-11 |
| 3 | Anthropic | anthropic claude-sonnet-4.5 | 22% | $69 | 11m | Proprietary | 2025-09-29 |
| 4 | Google | google gemini-3-flash-preview | 19% | $7 | 6m | Proprietary | 2025-12-17 |
| 5 | Google | google gemini-3-pro-preview | 16% | $33 | 8m | Proprietary | 2025-11-18 |
| 6 | OpenAI | openai gpt-5.2-codex | 16% | $40 | 22m | Proprietary | 2025-12-18 |
| 7 | OpenAI | openai gpt-5.1 | 14% | $30 | 15m | Proprietary | 2025-11-12 |
| 8 | Z.ai | z-ai glm-4.7 | 13% | $23 | 18m | Apache 2.0 | 2025-12-22 |
| 9 | DeepSeek | deepseek deepseek-v3.2 | 12% | $12 | 22m | MIT | 2025-12-01 |
| 10 | OpenAI | openai gpt-5.1-codex-max | 12% | $57 | 18m | Proprietary | 2025-11-19 |
| 11 | Kimi | moonshotai kimi-k2-thinking | 7% | $9 | 21m | MIT | 2025-11-06 |
| 12 | Anthropic | anthropic claude-haiku-4.5 | 6% | $29 | 9m | Proprietary | 2025-10-15 |
| 13 | Grok | x-ai grok-4 | 4% | $56 | 16m | Proprietary | 2025-07-09 |
| 14 | Grok | x-ai grok-4.1-fast | 3% | $10 | 18m | Proprietary | 2025-11-19 |
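
The ranking by pass rate can be reproduced with a short script. The sketch below is illustrative only: it copies the first six rows from the table, and the names `ROWS` and `rank_by_pass_rate` are hypothetical. The source does not say how ties are broken (two models share 16%), so this version assumes nothing and simply keeps tied rows in their input order.

```python
# First six rows copied from the table: (model, pass rate in %, cost in USD).
ROWS = [
    ("claude-opus-4.5", 29, 84),
    ("gpt-5.2", 26, 48),
    ("claude-sonnet-4.5", 22, 69),
    ("gemini-3-flash-preview", 19, 7),
    ("gemini-3-pro-preview", 16, 33),
    ("gpt-5.2-codex", 16, 40),
]


def rank_by_pass_rate(rows):
    """Sort rows by pass rate, highest first.

    sorted() is stable, even with reverse=True, so rows with equal
    pass rates keep their input order. The real tie-break rule is
    not stated in the source; this is an assumption-free fallback.
    """
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranked = rank_by_pass_rate(ROWS)
for position, (model, pass_rate, cost) in enumerate(ranked, start=1):
    print(f"{position:>2}  {model:<24} {pass_rate:>3}%  ${cost}")
```

Because the rows are already in rank order, the output order matches the table. Feeding in the rows in a different order changes only the relative position of the two tied models.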

All product names, logos, and brands (™/®) are the property of their respective owners; they're used here solely for identification and comparison, and their use does not imply affiliation, endorsement, or sponsorship.