14 models tested across 23 tasks, ranked by pass rate.
| # | Organization | Model | Pass Rate | Cost | Time | License | Released |
|---|---|---|---|---|---|---|---|
| 1 | anthropic | claude-opus-4.5 | 29% | $84 | 12m | Proprietary | 2025-11-24 |
| 2 | openai | gpt-5.2 | 26% | $48 | 18m | Proprietary | 2025-12-11 |
| 3 | anthropic | claude-sonnet-4.5 | 22% | $69 | 11m | Proprietary | 2025-09-29 |
| 4 | google | gemini-3-flash-preview | 19% | $7 | 6m | Proprietary | 2025-12-17 |
| 5 | google | gemini-3-pro-preview | 16% | $33 | 8m | Proprietary | 2025-11-18 |
| 6 | openai | gpt-5.2-codex | 16% | $40 | 22m | Proprietary | 2025-12-18 |
| 7 | openai | gpt-5.1 | 14% | $30 | 15m | Proprietary | 2025-11-12 |
| 8 | z-ai | glm-4.7 | 13% | $23 | 18m | Apache 2.0 | 2025-12-22 |
| 9 | deepseek | deepseek-v3.2 | 12% | $12 | 22m | MIT | 2025-12-01 |
| 10 | openai | gpt-5.1-codex-max | 12% | $57 | 18m | Proprietary | 2025-11-19 |
| 11 | moonshotai | kimi-k2-thinking | 7% | $9 | 21m | MIT | 2025-11-06 |
| 12 | anthropic | claude-haiku-4.5 | 6% | $29 | 9m | Proprietary | 2025-10-15 |
| 13 | x-ai | grok-4 | 4% | $56 | 16m | Proprietary | 2025-07-09 |
| 14 | x-ai | grok-4.1-fast | 3% | $10 | 18m | Proprietary | 2025-11-19 |
All product names, logos, and brands (™/®) are the property of their respective owners; they're used here solely for identification and comparison, and their use does not imply affiliation, endorsement, or sponsorship.