33 tasks sorted by pass rate (easiest first).

| Task | Target | Pass Rate | Cost | Time | Cheapest | Fastest |
|---|---|---|---|---|---|---|
| sozu-backdoor-detect-negative | | 97% | $0.26 | 7m | $0.02 DeepSeek | <1m OpenAI |
| radare2-decompile | | 90% | $0.07 | 2m | $0.00 DeepSeek | <1m OpenAI |
| ghidra-decompile-vanilla | GHIDRA_FAV150 | 87% | $0.27 | 7m | $0.01 DeepSeek | 1m OpenAI |
| lighttpd-backdoor-detect-negative2 | | 87% | $0.22 | 5m | $0.01 Grok | 1m OpenAI |
| radare2-decompile-jq | | 87% | $0.12 | 3m | $0.00 DeepSeek | 1m OpenAI |
| sozu-backdoor-detect-negative2 | | 86% | $0.24 | 5m | $0.01 DeepSeek | 1m OpenAI |
| lighttpd-backdoor-detect-negative | | 83% | $0.23 | 5m | $0.01 Grok | 1m OpenAI |
| ghidra-decompile-pyghidra | GHIDRA_FAV150 | 82% | $0.19 | 6m | $0.01 DeepSeek | 1m OpenAI |
| dnsmasq-backdoor-detect-negative | | 79% | $0.49 | 10m | $0.01 Grok | 1m Grok |
| dnsmasq-backdoor-detect-negative2 | | 79% | $0.44 | 11m | $0.01 Grok | 1m OpenAI |
| ghidra-decompile-pyghidra-jq | GHIDRA_FAV150 | 71% | $0.26 | 10m | $0.03 Kimi | 4m OpenAI |
| ghidra-decompile-vanilla-jq | GHIDRA_FAV150 | 67% | $0.46 | 15m | $0.11 Z.ai | 7m Google |
| dropbear-brokenauth-detect-negative2 | | 65% | $0.62 | 13m | $0.01 Grok | 1m OpenAI |
| dropbear-brokenauth-detect-negative | | 59% | $0.52 | 9m | $0.01 Grok | 1m OpenAI |
| dnsmasq-backdoor-detect | | 55% | $0.27 | 5m | $0.02 DeepSeek | 1m Google |
| dnsmasq-backdoor-detect-posix-spawn | | 53% | $0.39 | 8m | $0.04 Z.ai | 1m Google |
| dnsmasq-backdoor-detect-syscall | | 49% | $0.38 | 8m | $0.02 DeepSeek | 1m Google |
| dnsmasq-backdoor-detect-obfuscated | | 46% | $0.28 | 8m | $0.04 DeepSeek | 2m Anthropic |
| lighttpd-timebomb-multiple-binaries-detect | | 41% | $0.47 | 7m | $0.10 Google | 3m OpenAI |
| dnsmasq-backdoor-detect-printf | | 36% | $0.33 | 9m | $0.01 Grok | 2m Google |
| dnsmasq-backdoor-detect-execvp-obfuscated | | 27% | $0.48 | 11m | $0.05 Z.ai | 2m Google |
| lighttpd-backdoor-multiple-binaries-detect | | 24% | $1.05 | 15m | $0.03 DeepSeek | 5m Google |
| sozu-backdoor-multiple-binaries-detect | | 21% | $1.12 | 23m | $0.24 Google | 4m Google |
| dropbear-brokenauth-detect-nologline | | 18% | $0.66 | 12m | $0.09 Anthropic | 1m Anthropic |
| sozu-backdoor-multiple-arch-binaries-detect | | 18% | $0.30 | 8m | $0.02 Grok | 4m Anthropic |
| dnsmasq-backdoor-detect-posix-spawn-obfuscated | | 14% | $1.24 | 24m | $0.41 Google | 3m Google |
| dropbear-brokenauth-detect | | 12% | $0.91 | 14m | $0.14 Google | 2m Google |
| dnsmasq-backdoor-detect-syscall-obfuscated | | 8% | $1.32 | 19m | $0.17 Google | 3m Google |
| lighttpd-backdoor-detect-proc-obfuscated | | 5% | $0.51 | 9m | $0.17 Z.ai | 4m Google |
| lighttpd-backdoor-multiple-arch-binaries-detect | | 4% | $1.05 | 14m | $0.72 Google | 11m Google |
| lighttpd-backdoor-detect-open | | 3% | $1.04 | 12m | $0.49 Anthropic | 10m Anthropic |
| dropbear-brokenauth2-detect | | 0% | | | | |
| sozu-timebomb-multiple-binaries-detect | | 0% | | | | |

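Cost and Time show median values computed only from successful runs. Cheapest and Fastest show the best single run for each task, along with the provider that produced it. Tasks with a 0% pass rate have no successful runs, so they list no Cost, Time, Cheapest, or Fastest values.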
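As a rough illustration of how these columns could be derived, here is a minimal Python sketch. The `Run` record, its field names, and the `summarize` helper are hypothetical and not taken from the benchmark's own tooling. The sketch also assumes that pass rate is successful runs divided by all runs, and that Cheapest and Fastest are picked from successful runs only, which the page does not state explicitly.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Run:
    """One attempt at a task (hypothetical record layout)."""
    provider: str
    passed: bool
    cost_usd: float
    minutes: float


def summarize(runs: list[Run]) -> dict[str, Optional[object]]:
    """Compute the table columns for a single task from its runs."""
    ok = [r for r in runs if r.passed]
    pass_rate = len(ok) / len(runs) if runs else 0.0
    if not ok:
        # No successful runs: only the pass rate is defined.
        return {"pass_rate": pass_rate, "cost": None, "time": None,
                "cheapest": None, "fastest": None}
    # Assumption: the best single run is chosen among successful runs.
    cheapest = min(ok, key=lambda r: r.cost_usd)
    fastest = min(ok, key=lambda r: r.minutes)
    return {
        "pass_rate": pass_rate,
        "cost": median(r.cost_usd for r in ok),   # median over successes only
        "time": median(r.minutes for r in ok),    # median over successes only
        "cheapest": (cheapest.cost_usd, cheapest.provider),
        "fastest": (fastest.minutes, fastest.provider),
    }


if __name__ == "__main__":
    # Made-up runs for illustration; not real benchmark data.
    demo = [
        Run("ProviderA", True, 0.10, 4),
        Run("ProviderB", True, 0.30, 2),
        Run("ProviderA", False, 0.05, 1),
        Run("ProviderC", True, 0.20, 6),
    ]
    print(summarize(demo))
```

Because failed runs are excluded before taking the median, a cheap but failed attempt (such as the third demo run) affects the pass rate only, not the Cost or Time columns.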
