SCBE Benchmark Evidence Dashboard

Generated 2026-05-31T04:03:11.771Z from unknown @ e8fb9af2a

Evidence lanes: 6/10

Readiness: 60%

Proof rule

Every public claim must cite: command, artifact path, commit hash, and claim boundary.
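The proof rule can be checked mechanically before a claim is published. The following is a minimal sketch, not part of the SCBE tooling: the `Claim` class, its field names, and `missing_citations` are hypothetical names introduced for illustration, and the example claim text is paraphrased from the research lane's summary. It also assumes the commit hash is the one in the "Generated" line.

```python
from dataclasses import dataclass

# Hypothetical record type: the four citation fields mirror the proof rule
# (command, artifact path, commit hash, claim boundary).
@dataclass
class Claim:
    text: str
    command: str = ""
    artifact_path: str = ""
    commit_hash: str = ""
    claim_boundary: str = ""

REQUIRED = ("command", "artifact_path", "commit_hash", "claim_boundary")

def missing_citations(claim):
    """Return the names of required citation fields that are empty or blank."""
    return [name for name in REQUIRED if not getattr(claim, name).strip()]

# Example values copied from the research lane and the "Generated" line.
example = Claim(
    text="SCBE passes 2 of 2 local research fixtures.",
    command="scbe bench research --json",
    artifact_path="artifacts/benchmarks/research_agent_fixtures/latest_report.json",
    commit_hash="e8fb9af2a",
    claim_boundary="local BrowseComp/GAIA-style fixtures; not public BrowseComp or GAIA scores",
)

print(missing_citations(example))
```

A claim is publishable under the rule only when the returned list is empty; any names in the list identify the citations still to be supplied.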

Lane	Description	Status	Command	Artifact	Hash	Summary	Boundary
hard-agentic	hard agentic pretest matrix (12/14 readiness lanes)	evidence-ready	`scbe bench hard-agentic --json`	artifacts/benchmarks/hard_agentic_pretest/latest_report.json	7e0de2ea	{"blocked_or_failed":5,"executed":8,"ready_or_pass":9,"target_count":14}	local readiness/pretest matrix; not a public benchmark leaderboard score
research	BrowseComp/GAIA-style local research fixtures	evidence-ready	`scbe bench research --json`	artifacts/benchmarks/research_agent_fixtures/latest_report.json	5564f227	{"baseline_pass_rate":0,"baseline_passes":0,"decision":"PASS","scbe_pass_rate":1,"scbe_passes":2,"task_count":2,"unresolved_tasks":[]}	local BrowseComp/GAIA-style fixtures; not public BrowseComp or GAIA scores
rubix-browser	permission-hypercube browser-control geometry fixture	evidence-ready	`scbe bench rubix-browser --json`	artifacts/benchmarks/rubix_browser_hypercube/latest_report.json	b2f4096e	{"baseline_avg":0.4167,"baseline_completed":0,"baseline_illegal_moves":3,"decision":"PASS","hypercube_avg":1,"hypercube_completed":3,"hypercube_illegal_moves":0,"task_count":3}	local browser-control geometry fixture; not a WebArena, BrowserGym, OSWorld, or VisualWebArena score
arc-agi2	ARC-AGI-2 local baseline (rule-free strategies, lower bound)	missing-artifact	`scbe bench arc-agi2 --json`	artifacts/benchmarks/arc_agi2_local/latest_report.json	missing	No latest artifact yet	rule-free lower-bound baselines on public ARC-AGI-2 data; not a competitive ARC-AGI-2 submission score
arc-style-grid	ARC-style grid reasoning fixture (SCBE sensor outputs)	missing-artifact	`scbe bench arc-style-grid --json`	artifacts/benchmarks/arc_style_grid/latest_report.json	missing	No latest artifact yet	local ARC-style grid fixture using SCBE sensor outputs; not a public ARC score
swe-local	SWE-style local real-patch repair fixtures	missing-artifact	`scbe bench swe-local --json`	artifacts/benchmarks/swe_local/latest_report.json	missing	No latest artifact yet	local real-patch fixtures; not a SWE-bench Verified or SWEbench.com leaderboard score
cli-competitive	CLI command accuracy vs Codex/Claude-Code-style baselines	evidence-ready	`scbe bench cli-competitive --json`	artifacts/benchmarks/cli_competitive/cli_competitive_benchmark_latest.json	8710eb77	{}	local CLI command accuracy fixture; not a published competitive benchmark score
compound-decompose	RDKit compound decomposition/recomposition through atom mud	evidence-ready	`scbe bench compound-decompose --json`	artifacts/benchmarks/compound_decomposition_recomposition/latest_report.json	c1c9f514	{"decision":"PASS","rdkit_available":true,"case_count":30,"passed":30,"pass_rate":1,"mud_step":5,"rdkit_error":null}	computational compound decomposition/recomposition benchmark; not wet-lab synthesis, biological efficacy proof, dosing guidance, or medical advice
providers	AI provider health matrix (free-first policy: local > free > paid)	missing-artifact	`scbe bench providers --json`	artifacts/benchmarks/provider_health/latest_report.json	missing	No latest artifact yet	local provider reachability check; not an API reliability guarantee
longform	Longform Bridge durable CLI workflow with squad dispatch receipts	evidence-ready	`scbe bench longform --json`	artifacts/benchmarks/longform_cli_benchmark_latest.json	40eea47c	{}	local durable-workflow CLI fixture; not a guarantee of autonomous task completion
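The headline figures can be recomputed from the table's status values. The sketch below assumes that readiness is simply the share of lanes marked `evidence-ready`; the dashboard does not state its formula, so this is an inference from the numbers shown. The `LANE_STATUS` mapping and the `readiness` function are illustrative names, with statuses copied from the rows above.

```python
from collections import Counter

# Lane statuses copied from the table; the mapping itself is illustrative.
LANE_STATUS = {
    "hard-agentic": "evidence-ready",
    "research": "evidence-ready",
    "rubix-browser": "evidence-ready",
    "arc-agi2": "missing-artifact",
    "arc-style-grid": "missing-artifact",
    "swe-local": "missing-artifact",
    "cli-competitive": "evidence-ready",
    "compound-decompose": "evidence-ready",
    "providers": "missing-artifact",
    "longform": "evidence-ready",
}

def readiness(statuses):
    """Return (ready, total, percent), assuming percent = ready / total."""
    counts = Counter(statuses.values())
    ready = counts["evidence-ready"]
    total = len(statuses)
    percent = round(100 * ready / total) if total else 0
    return ready, total, percent

ready, total, percent = readiness(LANE_STATUS)
print(f"Evidence lanes: {ready}/{total}")
print(f"Readiness: {percent}%")
```

Under that assumption the result matches the dashboard: six of ten lanes have an artifact, and the four `missing-artifact` lanes account for the remaining 40%.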