SCBE Benchmark Evidence Dashboard

Generated 2026-05-31T04:03:11.771Z from unknown @ e8fb9af2a
Evidence lanes: 6/10
Readiness: 60%
Proof rule: every public claim must cite a command, an artifact path, a commit hash, and a claim boundary.
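
The proof rule can be checked mechanically before a claim is published. The following is a minimal sketch, not the dashboard's actual implementation: the `Claim` class, its field names, and `missing_citations` are hypothetical names chosen to mirror the four required citations. The example values come from the research lane, and it assumes the `e8fb9af2a` in the generation line is the commit hash.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Claim:
    """Hypothetical record for one public claim (names are assumptions)."""
    command: str
    artifact_path: str
    commit_hash: str
    claim_boundary: str


def missing_citations(claim: Claim) -> list[str]:
    """Return the names of required fields that are empty or whitespace-only."""
    return [f.name for f in fields(claim) if not getattr(claim, f.name).strip()]


# Example values taken from the research lane of this dashboard.
example = Claim(
    command="scbe bench research --json",
    artifact_path="artifacts/benchmarks/research_agent_fixtures/latest_report.json",
    commit_hash="e8fb9af2a",  # assumed to be the commit in the "Generated" line
    claim_boundary="local BrowseComp/GAIA-style fixtures; not public BrowseComp or GAIA scores",
)

# A claim with no boundary should be rejected under the proof rule.
incomplete = Claim(
    command="scbe bench research --json",
    artifact_path="artifacts/benchmarks/research_agent_fixtures/latest_report.json",
    commit_hash="e8fb9af2a",
    claim_boundary="",
)

print(missing_citations(example))     # prints []
print(missing_citations(incomplete))  # prints ['claim_boundary']
```

An empty result means all four citations are present. It does not verify that the artifact exists or that the hash matches.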

hard-agentic
hard agentic pretest matrix (12/14 readiness lanes)
Status: evidence-ready
Command: scbe bench hard-agentic --json
Artifact: artifacts/benchmarks/hard_agentic_pretest/latest_report.json
Hash: 7e0de2ea
Summary: {"blocked_or_failed":5,"executed":8,"ready_or_pass":9,"target_count":14}
Boundary: local readiness/pretest matrix; not a public benchmark leaderboard score

research
BrowseComp/GAIA-style local research fixtures
Status: evidence-ready
Command: scbe bench research --json
Artifact: artifacts/benchmarks/research_agent_fixtures/latest_report.json
Hash: 5564f227
Summary: {"baseline_pass_rate":0,"baseline_passes":0,"decision":"PASS","scbe_pass_rate":1,"scbe_passes":2,"task_count":2,"unresolved_tasks":[]}
Boundary: local BrowseComp/GAIA-style fixtures; not public BrowseComp or GAIA scores

rubix-browser
permission-hypercube browser-control geometry fixture
Status: evidence-ready
Command: scbe bench rubix-browser --json
Artifact: artifacts/benchmarks/rubix_browser_hypercube/latest_report.json
Hash: b2f4096e
Summary: {"baseline_avg":0.4167,"baseline_completed":0,"baseline_illegal_moves":3,"decision":"PASS","hypercube_avg":1,"hypercube_completed":3,"hypercube_illegal_moves":0,"task_count":3}
Boundary: local browser-control geometry fixture; not WebArena, BrowserGym, OSWorld, or VisualWebArena score

arc-agi2
ARC-AGI-2 local baseline (rule-free strategies, lower bound)
Status: missing-artifact
Command: scbe bench arc-agi2 --json
Artifact: artifacts/benchmarks/arc_agi2_local/latest_report.json
Hash: missing
Summary: No latest artifact yet
Boundary: rule-free lower-bound baselines on public ARC-AGI-2 data; not a competitive ARC-AGI-2 submission score

arc-style-grid
ARC-style grid reasoning fixture (SCBE sensor outputs)
Status: missing-artifact
Command: scbe bench arc-style-grid --json
Artifact: artifacts/benchmarks/arc_style_grid/latest_report.json
Hash: missing
Summary: No latest artifact yet
Boundary: local ARC-style grid fixture using SCBE sensor outputs; not a public ARC score

swe-local
SWE-style local real-patch repair fixtures
Status: missing-artifact
Command: scbe bench swe-local --json
Artifact: artifacts/benchmarks/swe_local/latest_report.json
Hash: missing
Summary: No latest artifact yet
Boundary: local real-patch fixtures; not SWE-bench Verified or SWEbench.com leaderboard score

cli-competitive
CLI command accuracy vs Codex/Claude-Code-style baselines
Status: evidence-ready
Command: scbe bench cli-competitive --json
Artifact: artifacts/benchmarks/cli_competitive/cli_competitive_benchmark_latest.json
Hash: 8710eb77
Summary: {}
Boundary: local CLI command accuracy fixture; not a published competitive benchmark score

compound-decompose
RDKit compound decomposition/recomposition through atom mud
Status: evidence-ready
Command: scbe bench compound-decompose --json
Artifact: artifacts/benchmarks/compound_decomposition_recomposition/latest_report.json
Hash: c1c9f514
Summary: {"decision":"PASS","rdkit_available":true,"case_count":30,"passed":30,"pass_rate":1,"mud_step":5,"rdkit_error":null}
Boundary: computational compound decomposition/recomposition benchmark; not wet-lab synthesis, biological efficacy proof, dosing guidance, or medical advice

providers
AI provider health matrix (local > free > paid free-first policy)
Status: missing-artifact
Command: scbe bench providers --json
Artifact: artifacts/benchmarks/provider_health/latest_report.json
Hash: missing
Summary: No latest artifact yet
Boundary: local provider reachability check; not an API reliability guarantee

longform
Longform Bridge durable CLI workflow with squad dispatch receipts
Status: evidence-ready
Command: scbe bench longform --json
Artifact: artifacts/benchmarks/longform_cli_benchmark_latest.json
Hash: 40eea47c
Summary: {}
Boundary: local durable-workflow CLI fixture; not a guarantee of autonomous task completion
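
The header figures (6/10 evidence lanes, 60% readiness) appear to follow directly from the lane statuses listed above. The sketch below recomputes them under the assumption that readiness is simply the share of lanes marked `evidence-ready`, rounded to a whole percent. The `LANE_STATUS` mapping and the `readiness` function are illustrative names, not part of the `scbe` CLI.

```python
# Lane statuses transcribed from the dashboard above.
LANE_STATUS = {
    "hard-agentic": "evidence-ready",
    "research": "evidence-ready",
    "rubix-browser": "evidence-ready",
    "arc-agi2": "missing-artifact",
    "arc-style-grid": "missing-artifact",
    "swe-local": "missing-artifact",
    "cli-competitive": "evidence-ready",
    "compound-decompose": "evidence-ready",
    "providers": "missing-artifact",
    "longform": "evidence-ready",
}


def readiness(statuses: dict[str, str]) -> tuple[int, int, int]:
    """Return (ready lanes, total lanes, readiness percent).

    Assumption: readiness is ready/total, rounded to a whole percent.
    """
    ready = sum(1 for status in statuses.values() if status == "evidence-ready")
    total = len(statuses)
    percent = round(100 * ready / total) if total else 0
    return ready, total, percent


ready, total, percent = readiness(LANE_STATUS)
print(f"Evidence lanes: {ready}/{total}, readiness: {percent}%")
# prints "Evidence lanes: 6/10, readiness: 60%"

# Lanes that still need an artifact before they count toward readiness.
missing = [lane for lane, status in LANE_STATUS.items() if status == "missing-artifact"]
print(missing)
# prints ['arc-agi2', 'arc-style-grid', 'swe-local', 'providers']
```

Under this reading, each missing lane that gains an artifact raises readiness by 10 percentage points.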