Canonical public evaluation lane

One eval pack. One reproducible path.

Use this page when you want to reproduce the current public benchmark results rather than rely on a summary table alone. It points to one corpus, one local runner, one comparison script, and one manifest.

Commands

Run the local adversarial suite:
    pytest tests/adversarial/test_adversarial_benchmark.py -v

Run the industry comparison script:
    python scripts/benchmark/scbe_vs_industry.py

Inspect the generated report (PowerShell):
    Get-Content artifacts\benchmark\industry_benchmark_report.json
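Instead of printing the raw JSON, you can load the report programmatically. This is a minimal sketch, assuming only that the artifact at the path shown above is valid JSON; the helper name `load_report` is illustrative and not part of the repo, and no particular report schema is assumed.

```python
import json
from pathlib import Path

# Hypothetical helper: load the benchmark artifact if it exists.
# The path mirrors the Get-Content command above (forward slashes
# work on both Windows and POSIX via pathlib).
def load_report(path="artifacts/benchmark/industry_benchmark_report.json"):
    p = Path(path)
    if not p.exists():
        return None  # benchmark has not been run yet
    with p.open(encoding="utf-8") as f:
        return json.load(f)

report = load_report()
if report is None:
    print("report not found; run the benchmark script first")
else:
    # Print top-level keys without assuming a specific schema.
    print(sorted(report.keys()))
```

Inspecting the top-level keys first is a safe way to compare report versions before asserting anything about specific metrics.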
Canonical files

What this page does not claim

This eval pack supports reproducibility of the local public benchmark lane. It does not claim that every external environment, model provider, or deployment surface will reproduce identical numbers; check the command path, dependencies, and artifact outputs before comparing results across environments.
