Overlap harness

Hybrid scorecard: what each detector contributes on the same benchmark

This page scores the three subsystems on one board instead of letting each hide behind its own dataset. The classifier, RuntimeGate, and the trichromatic layer all see the same benchmark set here (200 attacks and 10 benign prompts), so the gaps between them are evidence, not storytelling.

March 30, 2026 | Hybrid stack | RuntimeGate | Trichromatic | Blind harness

The gate still carries most of the lift (98 of 200 attacks on its own, against 109 for the hybrid), but the hybrid matters because it preserves rare catches and keeps miss buckets explicit.

The point of the overlap harness is monotonic evidence. A later council outcome can raise severity, but it cannot silently downgrade a real quarantine that fired earlier in the chain.
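The monotonic rule can be sketched as a running maximum over stage outcomes. This is a minimal illustration, not the harness's actual combiner: the `Severity` levels and the `combine_monotonic` name are hypothetical, and the real stack may use more levels or carry extra metadata.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    """Hypothetical severity ladder; higher values are more severe."""
    PASS = 0
    ESCALATE = 1
    QUARANTINE = 2


def combine_monotonic(outcomes: Iterable[Severity]) -> Severity:
    """Fold stage outcomes in chain order, never lowering the running severity."""
    verdict = Severity.PASS
    for outcome in outcomes:
        # A later stage may raise the verdict but can never downgrade it.
        verdict = max(verdict, outcome)
    return verdict


# An early quarantine survives a later all-pass council outcome.
chain = [Severity.PASS, Severity.QUARANTINE, Severity.PASS]
print(combine_monotonic(chain).name)  # QUARANTINE
```

Taking the maximum makes the result independent of stage order, which is what prevents a late all-pass from erasing an earlier quarantine.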

Classifier raw recall: 34.5%
RuntimeGate recall: 49%
Trichromatic recall: 6%
Hybrid recall: 54.5%
Benign false positives: 0 / 10

Same prompts, different detectors

| Lane | Detected | Rate | What it tells us |
| --- | --- | --- | --- |
| classifier_raw | 69 / 200 | 34.5% | Good on public-style attack patterns, weak on SCBE-specific structural families |
| classifier_overlay | 14 / 200 | 7% | Very conservative live thresholding when used as an overlay instead of a raw label surface |
| gate | 98 / 200 | 49% | Current backbone for structural and policy-shaped detection |
| trichromatic | 12 / 200 | 6% | Too weak as a broad detector, but still useful as a structural escalation layer |
| hybrid | 109 / 200 | 54.5% | Best combined result on this harness after the monotonic-veto fix |
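The rates in the table are plain recall: detected attacks divided by the 200 attacks in the benchmark. The short sketch below recomputes them from the published counts; the helper name `recall_percent` is illustrative and not part of the harness.

```python
# Detection counts copied from the table above; 200 attacks in the benchmark.
TOTAL_ATTACKS = 200
DETECTED = {
    "classifier_raw": 69,
    "classifier_overlay": 14,
    "gate": 98,
    "trichromatic": 12,
    "hybrid": 109,
}


def recall_percent(detected: int, total: int) -> float:
    """Recall as a percentage: detected attacks over all attacks."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * detected / total


rates = {lane: recall_percent(n, TOTAL_ATTACKS) for lane, n in DETECTED.items()}
for lane, rate in rates.items():
    print(f"{lane:<20}{DETECTED[lane]:>4} / {TOTAL_ATTACKS}  {rate:.1f}%")

# Lift of the hybrid over the gate alone, in percentage points.
print(f"hybrid - gate = {rates['hybrid'] - rates['gate']:.1f} points")
```

On these counts the hybrid adds 5.5 percentage points of recall over the gate alone.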

Overlap, not intuition

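The overlap table is where the real story lives. Most caught attacks come from the gate alone (86 of the 109 hybrid detections), but there are still rare union-only catches worth preserving. The listed rows account for 104 caught attacks plus 91 misses, so the remaining 5 detections presumably fall into overlap combinations not shown here.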

| Component overlap | Count | Meaning |
| --- | --- | --- |
| gate | 86 | The gate alone found the attack and still forms the core of the hybrid stack |
| classifier | 9 | Raw classifier-only catches still exist and justify keeping the overlay path |
| gate + trichromatic | 8 | Structural agreement between the gate and hidden-band diagnostics |
| trichromatic | 1 | Rare trichromatic-only catch preserved by the monotonic combiner |
| none | 91 | Misses still visible, still useful, still shaping the next dataset pull |
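Overlap counts like these can be produced by bucketing each prompt by the exact set of components that fired on it. The sketch below assumes a hypothetical per-prompt mapping (`hits`) and uses toy data rather than benchmark results; the harness's real record format is not shown in this page.

```python
from collections import Counter
from typing import Dict, Set


def overlap_buckets(hits: Dict[str, Set[str]]) -> Counter:
    """Count prompts by the exact set of components that flagged them.

    `hits` maps a prompt id to the set of component names that fired on it.
    """
    buckets: Counter = Counter()
    for fired in hits.values():
        # Sort so {"trichromatic", "gate"} and {"gate", "trichromatic"}
        # land in the same bucket; an empty set is a miss.
        label = " + ".join(sorted(fired)) if fired else "none"
        buckets[label] += 1
    return buckets


# Toy data, not taken from the benchmark.
toy_hits = {
    "p1": {"gate"},
    "p2": {"gate", "trichromatic"},
    "p3": {"classifier"},
    "p4": set(),
    "p5": {"gate"},
}
print(overlap_buckets(toy_hits))
```

Because each prompt lands in exactly one bucket, the bucket counts always sum to the number of prompts, which makes the table easy to sanity-check against the benchmark size.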

Where the hybrid still fails

The harness is useful because it leaves the misses visible. The largest miss buckets mark the categories that need either more data or better structural checks next.

| Category | Misses | Why it matters |
| --- | --- | --- |
| prompt_extraction | 9 | Polite extraction requests still slip past too often |
| multilingual | 9 | The model and rule stack still need more multilingual examples |
| role_confusion | 8 | Persona and role hijack are still under-covered |
| tongue_manipulation | 7 | SCBE-specific language abuse still needs direct coverage |
| model_extraction | 7 | Weights and architecture theft patterns still under-trigger |
| autonomous_escalation | 7 | Scope creep chains still need stronger escalation handling |
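A miss ranking of this kind can be derived by counting undetected attacks per category. The sketch below is illustrative only: `rank_miss_buckets` and the `(category, detected)` record shape are assumptions, and the first dataset is toy data. The second part uses the published counts from the table to show how much of the 91 misses the listed buckets cover.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_miss_buckets(
    results: Iterable[Tuple[str, bool]], top: int = 3
) -> List[Tuple[str, int]]:
    """Return the `top` categories with the most undetected attacks."""
    misses = Counter(category for category, detected in results if not detected)
    return misses.most_common(top)


# Toy results as (category, detected_by_hybrid); not benchmark data.
toy_results = [
    ("multilingual", False),
    ("multilingual", False),
    ("role_confusion", False),
    ("prompt_extraction", True),
    ("multilingual", True),
]
print(rank_miss_buckets(toy_results))

# Published miss counts from the table above, out of 91 total misses.
PUBLISHED_MISSES = {
    "prompt_extraction": 9,
    "multilingual": 9,
    "role_confusion": 8,
    "tongue_manipulation": 7,
    "model_extraction": 7,
    "autonomous_escalation": 7,
}
TOTAL_MISSES = 91
listed = sum(PUBLISHED_MISSES.values())
print(f"listed buckets cover {listed} of {TOTAL_MISSES} misses "
      f"({100 * listed / TOTAL_MISSES:.1f}%)")
```

The six listed categories cover 47 of the 91 misses, so roughly half of the misses sit in smaller buckets not shown in the table.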

What changed in the combiner

The latest run reflects a monotonic severity fix. In practical terms, a real trichromatic quarantine can no longer be neutralized by a later all-pass council outcome. That change preserved one union-only catch (the trichromatic-only row in the overlap table) and removed the old benign numeric false positive at the same time.
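One way to guard a fix like this is a small property check: extending any chain of outcomes by one more stage must never lower the combined verdict. The sketch below is an assumption-laden illustration. The integer severity ranks are hypothetical, and `combine_last_wins` is only a guessed model of the pre-fix behaviour, not code from the harness.

```python
from itertools import product

# Hypothetical severity ranks; only their ordering matters for this check.
PASS, ESCALATE, QUARANTINE = 0, 1, 2


def combine_monotonic(outcomes):
    """Return the highest severity seen anywhere in the chain."""
    return max(outcomes, default=PASS)


def combine_last_wins(outcomes):
    """Assumed model of the old behaviour: the final stage overrides earlier ones."""
    return outcomes[-1] if outcomes else PASS


def is_monotonic(combine, max_len=3):
    """True if extending any chain by one stage never lowers the verdict."""
    levels = (PASS, ESCALATE, QUARANTINE)
    for length in range(max_len):
        for chain in product(levels, repeat=length):
            for extra in levels:
                if combine(chain + (extra,)) < combine(chain):
                    return False
    return True


print(is_monotonic(combine_monotonic))  # True
print(is_monotonic(combine_last_wins))  # False
```

The last-wins model fails on the chain quarantine followed by pass, which is the same shape as the all-pass council case described above.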

Evidence packet

artifacts/benchmark/hybrid_overlap_latest.json
timestamp: 2026-03-30T02:08:57Z
benchmark: 200 attacks, 10 benign prompts
monotonic combiner active