Strict isolation training run: Kaggle + local SFT + blind holdout
The March 30 snapshot ties together the public Kaggle adversarial dataset, local SCBE-specific SFT records, benign document slices, and a separate 20-category blind holdout. The point of this page is not the highest number. The point is that the benchmark stayed separate from training.
Evidence rule: benchmark attacks never enter training data.
The classifier can look strong on in-distribution attack patterns and still fail on structural attack families. This run keeps that distinction visible instead of hiding it behind one blended score.
Pipeline in one command
The repo records this loop as a repeatable training path, not a one-off notebook experiment:
python scripts/unified_training_pipeline.py --sklearn --push
| Stage | Source | Role in the run |
|---|---|---|
| 1 | Kaggle MPDD | Standard adversarial prompt corpus |
| 2 | Local SCBE SFT | Repo-specific instruction and governance records |
| 3 | Benign docs | Safe language baseline |
| 4 | 20-category holdout | Blind benchmark that stays outside training |
| 5 | Hugging Face publish lane | Model artifact and report persistence |
Why this page exists
Earlier SCBE notes documented a much harder overlap-harness result: raw classifier recall at 34.5 percent on the 20-category benchmark. That older result is still useful, because it proves the public Kaggle distribution does not cover every SCBE attack family. This newer strict-isolation run shows the training lane has moved forward, but it does not erase the need for a separate overlap harness and structural gate.
Current weak spots in the blind holdout
The latest training report still leaves obvious attack families under-covered. The weakest measured categories in the current run are the ones that should shape the next data pull and the next architecture pass.
| Category | Detection | What it suggests |
|---|---|---|
| context_overflow | 20% | Need more window-manipulation and long-context training examples |
| multi_step_chain | 40% | Multi-turn escalation still under-modeled |
| encoding_obfuscation | 50% | Wrapper and encoding patterns need more direct coverage |
| model_extraction | 50% | Weights, architecture, and prompt-leak behavior still need data |
Evidence packet
This page is derived from a timestamped repo artifact, not a hand-edited marketing summary.
artifacts/training/training_report.json pipeline: unified_scbe_training_v2_strict_isolation timestamp: 2026-03-30T05:22:53Z