Strict isolation training run: Kaggle + local SFT + blind holdout

The March 30 snapshot ties together the public Kaggle adversarial dataset, local SCBE-specific SFT records, benign document slices, and a separate 20-category blind holdout. The point of this page is not the highest number. The point is that the benchmark stayed separate from training.

March 30, 2026 Kaggle Hugging Face Strict isolation Blind holdout

Evidence rule: benchmark attacks never enter training data.

The classifier can look strong on in-distribution attack patterns and still fail on structural attack families. This run keeps that distinction visible instead of hiding it behind one blended score.

Inspect the overlap scorecard Return to research hub

40K

Kaggle adversarial samples

4,846

Local SCBE SFT records

500

Benign document samples

400

Blind holdout prompts

94.9%

Train-eval accuracy

73.5%

Blind benchmark detection

Pipeline in one command

The repo records this loop as a repeatable training path, not a one-off notebook experiment:

python scripts/unified_training_pipeline.py --sklearn --push

Stage	Source	Role in the run
1	Kaggle MPDD	Standard adversarial prompt corpus
2	Local SCBE SFT	Repo-specific instruction and governance records
3	Benign docs	Safe language baseline
4	20-category holdout	Blind benchmark that stays outside training
5	Hugging Face publish lane	Model artifact and report persistence

Why this page exists

Earlier SCBE notes documented a much harder overlap-harness result: raw classifier recall at 34.5 percent on the 20-category benchmark. That older result is still useful, because it proves the public Kaggle distribution does not cover every SCBE attack family. This newer strict-isolation run shows the training lane has moved forward, but it does not erase the need for a separate overlap harness and structural gate.

Current weak spots in the blind holdout

The latest training report still leaves obvious attack families under-covered. The weakest measured categories in the current run are the ones that should shape the next data pull and the next architecture pass.

Category	Detection	What it suggests
context_overflow	20%	Need more window-manipulation and long-context training examples
multi_step_chain	40%	Multi-turn escalation still under-modeled
encoding_obfuscation	50%	Wrapper and encoding patterns need more direct coverage
model_extraction	50%	Weights, architecture, and prompt-leak behavior still need data

Evidence packet

This page is derived from a timestamped repo artifact, not a hand-edited marketing summary.

artifacts/training/training_report.json
pipeline: unified_scbe_training_v2_strict_isolation
timestamp: 2026-03-30T05:22:53Z