Training flywheel

Strict isolation training run: Kaggle + local SFT + blind holdout

The March 30 snapshot ties together the public Kaggle adversarial dataset, local SCBE-specific SFT records, benign document slices, and a separate 20-category blind holdout. The point of this page is not the highest number. The point is that the benchmark stayed separate from training.

March 30, 2026 · Kaggle · Hugging Face · Strict isolation · Blind holdout

Evidence rule: benchmark attacks never enter training data.

The classifier can look strong on in-distribution attack patterns and still fail on structural attack families. This run keeps that distinction visible instead of hiding it behind one blended score.
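One concrete way to enforce the evidence rule is a pre-training overlap check that refuses to proceed if any holdout prompt also appears in the training corpus. The sketch below is illustrative, not the repo's actual tooling: the JSONL layout and the `prompt` field name are assumptions.

```python
import hashlib
import json


def prompt_hashes(path):
    """Return SHA-256 hashes of normalized prompts from a JSONL file.

    Normalization (strip + lowercase) catches trivial near-duplicates;
    the one-record-per-line JSONL layout is an assumed format.
    """
    hashes = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            text = json.loads(line)["prompt"].strip().lower()
            hashes.add(hashlib.sha256(text.encode("utf-8")).hexdigest())
    return hashes


def assert_isolated(train_path, holdout_path):
    """Fail loudly if any holdout prompt leaked into training data."""
    overlap = prompt_hashes(train_path) & prompt_hashes(holdout_path)
    if overlap:
        raise ValueError(f"{len(overlap)} holdout prompts found in training data")
```

Run as a gate before training starts, so a leak aborts the run instead of quietly inflating the blind-benchmark number.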

- Kaggle adversarial samples: 40K
- Local SCBE SFT records: 4,846
- Benign document samples: 500
- Blind holdout prompts: 400
- Train-eval accuracy: 94.9%
- Blind benchmark detection: 73.5%

Pipeline in one command

The repo records this loop as a repeatable training path, not a one-off notebook experiment:

```shell
python scripts/unified_training_pipeline.py --sklearn --push
```
| Stage | Source | Role in the run |
|---|---|---|
| 1 | Kaggle MPDD | Standard adversarial prompt corpus |
| 2 | Local SCBE SFT | Repo-specific instruction and governance records |
| 3 | Benign docs | Safe language baseline |
| 4 | 20-category holdout | Blind benchmark that stays outside training |
| 5 | Hugging Face publish lane | Model artifact and report persistence |
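The stage list above can be expressed as data, with the isolation boundary made explicit: only the first three stages contribute samples to training, while the holdout and publish lanes never do. The `Stage` class and `training_sources` helper below are an illustrative sketch, not the repo's actual pipeline API.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    role: str
    in_training: bool  # holdout and publish stages never feed the training set


# Stage names and roles mirror the table above; the structure is hypothetical.
STAGES = [
    Stage("Kaggle MPDD", "standard adversarial prompt corpus", True),
    Stage("Local SCBE SFT", "repo-specific instruction and governance records", True),
    Stage("Benign docs", "safe language baseline", True),
    Stage("20-category holdout", "blind benchmark outside training", False),
    Stage("Hugging Face publish lane", "artifact and report persistence", False),
]


def training_sources():
    """Only stages flagged in_training may contribute samples."""
    return [s.name for s in STAGES if s.in_training]
```

Encoding the boundary as a flag makes it checkable in code review: a stage cannot drift into the training set without flipping a visible boolean.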

Why this page exists

Earlier SCBE notes documented a much harder overlap-harness result: raw classifier recall at 34.5 percent on the 20-category benchmark. That older result is still useful, because it proves the public Kaggle distribution does not cover every SCBE attack family. This newer strict-isolation run shows the training lane has moved forward, but it does not erase the need for a separate overlap harness and structural gate.

Current weak spots in the blind holdout

The latest training report still leaves obvious attack families under-covered. The weakest measured categories in the current run are the ones that should shape the next data pull and the next architecture pass.

| Category | Detection | What it suggests |
|---|---|---|
| context_overflow | 20% | Need more window-manipulation and long-context training examples |
| multi_step_chain | 40% | Multi-turn escalation still under-modeled |
| encoding_obfuscation | 50% | Wrapper and encoding patterns need more direct coverage |
| model_extraction | 50% | Weights, architecture, and prompt-leak behavior still need data |
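Since the point of this table is to drive the next data pull, the weak categories can be extracted mechanically from the training report rather than read off by hand. This sketch assumes a hypothetical `per_category_detection` key holding category-to-rate mappings; the actual schema of `training_report.json` is not confirmed here.

```python
import json


def weakest_categories(report_path, threshold=0.5):
    """Return blind-holdout categories at or below a detection threshold,
    weakest first, to prioritize the next data pull."""
    with open(report_path, encoding="utf-8") as f:
        report = json.load(f)
    scores = report["per_category_detection"]  # assumed field name
    weak = [(cat, rate) for cat, rate in scores.items() if rate <= threshold]
    return sorted(weak, key=lambda item: item[1])
```

With the rates from the table above, `context_overflow` at 20% sorts first, making it the obvious target for the next round of window-manipulation examples.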

Evidence packet

This page is derived from a timestamped repo artifact, not a hand-edited marketing summary.

artifacts/training/training_report.json
pipeline: unified_scbe_training_v2_strict_isolation
timestamp: 2026-03-30T05:22:53Z
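A page derived from a timestamped artifact can verify that artifact at render time. The sketch below checks the two fields quoted above, assuming they are top-level JSON keys in `training_report.json`; that layout is an assumption, not something the packet confirms.

```python
import json
from datetime import datetime


def verify_report(path):
    """Check the evidence-packet fields this page is derived from."""
    with open(path, encoding="utf-8") as f:
        report = json.load(f)
    assert report["pipeline"] == "unified_scbe_training_v2_strict_isolation"
    # Timestamp is expected in UTC ISO-8601 form, e.g. 2026-03-30T05:22:53Z;
    # strptime raises ValueError if the format drifts.
    datetime.strptime(report["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    return report
```

A check like this keeps the page honest: if the artifact is regenerated under a different pipeline name, the derivation fails instead of silently showing stale numbers.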