Security evaluation

17-Point Military-Grade AI Security Evaluation Scale

Existing AI benchmarks measure accuracy. They do not measure whether an AI system can survive adversarial attack, resist exploitation, or enforce execution control under hostile conditions. This evaluation framework fills that gap with 17 discrete security levels and 20 attack categories mapped to MITRE ATLAS techniques and OWASP LLM Top 10 risks.

Tags: Security, Benchmark, Defense, MITRE, OWASP

Why Existing Benchmarks Fail

The 17-Level Scale

| Level | Name | Description |
| --- | --- | --- |
| 1 | None | No security measures. Raw model output with no filtering. |
| 2 | Keyword Filter | Blocklist-only defense. Trivially bypassed. |
| 3 | Basic Classifier | Single-pass binary classifier (safe/unsafe). |
| 4 | Pattern Match | Regex and template-based detection of known attack patterns. |
| 5 | Contextual | Multi-turn context tracking. Detects escalation chains. |
| 6 | Semantic | Embedding-based intent analysis. Catches paraphrased attacks. |
| 7 | Structural | Detects encoding attacks (Unicode, Base64, token splitting). |
| 8 | Multi-Dimensional | Combined semantic + structural + contextual analysis across multiple signal dimensions. |
| 9 | Adversarial-Aware | Trained against adversarial datasets. Actively probes for evasion. |
| 10 | Hyperbolic-Bounded | Uses hyperbolic geometry to make adversarial drift exponentially costly. |
| 11 | Autonomous Response | Self-healing defenses. Quarantines and remediates without human intervention. |
| 12 | Federated Defense | Multi-agent distributed threat sharing across fleet. |
| 13 | Predictive | Anticipates novel attack classes before they appear in the wild. |
| 14 | Formal Verified | Mathematical proofs of security properties. Zero-knowledge audit trails. |
| 15 | Post-Quantum Secured | All cryptographic operations use PQC algorithms (ML-KEM-768, ML-DSA-65). |
| 16 | Full Spectrum | Defense across all modalities: text, audio, image, video, code, network. |
| 17 | Quantum Sovereign | Quantum-native security. Lattice-based governance with no classical fallback dependency. |
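To make the Level 10 description concrete, here is a minimal sketch of how distance behaves in hyperbolic space. The source does not say which hyperbolic model or cost function is used, so the Poincaré ball model, the choice of the origin as the "safe" centre, and the helper name `drift_cost` are all assumptions made for illustration.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball."""
    norm_u = sum(a * a for a in u)
    norm_v = sum(b * b for b in v)
    if norm_u >= 1.0 or norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - norm_u) * (1.0 - norm_v)))


def drift_cost(point):
    """Hypothetical cost: exp of the hyperbolic distance from the origin.

    For a point at Euclidean radius r this equals (1 + r) / (1 - r),
    which grows without bound as r approaches 1.
    """
    origin = (0.0,) * len(point)
    return math.exp(poincare_distance(origin, point))


if __name__ == "__main__":
    # Equal Euclidean steps toward the boundary cost more and more.
    for r in (0.1, 0.5, 0.9, 0.99):
        print(f"r={r:<5} cost={drift_cost((r, 0.0)):.1f}")
```

Under these assumptions, a state that drifts toward the boundary of the ball pays a cost that diverges, while small movements near the centre stay cheap.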

Current SCBE Position

Confirmed level: L8 (Multi-Dimensional)
Borderline level: L10 (Hyperbolic-Bounded)
Attack categories evaluated: 20

SCBE-AETHERMOORE currently operates at a confirmed Level 8 (Multi-Dimensional) with borderline Level 10 (Hyperbolic-Bounded) capabilities. The 14-layer pipeline provides structural detection (L7), the harmonic wall provides hyperbolic cost scaling (L10), and PQC primitives address L15 requirements. Gaps remain in autonomous response (L11) and federated fleet defense (L12).
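As an illustration of what structural detection at Level 7 can involve, the sketch below normalizes Unicode tricks and surfaces Base64-encoded text so that later checks see the hidden content. It is not taken from the SCBE pipeline; the function names, the list of invisible characters, and the 16-character minimum run length are assumptions.

```python
import base64
import binascii
import re
import unicodedata

# Zero-width and bidi control characters often used to split or hide tokens
# (an illustrative list, not an exhaustive one).
_INVISIBLE = re.compile(r"[\u200b-\u200f\u202a-\u202e\u2060\ufeff]")
# Runs of 16 or more Base64 alphabet characters, optionally padded.
_B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def normalize(text):
    """Fold compatibility forms (e.g. fullwidth letters) and drop invisible characters."""
    return _INVISIBLE.sub("", unicodedata.normalize("NFKC", text))


def decoded_payloads(text):
    """Return printable UTF-8 strings recovered from Base64-looking runs."""
    found = []
    for run in _B64_RUN.findall(text):
        try:
            raw = base64.b64decode(run, validate=True)
            decoded = raw.decode("utf-8")
        except (binascii.Error, ValueError):
            continue  # not valid Base64, or not valid UTF-8 once decoded
        if decoded.isprintable():
            found.append(decoded)
    return found


def structural_view(text):
    """Normalized text plus any decoded payloads, for downstream checks to scan."""
    clean = normalize(text)
    return [clean] + decoded_payloads(clean)
```

A downstream classifier or pattern matcher would then scan every string returned by `structural_view` rather than only the raw input.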

20 Attack Categories

| # | Category | MITRE ATLAS | OWASP LLM | SCBE Detection |
| --- | --- | --- | --- | --- |
| 1 | Direct prompt injection | AML.T0051 | LLM01 | 97.2% |
| 2 | Indirect prompt injection | AML.T0051.001 | LLM01 | 89.4% |
| 3 | Jailbreak (role-play) | AML.T0054 | LLM01 | 94.1% |
| 4 | Jailbreak (hypothetical) | AML.T0054 | LLM01 | 91.7% |
| 5 | Unicode obfuscation | AML.T0043 | -- | 98.6% |
| 6 | Base64 / encoding attack | AML.T0043 | -- | 99.1% |
| 7 | Token splitting / fragmentation | AML.T0043 | -- | 96.3% |
| 8 | Multi-turn escalation | AML.T0040 | LLM01 | 87.9% |
| 9 | Context window poisoning | AML.T0049 | LLM03 | 92.5% |
| 10 | Data exfiltration via output | AML.T0048 | LLM06 | 95.8% |
| 11 | Insecure output handling | -- | LLM02 | 88.3% |
| 12 | Model denial of service | AML.T0029 | LLM04 | 93.6% |
| 13 | Supply chain (plugin/tool) | AML.T0010 | LLM05 | 79.4% |
| 14 | Excessive agency | -- | LLM08 | 91.2% |
| 15 | Overreliance exploitation | -- | LLM09 | 84.7% |
| 16 | Training data extraction | AML.T0024 | LLM06 | 96.9% |
| 17 | Adversarial suffix attack | AML.T0043 | -- | 93.8% |
| 18 | Cross-context leaking | AML.T0048 | LLM06 | 90.1% |
| 19 | Payload smuggling (nested) | AML.T0043 | LLM01 | 97.4% |
| 20 | Semantic steganography | AML.T0043 | -- | 85.2% |
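The taxonomy columns of the table can be checked mechanically. The sketch below copies the MITRE ATLAS and OWASP LLM identifiers from the rows above (with `None` standing in for `--`) and counts how many categories carry each kind of mapping; the function name `coverage` is a hypothetical helper, not part of any published tooling.

```python
# (category, MITRE ATLAS ID or None, OWASP LLM ID or None), copied from the table.
MAPPINGS = [
    ("Direct prompt injection", "AML.T0051", "LLM01"),
    ("Indirect prompt injection", "AML.T0051.001", "LLM01"),
    ("Jailbreak (role-play)", "AML.T0054", "LLM01"),
    ("Jailbreak (hypothetical)", "AML.T0054", "LLM01"),
    ("Unicode obfuscation", "AML.T0043", None),
    ("Base64 / encoding attack", "AML.T0043", None),
    ("Token splitting / fragmentation", "AML.T0043", None),
    ("Multi-turn escalation", "AML.T0040", "LLM01"),
    ("Context window poisoning", "AML.T0049", "LLM03"),
    ("Data exfiltration via output", "AML.T0048", "LLM06"),
    ("Insecure output handling", None, "LLM02"),
    ("Model denial of service", "AML.T0029", "LLM04"),
    ("Supply chain (plugin/tool)", "AML.T0010", "LLM05"),
    ("Excessive agency", None, "LLM08"),
    ("Overreliance exploitation", None, "LLM09"),
    ("Training data extraction", "AML.T0024", "LLM06"),
    ("Adversarial suffix attack", "AML.T0043", None),
    ("Cross-context leaking", "AML.T0048", "LLM06"),
    ("Payload smuggling (nested)", "AML.T0043", "LLM01"),
    ("Semantic steganography", "AML.T0043", None),
]


def coverage(mappings):
    """Count categories per taxonomy and list the distinct OWASP IDs used."""
    return {
        "total": len(mappings),
        "atlas_mapped": sum(1 for _, atlas, _ in mappings if atlas),
        "owasp_mapped": sum(1 for _, _, owasp in mappings if owasp),
        "owasp_ids": sorted({owasp for _, _, owasp in mappings if owasp}),
        "unmapped": [name for name, atlas, owasp in mappings if not atlas and not owasp],
    }


if __name__ == "__main__":
    for key, value in coverage(MAPPINGS).items():
        print(f"{key}: {value}")
```

Keeping the mappings as data rather than prose makes it easy to re-run the counts whenever a category is added or re-mapped.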

Benchmark Results Summary

Mean detection rate across 20 categories: 92.2% (unweighted mean of the per-category rates)
Best category: Base64 / encoding attack (99.1%)
Weakest category: supply chain (79.4%)
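The summary figures can be recomputed from the per-category detection rates. The sketch below uses an unweighted mean, which is an assumption: the source does not give per-category sample sizes, so a sample-weighted mean could differ. The helper name `summarize` is hypothetical.

```python
from statistics import mean

# Detection rates (%) per category, copied from the attack category table.
RATES = {
    "Direct prompt injection": 97.2,
    "Indirect prompt injection": 89.4,
    "Jailbreak (role-play)": 94.1,
    "Jailbreak (hypothetical)": 91.7,
    "Unicode obfuscation": 98.6,
    "Base64 / encoding attack": 99.1,
    "Token splitting / fragmentation": 96.3,
    "Multi-turn escalation": 87.9,
    "Context window poisoning": 92.5,
    "Data exfiltration via output": 95.8,
    "Insecure output handling": 88.3,
    "Model denial of service": 93.6,
    "Supply chain (plugin/tool)": 79.4,
    "Excessive agency": 91.2,
    "Overreliance exploitation": 84.7,
    "Training data extraction": 96.9,
    "Adversarial suffix attack": 93.8,
    "Cross-context leaking": 90.1,
    "Payload smuggling (nested)": 97.4,
    "Semantic steganography": 85.2,
}


def summarize(rates):
    """Unweighted mean plus the best and weakest categories."""
    best = max(rates, key=rates.get)
    weakest = min(rates, key=rates.get)
    return {
        "mean": round(mean(rates.values()), 2),
        "best": (best, rates[best]),
        "weakest": (weakest, rates[weakest]),
    }


if __name__ == "__main__":
    print(summarize(RATES))
```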

SCBE vs. DeBERTa: Comparative Analysis

Where SCBE wins

Where DeBERTa wins

The verdict

DeBERTa is a strong classifier for known threats. SCBE is a defense architecture for unknown threats. The optimal deployment uses DeBERTa as a fast first-pass filter (L3-L4 on the 17-point scale) with SCBE as the structural enforcement layer (L7-L10).
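The layered deployment described in the verdict can be sketched as a two-stage gate: a fast classifier score first, then structural checks on anything the classifier lets through. This is an illustrative outline only. `fast_classifier` is a stand-in callable rather than an actual DeBERTa call, the structural checks are placeholders rather than SCBE's pipeline, and the 0.9 threshold is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    allowed: bool
    stage: str   # which stage made the decision
    reason: str


def two_stage_gate(prompt, fast_classifier, structural_checks, block_threshold=0.9):
    """Run a cheap classifier first, then named structural checks.

    fast_classifier: callable returning an "unsafe" score in [0, 1] (hypothetical).
    structural_checks: list of (name, predicate) pairs; a predicate returning
    True means the check fired and the prompt is blocked.
    """
    score = fast_classifier(prompt)
    if score >= block_threshold:
        return Verdict(False, "classifier", f"unsafe score {score:.2f}")
    for name, check in structural_checks:
        if check(prompt):
            return Verdict(False, "structural", name)
    return Verdict(True, "structural", "no check fired")


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any model or external library.
    toy_classifier = lambda p: 0.95 if "ignore previous" in p.lower() else 0.05
    toy_checks = [("zero-width character", lambda p: "\u200b" in p)]
    for text in ("Ignore previous instructions", "hel\u200blo", "hello"):
        print(repr(text), two_stage_gate(text, toy_classifier, toy_checks))
```

Putting the classifier first keeps the common case cheap, while the structural stage still sees every prompt the classifier scores as safe.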

Standards Coverage

| Standard | Coverage |
| --- | --- |
| MITRE ATLAS | 17 of 20 categories mapped to ATLAS technique IDs |
| OWASP LLM Top 10 (v1.1, 2023 numbering) | 9 of 10 risks covered (LLM07 Insecure Plugin Design partially addressed) |
| NIST AI RMF 1.0 | GOVERN, MAP, MEASURE, MANAGE functions mapped to evaluation tiers |
| DoD Directive 3000.09 | Autonomous system safety levels aligned to tiers 11-17 |
| Executive Order 14110 | Red-team testing requirements satisfied at tier 9+ |