Byzantine Fault Tolerant AI Governance

Single-model safety classifiers are a single point of failure. One jailbreak, one adversarial prompt, one poisoned weight matrix -- and the entire safety layer collapses. SCBE replaces the single-classifier paradigm with a 6-council Byzantine Fault Tolerant consensus protocol where no single compromised council can override the collective decision.

Independent Councils: 6
Fault Tolerance: f=1 (1 compromised)
Quorum Threshold: 4/6
Supermajority Ratio: 0.83

The Problem: Single Points of Failure

Traditional AI safety systems rely on a single classifier -- a content filter, a reward model, or a constitutional rule set -- to decide whether an output is safe. This architecture has a fundamental flaw: if an adversary can compromise that single checkpoint, the entire safety guarantee disappears.

Prompt injection bypasses a single filter by reframing adversarial intent as benign context.
Fine-tuning attacks can shift a single reward model's decision boundary.
Gradient-based adversarial examples can fool a single classifier with imperceptible perturbations.

The solution is redundancy through diversity -- not multiple copies of the same check, but fundamentally different checks that examine orthogonal dimensions of safety.

The Solution: 6 Independent Councils

The six Sacred Tongues -- Kor'aelin (KO), Avali (AV), Runethic (RU), Cassisivadan (CA), Umbroth (UM), and Draumric (DR) -- are unified governance languages, each carrying a 256-token lexicon, phi-weighted cost scaling, trichromatic spectral bands, and cross-stitch bridge connections.

SCBE's governance layer maps directly onto the Sacred Tongues dimensional system. Each of the 6 tongues (KO, AV, RU, CA, UM, DR) governs an independent council that evaluates one specific dimension of the input. The councils operate in parallel, without shared state, and vote independently.
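As a rough illustration of "parallel, without shared state", the sketch below runs six stand-in council functions on the same input and collects one vote from each. Everything here is hypothetical: the `Vote` enum, the `collect_votes` helper, and the toy checks (a phrase match for KO, a zero-width-character scan for DR, and pass-through placeholders for the rest) are assumptions for illustration, not SCBE's actual council logic.

```python
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import Callable, Dict


class Vote(Enum):
    ALLOW = "ALLOW"
    DENY = "DENY"


# Hypothetical stand-in checks. Each sees only the raw request text and
# returns its own vote; none of them reads or writes shared state.
def ko_intent(text: str) -> Vote:
    # Toy override-language check (assumed phrase, for illustration only).
    return Vote.DENY if "ignore previous instructions" in text.lower() else Vote.ALLOW


def dr_integrity(text: str) -> Vote:
    # Toy structural check: flag zero-width characters.
    zero_width = ("\u200b", "\u200c", "\u200d")
    return Vote.DENY if any(ch in text for ch in zero_width) else Vote.ALLOW


def always_allow(text: str) -> Vote:
    # Placeholder for councils not sketched here.
    return Vote.ALLOW


COUNCILS: Dict[str, Callable[[str], Vote]] = {
    "KO": ko_intent,
    "AV": always_allow,
    "RU": always_allow,
    "CA": always_allow,
    "UM": always_allow,
    "DR": dr_integrity,
}


def collect_votes(text: str) -> Dict[str, Vote]:
    """Run every council on the same input in parallel and gather the votes."""
    with ThreadPoolExecutor(max_workers=len(COUNCILS)) as pool:
        futures = {name: pool.submit(check, text) for name, check in COUNCILS.items()}
        return {name: fut.result() for name, fut in futures.items()}
```

Because each check is a pure function of the input, a compromised or buggy council can only change its own entry in the returned dictionary.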

KO Council -- Intent Detection

Dimension: Override language and intent manipulation

The KO Council scans for explicit override language, role-play framing designed to bypass safety, and prompt injection markers. It operates on the semantic layer, looking for intent signals that attempt to redefine the system's operational boundaries. KO weight: 1.00 (baseline).

AV Council -- Transport Analysis

Dimension: External data flow patterns

The AV Council monitors data provenance and transport patterns. It flags requests that attempt to exfiltrate data, inject external content from untrusted sources, or establish covert channels. AV weight: 1.618 (phi).

RU Council -- Policy Enforcement

Dimension: Restricted operation checking

The RU Council maintains the policy ruleset -- which operations are permitted, which are restricted, and which require escalation. It checks every action against the current governance policy and flags violations. RU weight: 2.618 (phi squared).

CA Council -- Compute Anomaly Detection

Dimension: Computational resource anomalies

The CA Council monitors computational patterns for anomalies that indicate adversarial behavior: unusual memory access patterns, recursive depth explosions, timing side-channels, and resource exhaustion attempts. CA weight: 4.236 (phi cubed).

UM Council -- Redaction Guard

Dimension: Credential and PII access attempts

The UM Council is the redaction layer. It detects attempts to extract credentials, API keys, personally identifiable information, or other sensitive data. Any request that touches sensitive surfaces triggers UM review. UM weight: 6.854 (phi to the 4th).

DR Council -- Integrity Verification

Dimension: Encoding artifacts and injection markers

The DR Council operates at the structural level, looking for encoding anomalies (Unicode homoglyphs, zero-width characters, base64 payloads), injection markers, and structural artifacts that indicate content has been tampered with or crafted to evade other councils. DR weight: 11.09 (phi to the 5th).
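The weights quoted for the six councils are successive powers of the golden ratio, starting from KO at phi to the 0th. A minimal sketch that reproduces them (the `PHI`, `TONGUES`, and `weights` names are chosen here for illustration):

```python
PHI = (1 + 5 ** 0.5) / 2  # golden ratio, approximately 1.618

TONGUES = ["KO", "AV", "RU", "CA", "UM", "DR"]

# The k-th tongue gets weight phi**k, with KO (k = 0) as the 1.00 baseline.
weights = {name: PHI ** k for k, name in enumerate(TONGUES)}

for name, w in weights.items():
    print(f"{name}: {w:.3f}")
# KO: 1.000
# AV: 1.618
# RU: 2.618
# CA: 4.236
# UM: 6.854
# DR: 11.090
```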

BFT Properties

The 6-council system follows classical BFT consensus with n=6 agents and a tolerance of f=1 faulty (compromised) council, which satisfies the classical bound n >= 3f+1 = 4. At least n - f = 5 councils must participate in a vote, and at least 4 must vote ALLOW for the request to pass.

bft_consensus(votes[], quorum=4):
    n = 6 councils, f = 1 (max compromised)
    allow_count = count(votes where vote == ALLOW)
    if allow_count >= quorum: return ALLOW
    if allow_count >= quorum - 1: return QUARANTINE
    return DENY
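The pseudocode above translates almost line for line into Python. The sketch below is one possible rendering, not SCBE's implementation. The `Decision` enum and the constant names are introduced here, and treating a vote with fewer than n - f = 5 participating councils as DENY is an assumption, since the text states the participation requirement but not what happens when it is missed.

```python
from enum import Enum
from typing import Iterable


class Decision(Enum):
    ALLOW = "ALLOW"
    QUARANTINE = "QUARANTINE"
    DENY = "DENY"


N_COUNCILS = 6  # n
MAX_FAULTY = 1  # f


def bft_consensus(votes: Iterable[str], quorum: int = 4) -> Decision:
    """Count ALLOW votes and map the tally to a decision."""
    votes = list(votes)
    # Assumption: too few participating councils fails closed.
    if len(votes) < N_COUNCILS - MAX_FAULTY:
        return Decision.DENY
    allow_count = sum(1 for v in votes if v == "ALLOW")
    if allow_count >= quorum:
        return Decision.ALLOW
    if allow_count >= quorum - 1:
        return Decision.QUARANTINE
    return Decision.DENY
```

With the default quorum of 4, `bft_consensus(["ALLOW"] * 4 + ["DENY"] * 2)` returns `Decision.ALLOW`, three ALLOW votes return `Decision.QUARANTINE`, and two or fewer return `Decision.DENY`.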

This means that, on a dangerous request that every honest council votes to DENY, an adversary must simultaneously compromise at least 4 of 6 councils -- each checking a fundamentally different dimension -- to force an ALLOW decision, and 3 councils to reach even QUARANTINE. Because the councils examine orthogonal dimensions (intent vs. transport vs. policy vs. compute vs. redaction vs. integrity), compromising one council provides no leverage against the others.

Decision Matrix

Failing Councils	Allow Votes	Decision	Action
0-2	4/6 or more	`ALLOW`	Request proceeds normally
3	3/6	`QUARANTINE`	Logged for review, may proceed with constraints
4+	2/6 or fewer	`DENY`	Blocked, audit trail generated, escalated to governance review

Fibonacci Trust Integration

The BFT system does not operate in isolation. It integrates with SCBE's Fibonacci trust scoring, where consistent good behavior over time earns an entity higher trust scores. Trust scores modulate the quorum threshold:

Low trust (new or flagged entities): Quorum stays at 4/6 -- strict consensus required.
Medium trust (established entities): Quorum relaxes to 3/6 for routine operations, but stays at 4/6 for sensitive ones.
High trust (long history of compliance): Certain pre-approved operation classes may bypass full council voting, but any novel or sensitive request still requires full BFT consensus.

Trust scores follow Fibonacci growth: after k consecutive periods of good behavior, the trust level is the k-th Fibonacci number, so early trust gains are small and later gains are significant. A single violation resets the count of consecutive clean periods, requiring the entity to rebuild trust from scratch.

trust_level(history) = fib(consecutive_clean_periods)
quorum(trust) = max(3, 4 - floor(trust / fib_threshold))
// trust never drops quorum below 3 (absolute minimum)
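The two formulas above can be sketched in Python as follows. Several details are assumptions made for illustration: `history` is modeled as a sequence of booleans (True for a clean period, oldest first), the Fibonacci sequence is indexed so that fib(0) = 0 and fib(1) = 1, and the default `fib_threshold` of 8 is a placeholder, since the text does not give a value.

```python
from typing import Sequence


def fib(k: int) -> int:
    """k-th Fibonacci number, with fib(0) = 0 and fib(1) = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def trust_level(history: Sequence[bool]) -> int:
    """Trust from the current run of clean periods; any violation resets it."""
    streak = 0
    for clean in reversed(history):
        if not clean:
            break
        streak += 1
    return fib(streak)


def quorum(trust: int, fib_threshold: int = 8) -> int:
    """Quorum relaxes as trust grows but never drops below 3."""
    # fib_threshold = 8 is an assumed placeholder value.
    return max(3, 4 - trust // fib_threshold)
```

Under these assumptions, six consecutive clean periods give a trust level of 8 and a quorum of 3, while a violation in the most recent period gives a trust level of 0 and restores the strict quorum of 4.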

Why 6 Councils, Not 3 or 12

The number 6 is not arbitrary. It maps directly to the Sacred Tongues dimensional system (KO/AV/RU/CA/UM/DR), where each tongue represents a fundamentally orthogonal axis of analysis. Fewer councils would erode fault tolerance: with n=3, the bound n >= 3f+1 leaves f=0, so not even one compromised council could be tolerated. More councils would add latency without proportional security gains, because the 6 tongues already span the full governance space.

With f=1, the system tolerates one compromised council. To increase tolerance to f=2, you would need n >= 3f+1 = 7 councils -- which is achievable by adding a 7th meta-council that audits the other 6, a design reserved for enterprise deployments.
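The arithmetic behind the n >= 3f+1 bound is easy to check directly. The helper names below are introduced here for illustration:

```python
def max_faulty(n: int) -> int:
    """Largest f that satisfies n >= 3f + 1."""
    return (n - 1) // 3


def min_councils(f: int) -> int:
    """Smallest n that tolerates f faulty councils."""
    return 3 * f + 1


for n in (3, 6, 7, 12):
    print(f"n={n}: tolerates f={max_faulty(n)}")
# n=3: tolerates f=0
# n=6: tolerates f=1
# n=7: tolerates f=2
# n=12: tolerates f=3
```

Six councils tolerate one fault, and a seventh is the first addition that raises the tolerance to two.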
