Training data built from a real, running governance system
Every dataset here was generated from SCBE-AETHERMOORE -- a framework with a 99.42% combined AUC, a 91/91 red-team pass rate, a patent-pending architecture, and production-grade security layers. Not synthetic slop. Real governance data.
Targeted datasets for specific use cases.
Each pack is self-contained, documented, and ready for QLoRA fine-tuning out of the box.
SCBE Governance SFT Pack
5,188 supervised fine-tuning pairs covering governance, system-level instructions, and codebase understanding.
- merged_sft.jsonl -- complete merged dataset
- sft_governance.jsonl -- 1,016 governance-specific pairs
- sft_system.jsonl -- 3,395 system instruction pairs
- Fine-tune any base model to understand AI governance pipelines
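Below is a minimal QLoRA starting point for this pack. The field names ("prompt", "response") and the base model are assumptions -- check the pack's docs for the exact JSONL schema and swap in whatever base model you're tuning:

```python
# Minimal QLoRA sketch. Field names ("prompt", "response") are assumptions;
# inspect merged_sft.jsonl and adjust to the pack's actual schema.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

BASE = "meta-llama/Llama-3.1-8B"  # placeholder -- any causal LM base works

ds = load_dataset("json", data_files="merged_sft.jsonl", split="train")
ds = ds.map(lambda r: {"text": f"{r['prompt']}\n\n{r['response']}"})

model = AutoModelForCausalLM.from_pretrained(
    BASE,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)
trainer = SFTTrainer(
    model=model,
    train_dataset=ds,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
    args=SFTConfig(output_dir="polly-qlora", dataset_text_field="text",
                   max_seq_length=1024),
)
trainer.train()
```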
Red Team Fortress
91 adversarial attack prompts across 10 categories, labeled by failure layer (L1-L14). Battle-tested against the SCBE pipeline.
- 91 attack prompts across 10 adversarial categories
- Each prompt labeled by failure layer (L1 through L14)
- Compliance evals and attack scenario theory docs included
- Stress-test AI models and build adversarial training sets
- Also available as a free preview on HuggingFace (95 downloads)
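Because every prompt carries a failure-layer label, slicing the pack for targeted adversarial training is a few lines of code. A sketch, assuming JSONL rows with "prompt" and "layer" fields and a placeholder filename:

```python
import json
from collections import Counter

# Filename and field names ("prompt", "layer") are assumptions -- see the pack docs.
with open("red_team_fortress.jsonl") as f:
    attacks = [json.loads(line) for line in f]

print(Counter(a["layer"] for a in attacks))  # attack count per failure layer
l7_prompts = [a["prompt"] for a in attacks if a["layer"] == "L7"]
```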
Six Tongues Conlang + Tokenization Pack
Constructed language design docs, tokenization theory, and bijective encoding patterns from the Six Sacred Tongues system.
- theory_doc_conlang_intent.jsonl -- language design theory
- Tongues session transcripts
- Spiralverse codex SFT pairs
- Novel tokenization research and conlang AI training
- Creative AI fine-tuning for unique output patterns
Spiralverse Session Transcripts
48 session transcript files across 11 categories of structured RPG and worldbuilding dialogue.
- 48 session files spanning 11 categories: game, NPC roundtable, game design, lore, architecture, tongues, gacha, math, DM, music, and space commerce
- ~400KB of structured dialogue data
- RPG AI training and game NPC dialogue
- Interactive fiction fine-tuning
Theory Documents Bundle
6 deep-dive theory documents covering the full intellectual foundation of the SCBE-AETHERMOORE framework.
- Architecture theory -- full pipeline design rationale
- Spiralverse lore -- worldbuilding knowledge injection
- Security attacks -- adversarial taxonomy and defenses
- GeoSeal crypto -- bijective encryption theory
- Conlang intent -- constructed language design principles
- Patent claims -- USPTO #63/961,403 methodology
The Full Arsenal.
The complete SCBE-AETHERMOORE training data collection -- everything above in one download, and everything you need to fine-tune a governance-aware AI model from scratch. Save $107 versus buying each pack individually.
- 5,188+ SFT training pairs
- 91 red team adversarial prompts
- 6 deep-dive theory documents
- 48 session transcripts (11 categories)
- Full conlang + tokenization pack
- Context capsules
- Knowledge base files
- Eval suites
Start here for free.
The framework is MIT-licensed. The HuggingFace datasets are free to download. No strings attached.
SCBE-AETHERMOORE
The full 14-layer pipeline, Sacred Tongues, and hyperbolic cost engine. Open source and production-ready.
- Full 14-layer security pipeline
- Sacred Tongues tokenizers
- Hyperbolic cost scaling (H(d,R) = R^(d²)) -- worked example below
- 99.42% combined AUC · 91/91 red-team pass
npm i scbe-aethermoore
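The cost engine's growth is easy to eyeball: for a fixed base R, H(d,R) = R^(d²) squares the exponent as depth d grows. A quick sanity check of the formula itself (the npm package exposes its own API; this is just the math):

```python
def hyperbolic_cost(d: int, R: float) -> float:
    """H(d, R) = R ** (d**2): cost grows super-exponentially in depth d."""
    return R ** (d ** 2)

for d in range(1, 6):
    print(d, hyperbolic_cost(d, R=2.0))
# d=1 -> 2, d=2 -> 16, d=3 -> 512, d=4 -> 65,536, d=5 -> 33,554,432
```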
HuggingFace Datasets
Community datasets available for immediate download on HuggingFace.
- scbe-aethermoore-training-data (1,484 downloads)
- scbe-red-team-benchmarks (95 downloads)
- polly-training-data
- polly-chat-seed
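Pulling a dataset straight off the Hub takes two lines; the org prefix below is a placeholder for the actual namespace:

```python
from datasets import load_dataset

# "your-org" is a placeholder -- substitute the actual HuggingFace namespace.
train = load_dataset("your-org/scbe-aethermoore-training-data", split="train")
red_team = load_dataset("your-org/scbe-red-team-benchmarks", split="train")
print(train[0])
```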
HuggingFace Model Zoo.
12 models trained on SCBE data. The Polly family powers our assistant. PHDM-21D is the geometric foundation.
polly-base-v1 (Polly Family)
Base Polly model. General-purpose assistant fine-tuned on SCBE governance data.
polly-base-v2 (Polly Family)
Second generation Polly. Improved governance alignment and session handling.
polly-base-v3 (Polly Family)
Latest Polly base. Enhanced with theory doc training and conlang awareness.
polly-chat-v1 (Polly Family)
Chat-optimized Polly. Conversational fine-tuning for interactive sessions.
polly-chat-v2 (Polly Family)
Improved chat model with better context retention and governance routing.
polly-chat-v3 (Polly Family)
Latest chat variant. Session transcripts and red team hardening included.
polly-instruct-v1 (Polly Family)
Instruction-following Polly. Trained on system-level SFT pairs for precise task execution.
polly-instruct-v2 (Polly Family)
Enhanced instruction model. Governance-aware task routing and multi-step reasoning.
polly-instruct-v3 (Polly Family)
Latest instruct variant with full 14-layer pipeline awareness.
PHDM-21D Embedding (Geometric Foundation)
21-dimensional Poincaré ball embedding: 6D hyperbolic + 6D phase + 3D flux + 6D audit.
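The 21 dimensions split into four named blocks. A sketch of assembling one point and keeping it inside the open unit ball (Poincaré-ball points must have norm strictly below 1); the block layout is taken from the description above, and the projection step is a standard rescale:

```python
import numpy as np

# Block layout from the card: 6D hyperbolic + 6D phase + 3D flux + 6D audit = 21D.
rng = np.random.default_rng(0)
blocks = [rng.normal(size=n) for n in (6, 6, 3, 6)]  # hyperbolic, phase, flux, audit
point = np.concatenate(blocks)                        # shape (21,)

# Poincare-ball constraint: ||x|| < 1. Rescale anything that lands outside.
norm = np.linalg.norm(point)
if norm >= 1.0:
    point = point / norm * (1.0 - 1e-5)
assert np.linalg.norm(point) < 1.0
```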
GeoSeed Tokenizer (Tokenization)
Bijective tokenizer trained on Sacred Tongues vocabulary. Context-aware encoding.
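"Bijective" means lossless round-trips: every string maps to exactly one token sequence and back, with nothing normalized away. An illustrative stand-in using raw UTF-8 bytes (not the actual Sacred Tongues mapping) to show the property:

```python
# Stand-in for the real GeoSeed mapping -- the property shown is bijectivity:
# decode(encode(x)) == x for every input, with no information loss.
def encode(text: str) -> list[int]:
    return list(text.encode("utf-8"))

def decode(ids: list[int]) -> str:
    return bytes(ids).decode("utf-8")

assert decode(encode("spiralverse")) == "spiralverse"
```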
SCBE Semantic Projector (Projector)
Semantic projection model. Maps natural language to 14-layer pipeline coordinates. F1: 0.813.
Need something specific?
We build custom training datasets tailored to your model architecture, domain, and compliance requirements.