A single large model is a convenient abstraction, but it is a bad security design. Everything the agent knows, everything it can do, and every failure mode it exhibits is concentrated in one monolith. HYDRA takes the opposite approach: six small specialist agents, each fine-tuned on one of the Sacred Tongues, coordinating through the 14-layer SCBE pipeline. Each head is narrow, cheap to run, and easy to audit. The swarm is the capability.
| Head | Tongue | Role | Base Model |
|---|---|---|---|
| Scout | KO | Recon, site map, task planning | Qwen 2.5 7B |
| Vision | AV | Screenshot and DOM interpretation | Qwen-VL 7B |
| Reader | RU | Policy, TOS, constraint parsing | Llama 3.1 8B |
| Clicker | CA | Action selection and execution | Qwen 2.5 7B |
| Typer | UM | Secure input, secrets handling | Llama 3.1 8B |
| Judge | DR | Type / schema verification, final go-no-go | Mistral 7B |
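The head/tongue mapping above can be mirrored as a plain routing registry. The sketch below is illustrative (the names come from the table; the dict structure and `head_for` helper are assumptions, not HYDRA's actual API):

```python
# Minimal registry mirroring the HYDRA head table.
# Keys are Sacred Tongue codes; values describe the head that owns that realm.
HEADS = {
    "KO": {"head": "Scout",   "role": "recon, site map, task planning",      "base": "Qwen 2.5 7B"},
    "AV": {"head": "Vision",  "role": "screenshot and DOM interpretation",   "base": "Qwen-VL 7B"},
    "RU": {"head": "Reader",  "role": "policy, TOS, constraint parsing",     "base": "Llama 3.1 8B"},
    "CA": {"head": "Clicker", "role": "action selection and execution",      "base": "Qwen 2.5 7B"},
    "UM": {"head": "Typer",   "role": "secure input, secrets handling",      "base": "Llama 3.1 8B"},
    "DR": {"head": "Judge",   "role": "type/schema verification, go-no-go",  "base": "Mistral 7B"},
}

def head_for(tongue: str) -> str:
    """Return the head responsible for a tongue; raises KeyError for unknown tongues."""
    return HEADS[tongue]["head"]
```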
Every head is a 4-bit QLoRA fine-tune of a small open-source base model. None of them are larger than 8B parameters, and any of them can run on a consumer GPU — or on HuggingFace Inference Providers when local hardware is not available. Training data comes from the auto-conversation pairs generated by Polly's self-play loop, filtered through the DNA bi-strand auditor.
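The per-head training setup is not spelled out here; the snippet below is a representative 4-bit QLoRA configuration using Hugging Face `peft` and `bitsandbytes`. The rank, alpha, and target modules are illustrative assumptions, not HYDRA's published values:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model (the standard QLoRA recipe).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Low-rank adapter on the attention projections; r and alpha are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```

Because only the adapter weights are trained, each head's delta is a small file that can be versioned and diffed independently of the base model.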
HYDRA does not route by LLM function calls. It routes by tongue. Every message in the swarm is tagged with a Sacred Tongue, and the 14-layer pipeline determines which head handles the next step based on which realm the message belongs to. A web-automation task, for example, flows through the swarm as a sequence of (action, realm) pairs. At every handoff, the message passes through the 14-layer pipeline and is priced by H(d, R) = R^(d²). A head can only act if its action stays inside the current Harmonic Trust Tube. The governance check is not optional and not tunable; it is the plumbing.
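The pricing rule is simple enough to state directly. A minimal sketch, assuming d is the handoff depth and R the realm's base rate per H(d, R) = R^(d²), and assuming the trust tube reduces to a budget ceiling (the real SCBE check is richer):

```python
def handoff_price(d: int, R: float) -> float:
    """Price a handoff at depth d in a realm with base rate R: H(d, R) = R**(d*d).

    Cost grows super-exponentially with depth, so deep chains of handoffs
    become prohibitively expensive very quickly.
    """
    return R ** (d * d)

def within_trust_tube(d: int, R: float, budget: float) -> bool:
    """A head may act only if the priced action stays inside the current tube (budget)."""
    return handoff_price(d, R) <= budget
```

Note how fast the price climbs: at R = 2, depth 1 costs 2, depth 2 costs 16, and depth 3 already costs 512.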
Fine-tuning a 70B model for each head would be both expensive and wasteful. Each head only needs to be good at one narrow domain, and for narrow domains 4-bit QLoRA on a 7-8B base model is shockingly effective. A full HYDRA retrain takes about 18 hours on a single A100 rental, roughly 20 dollars of compute. That means the swarm can be retrained whenever the Polly corpus grows meaningfully, which in practice is every week or two.
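The compute claim is easy to sanity-check. Assuming roughly $1.10 per hour for an A100 rental (an assumption; spot prices vary):

```python
hours = 18            # full HYDRA retrain, per the text
usd_per_hour = 1.10   # assumed A100 rental rate; actual spot prices vary
total = hours * usd_per_hour
# total is roughly 20 dollars, matching the claim above.
```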
Small heads also make the swarm easier to reason about. When Judge (DR) rejects something, you can inspect the small low-rank adapter delta that was added to the base Mistral model and understand what rule it learned. With a monolithic 70B model this kind of audit is essentially impossible.
Point HYDRA at a target application with a permissive ruleset and it will systematically explore the attack surface, logging every governance decision. Because every action is priced, the final report includes not just a list of findings but a ranked cost map: the cheapest paths to sensitive state, the most expensive defenses to break, and the actions that were blocked pre-execution.
Web automation was the original motivation. HYDRA can perform multi-step web tasks (filling forms, navigating flows, handling logins) with a governance check at every step. If you have ever wanted a browser agent that cannot be prompt-injected into exfiltrating cookies, HYDRA is that agent: Reader (RU) will refuse to approve actions that violate the site's declared policy, and Typer (UM) physically cannot type credentials into an unapproved form.
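The Typer constraint can be pictured as a whitelist gate in front of every keystroke sink. This is a toy sketch, assuming approval is tracked per form origin (the function name and signature are hypothetical; the real head enforces this inside the pipeline, not as a single function):

```python
def typer_can_fill(form_origin: str, is_secret: bool, approved_origins: set[str]) -> bool:
    """Typer (UM) refuses to type secrets into any form whose origin
    Reader (RU) has not approved. Non-secret input is not gated here."""
    if is_secret and form_origin not in approved_origins:
        return False
    return True
```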
HYDRA also generalizes beyond the browser. Anywhere a long-running agent has to cross trust boundaries (CI pipelines, shell access, API chains), HYDRA slots in as the governed execution layer.
The $29 HYDRA Agent Templates package includes the QLoRA configs, training scripts, head prompts, and a reference orchestrator that wires the six heads through the SCBE pipeline. It is the fastest way to stand up a governed swarm on your own hardware. Commercial deployment and custom head training are available through the services page.