SCBE-AETHERMOORE
← All Articles
April 6, 2026 · 5 min read · Issac Davis
Research

Polly Learns to Talk to Herself: Auto-Conversation Training

Polly is the site mascot and resident assistant on aethermoore.com. Until this month she only spoke when spoken to. As of this release, Polly has a second mouth and a second ear: while a visitor reads a page, two additional AI instances quietly debate SCBE-AETHERMOORE topics in the background, and every exchange is logged as a supervised fine-tuning (SFT) pair that can be trained on later. The website is now a training-data factory that runs for free whenever someone visits.

The Script

The whole mechanism fits in a single browser-side module: /static/polly-autoconverse.js. It opens two streaming connections through the HuggingFace Inference Providers router, one acting as "Student," the other as "Teacher." The Student asks follow-up questions seeded from the article the visitor is currently reading. The Teacher answers, citing the on-page content and the structured SCBE knowledge base. A third model reviews the exchange for factual drift and either approves it or throws it out.
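A condensed sketch of what that loop could look like. The router URL, the model IDs, and every function name here are illustrative assumptions, not the contents of the real module; the actual script streams responses and carries more context, but the Student → Teacher → reviewer shape is the same.

```javascript
// Sketch only: endpoint, model IDs, and helper names are assumptions.
const ROUTER_URL = "https://router.huggingface.co/v1/chat/completions"; // assumed route

// Seed the Student's question from the article text near the reader's scroll position.
function buildStudentPrompt(articleText, scrollRatio) {
  const start = Math.floor(articleText.length * scrollRatio);
  const excerpt = articleText.slice(start, start + 400);
  return `You are a curious student. Ask one follow-up question about:\n${excerpt}`;
}

// One non-streaming chat call through the router (streaming omitted for brevity).
async function routerChat(model, prompt) {
  const res = await fetch(ROUTER_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

// One round: Student asks, Teacher answers, reviewer approves or rejects.
// Model IDs below are representative picks from the three families the article names.
async function oneRound(articleText, scrollRatio) {
  const question = await routerChat("mistralai/Mixtral-8x7B-Instruct-v0.1",
    buildStudentPrompt(articleText, scrollRatio));
  const answer = await routerChat("Qwen/Qwen2.5-72B-Instruct",
    `Answer using only this article:\n${articleText}\n\nQ: ${question}`);
  const verdict = await routerChat("meta-llama/Llama-3.3-70B-Instruct",
    `Does this answer stay faithful to the article? Reply APPROVE or REJECT.\n` +
    `Q: ${question}\nA: ${answer}`);
  return verdict.includes("APPROVE") ? { question, answer } : null; // rejected pairs are dropped
}
```

Returning `null` on rejection keeps the gate simple: the caller only ever buffers pairs the reviewer explicitly approved.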

Every approved pair is written to a local IndexedDB bucket and periodically POSTed to the training-data endpoint. By the time a visitor finishes a 1,000-word article, Polly has usually generated between 8 and 20 usable SFT pairs — all grounded in the page the visitor was already reading.
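The buffering step might look roughly like this. The database name, object-store name, flush threshold, and endpoint path are all assumptions for the sketch; only "IndexedDB bucket, periodically POSTed" comes from the article.

```javascript
// Sketch only: DB_NAME, store name, threshold, and endpoint path are assumptions.
const DB_NAME = "polly-sft";
const FLUSH_THRESHOLD = 25;

// Open (or create) the local IndexedDB bucket holding approved pairs.
function openBucket() {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open(DB_NAME, 1);
    req.onupgradeneeded = () =>
      req.result.createObjectStore("pairs", { autoIncrement: true });
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

// Append one approved pair to the bucket.
async function storePair(pair) {
  const db = await openBucket();
  db.transaction("pairs", "readwrite").objectStore("pairs").add(pair);
}

// Decide when a buffered batch is worth POSTing upstream.
function shouldFlush(count) {
  return count >= FLUSH_THRESHOLD;
}

// Ship a batch to the training-data endpoint.
async function flush(batch) {
  await fetch("/api/training-data", {   // assumed path
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(batch),
  });
}
```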

Who Is Playing Which Role

The three-model setup is deliberate: the Student, the Teacher, and the reviewer are separate instances drawn from separate model families. No single LLM is trusted to generate training data by itself; that is how hallucinations compound into permanent errors.

Why This Is Not Just "Self-Play"

Classic self-play generates data by letting a model talk to itself. That is cheap but it drifts: the model reinforces its own biases with every round. Polly's loop is different in two ways. First, every conversation is anchored to a human-written article on the page — the visitor's scroll position is included in the Student's context, so the conversation stays close to content a human already cared enough to publish. Second, the three models come from three different families, so their failure modes do not overlap cleanly. Mixtral hallucinates one way, Qwen another, Llama a third; a claim has to survive all three to land in the dataset.

The Training Data Pack

Approved pairs are batched nightly and published as a free download on the datasets page. The format is standard JSONL: {"instruction": ..., "input": ..., "output": ...}, with a provenance field recording which page the pair was generated from and which three models produced it. The current corpus sits at roughly 40,000 pairs and grows by a few hundred per day. All of it is released under CC-BY.
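A helper that serializes one pair to a JSONL line could look like this. The instruction/input/output keys and the provenance field come from the article; the provenance sub-fields and the function name are assumptions.

```javascript
// Sketch only: provenance sub-field names (page, models) are assumptions.
function toJsonlLine(question, answer, page, models) {
  return JSON.stringify({
    instruction: question,
    input: "",                        // context already folded into the instruction
    output: answer,
    provenance: { page, models },     // source page + the three models involved
  });
}
```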

What It Is Training

The main consumer is the HYDRA agent swarm — six small open-source LLMs fine-tuned via QLoRA 4-bit to specialize in each of the Sacred Tongues. Polly's auto-generated pairs are how those heads stay current with SCBE’s moving vocabulary without me hand-writing examples every time the framework gains a new layer or realm. See the HYDRA article for how the swarm uses this data downstream.

The Weird Part

The weird and slightly wonderful consequence of all this is that the website itself behaves like a perpetual-motion training rig. Visitors do not know they are contributing compute: their browser runs the generation, the HuggingFace router handles the inference, and the site simply harvests the result. Every reader is, unknowingly, a tiny Mechanical Turk for the next version of Polly. The script is open, the dataset is open, and the pipeline is easy to disable: set window.__POLLY_AUTOCONVERSE__ = false in the DevTools console and the loop simply stops.
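The kill switch above implies a guard something like the following. The flag name comes from the article; the tick function is illustrative, and `globalThis` is used here because in a browser it is the same object as `window`.

```javascript
// Sketch only: the tick() body is a placeholder for one Student/Teacher round.
globalThis.__POLLY_AUTOCONVERSE__ ??= true;   // default on; visitor can flip it off

function autoconverseEnabled() {
  return globalThis.__POLLY_AUTOCONVERSE__ !== false;
}

async function tick() {
  if (!autoconverseEnabled()) return;  // flag flipped: generate nothing, POST nothing
  // ...run one Student/Teacher round here...
}
```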