Proof
Red-team proof, in plain English.
This page is not a benchmark screenshot. It is a buyer-oriented explanation of what is being tested, what the results mean, and how you can reproduce the run.
What is being tested
The red-team lane tests adversarial prompts and workflow abuse. The goal is not to win a debate; it is to prevent unsafe actions, tool misuse, and policy drift. The suite covers:
- Direct prompt override and instruction hijack
- Indirect injection (malicious text hidden in context)
- Encoding/obfuscation payloads (base64, ROT, and hex patterns)
- Multi-turn escalation and probing sequences
- Tool exfiltration attempts (keys, credentials, sensitive files)
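To make the encoding/obfuscation category concrete, here is a minimal Python sketch of why encoded payloads get their own category. It is an illustration only, not the suite's actual test code: the phrase, the `naive_filter` check, and the choice of ROT13 as the rotation cipher are all assumptions made for the example.

```python
import base64
import codecs

# Hypothetical phrase a simple keyword filter might look for (assumption).
BLOCKED_PHRASE = "ignore previous instructions"


def encode_variants(payload: str) -> dict:
    """Return the payload in plain form and three common obfuscated forms."""
    raw = payload.encode("utf-8")
    return {
        "plain": payload,
        "base64": base64.b64encode(raw).decode("ascii"),
        "rot13": codecs.encode(payload, "rot_13"),  # ROT13 chosen as an example
        "hex": raw.hex(),
    }


def naive_filter(text: str) -> bool:
    """Flag text only if the blocked phrase appears literally (case-insensitive)."""
    return BLOCKED_PHRASE in text.lower()


variants = encode_variants("Ignore previous instructions")
for name, text in variants.items():
    # Only the plain variant is flagged; the encoded ones pass straight through.
    print(f"{name:7s} flagged={naive_filter(text)}")
```

A literal keyword match flags the plain variant and misses the other three, which is why a suite needs to probe encoded forms explicitly rather than assume a text filter covers them.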
How to inspect and reproduce
You can inspect the public proof surfaces in two ways:
- Read the live sandbox page: redteam.html
- Review the canonical eval pack page: research/eval-pack.html
If you want to reproduce locally from this repo:
# Run the full suite in one step
npm run test:all
# or run the Python tests and the default npm test script separately
npm run test:python
npm test
Note: the suite is a moving target. The sandbox reflects the current public suite and is updated over time, so a later run may not match an earlier one.
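If you want a single pass/fail answer from the local commands above, one option is a small wrapper that runs them in order and records each exit code. The following Python sketch is hypothetical and not part of the repo: the command names come from this page, while `run_commands`, `all_passed`, and the split into `FULL_RUN` and `SPLIT_RUN` are assumptions made for illustration.

```python
import subprocess

# Commands as listed on this page; the grouping is an assumption.
FULL_RUN = [["npm", "run", "test:all"]]
SPLIT_RUN = [["npm", "run", "test:python"], ["npm", "test"]]


def run_commands(commands, runner=None):
    """Run each command in order and return (command string, exit code) pairs.

    `runner` takes an argument list and returns an exit code. It defaults to
    subprocess.run and can be replaced with a stub for a dry run.
    """
    if runner is None:
        def runner(cmd):
            return subprocess.run(cmd).returncode
    return [(" ".join(cmd), runner(cmd)) for cmd in commands]


def all_passed(results):
    """True only if every command exited with code 0."""
    return all(code == 0 for _, code in results)


# Example usage from the repo root (requires npm to be installed):
#   results = run_commands(FULL_RUN)
#   print(results, "PASS" if all_passed(results) else "FAIL")
```

An exit code of 0 only says the suite passed as it existed at the time of the run, so it is worth recording the date or commit alongside the result.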
What this does and does not promise
Red-team results are a proof surface, not a guarantee. They show how the system behaves against a named suite at a point in time.
- It does NOT mean every possible attack is solved forever.
- It DOES mean the current suite is inspectable and reproducible.
- It DOES mean the offer is not “trust me, bro”.