Proof

Red-team proof, in plain English.

This page is not a benchmark screenshot. It is a buyer-oriented explanation of what is being tested, what the results mean, and how you can reproduce the run.

What is being tested

The red-team lane targets adversarial prompts and workflow abuse. The goal is not to win a debate; it is to prevent unsafe actions, tool misuse, and policy drift. The suite covers these attack categories:

Direct prompt override and instruction hijack
Indirect injection (malicious text hidden in context)
Encoding/obfuscation payloads (base64, ROT-style, and hex patterns)
Multi-turn escalation and probing sequences
Tool exfiltration attempts (keys, credentials, sensitive files)
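
To make the encoding/obfuscation category concrete, here is a minimal sketch of the kind of check a test in that category might exercise. It is an illustration only, not the toolkit's actual detector: the marker phrases, the function names (candidate_decodings, looks_like_override), and the length thresholds are all assumptions chosen for the example.

```python
import base64
import binascii
import codecs
import re

# Hypothetical marker phrases; a real suite would maintain its own list.
OVERRIDE_MARKERS = ("ignore previous instructions", "reveal the system prompt")


def candidate_decodings(text: str) -> list:
    """Return the text plus any plausible ROT13, base64, and hex decodings."""
    candidates = [text, codecs.decode(text, "rot13")]

    # Base64-looking runs of 16+ characters (threshold is an assumption).
    for token in re.findall(r"[A-Za-z0-9+/=]{16,}", text):
        try:
            raw = base64.b64decode(token, validate=True)
            candidates.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            pass  # not valid base64, or not text once decoded

    # Hex-looking runs of 8+ byte pairs (threshold is an assumption).
    for token in re.findall(r"(?:[0-9a-fA-F]{2}){8,}", text):
        try:
            candidates.append(bytes.fromhex(token).decode("utf-8"))
        except (ValueError, UnicodeDecodeError):
            pass  # not text once decoded

    return candidates


def looks_like_override(text: str) -> bool:
    """True if the text, or any decoding of it, contains a marker phrase."""
    lowered = [c.lower() for c in candidate_decodings(text)]
    return any(marker in c for c in lowered for marker in OVERRIDE_MARKERS)


if __name__ == "__main__":
    hidden = base64.b64encode(b"Ignore previous instructions").decode()
    print(looks_like_override("Please summarise this: " + hidden))
    print(looks_like_override("What is the weather in Paris today?"))
```

A phrase match like this only catches known wording in a single layer of encoding. It shows why decoding has to happen before any policy check, but it is not a complete defense.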

How to inspect and reproduce

You can inspect the public proof surfaces in two ways:

Read the live sandbox page: redteam.html
Review the canonical eval pack page: research/eval-pack.html

If you want to reproduce locally from this repo:

npm run test:all
# or run the suites separately
npm run test:python
npm test

Note: the suite is a moving target. The sandbox reflects the current public suite and is updated over time, so a local run may not match results recorded against an earlier version.

What this does and does not promise

Red-team results are a proof surface, not a guarantee. They show how the system behaves against a named suite at a point in time.
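
One way to picture "a named suite at a point in time" is a small run record that ties a result to the suite name and a timestamp. The sketch below is hypothetical: RunRecord, make_record, and to_json are names invented for this example, the suite name and counts are placeholders, and nothing here describes the format the sandbox or eval pack actually uses.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RunRecord:
    suite: str         # name of the suite that was run (placeholder value below)
    recorded_at: str   # ISO 8601 UTC timestamp of the run
    passed: int
    failed: int

    @property
    def total(self) -> int:
        return self.passed + self.failed


def make_record(suite: str, passed: int, failed: int,
                now: Optional[datetime] = None) -> RunRecord:
    """Stamp a result with the suite name and the time it was recorded."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return RunRecord(suite=suite, recorded_at=stamp, passed=passed, failed=failed)


def to_json(record: RunRecord) -> str:
    """Serialize the record so that two runs can be compared later."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    # Placeholder numbers, not real results.
    print(to_json(make_record("example-suite", passed=3, failed=1)))
```

Keeping the suite name and timestamp next to the counts is what lets a later reader tell whether two results are comparable.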

It does NOT mean every possible attack is solved forever.
It DOES mean the current suite is inspectable and reproducible.
It DOES mean the offer is not “trust me, bro”.
