Auto-updated feed

Latest AI safety research

Recent papers and discussions on AI governance, LLM security, prompt injection, red teaming, and alignment — refreshed daily from arXiv and HackerNews.

Latest AI safety research

Show HN: When the LLM Accidentally

Show HN: NyaayWatch – Observability layer for the Indian judiciary

Hard SiFi Author'Search for Readers

Show HN: An OTel exporter that posts the cause to your incident channel

Anthropic donates Petri open-source alignment tool

Two Home Affairs officials suspended after AI 'hallucinations' found

Show HN: Airlock – self-upgrading compiled AI agents

Giga Launches Realtime Hallucination Correction

LLM-driven security reports disrupt coordinated disclosure

I got prompt-injected asking Claude on iOS to recommend a cycling route app

Unmonitored Agents and a Local AI

Model Spec Midtraining: Improving How Alignment Training Generalizes

ArcKit – The Agentic AI Architecture Governance for Governments

We ran OWASP attacks on 8 LLMs. Optimized small models beat frontier defaults

Show HN: Costanza – an autonomous AI agent that can't be turned off

Show HN: Recursant – service mesh for governing AI agents

Show HN: Arden – Runtime policy enforcement and governance for AI agents

Show HN: Rival AI – AI compliance agents and regulatory corpus

US to safety test new AI models from Google, Microsoft, xAI

When innocent tools form dangerous chains to jailbreak LLM agents

US Government Expands Vetting of Frontier AI Models for Security Risks

U.S. ramps up frontier AI testing as White House pivots toward safety

Why ChatGPT answers instead of saying "I don't know"

SQL access to crypto market data, not just JSON

Perfectly Aligning AI's Values with Humanity's Is Impossible

The Algebra of Hallucination

My favorite adversarial review prompt

Show HN: Spec27 – Spec-driven validation for AI agents