Latest AI safety research
Recent papers and discussions on AI governance, LLM security, prompt injection, red teaming, and alignment, refreshed daily from arXiv and Hacker News.
Last updated: 2026-05-08T07:39:57.720276+00:00
Show HN: When the LLM Accidentally
2 points, 0 comments
Show HN: NyaayWatch – Observability layer for the Indian judiciary
2 points, 0 comments
Hard SciFi Author's Search for Readers
2 points, 0 comments
Show HN: An OTel exporter that posts the cause to your incident channel
3 points, 0 comments
Anthropic donates Petri open-source alignment tool
2 points, 0 comments
Two Home Affairs officials suspended after AI 'hallucinations' found
82 points, 19 comments
Show HN: Airlock – self-upgrading compiled AI agents
4 points, 0 comments
Giga Launches Realtime Hallucination Correction
2 points, 0 comments
LLM-driven security reports disrupt coordinated disclosure
2 points, 0 comments
I got prompt-injected asking Claude on iOS to recommend a cycling route app
2 points, 0 comments
Unmonitored Agents and a Local AI
2 points, 0 comments
Model Spec Midtraining: Improving How Alignment Training Generalizes
2 points, 0 comments
ArcKit – The Agentic AI Architecture Governance for Governments
1 point, 0 comments
We ran OWASP attacks on 8 LLMs. Optimized small models beat frontier defaults
4 points, 0 comments
Show HN: Costanza – an autonomous AI agent that can't be turned off
5 points, 3 comments
Show HN: Recursant – service mesh for governing AI agents
2 points, 0 comments
Show HN: Arden – Runtime policy enforcement and governance for AI agents
7 points, 5 comments
Show HN: Rival AI – AI compliance agents and regulatory corpus
2 points, 0 comments
US to safety test new AI models from Google, Microsoft, xAI
6 points, 1 comment
When innocent tools form dangerous chains to jailbreak LLM agents
2 points, 0 comments
US Government Expands Vetting of Frontier AI Models for Security Risks
5 points, 2 comments
U.S. ramps up frontier AI testing as White House pivots toward safety
3 points, 2 comments
Why ChatGPT answers instead of saying "I don't know"
5 points, 0 comments
SQL access to crypto market data, not just JSON
5 points, 0 comments
Perfectly Aligning AI's Values with Humanity's Is Impossible
2 points, 1 comment
The Algebra of Hallucination
3 points, 0 comments
My favorite adversarial review prompt
3 points, 0 comments
Show HN: Spec27 – Spec-driven validation for AI agents
13 points, 9 comments