SCBE-AETHERMOORE
Auto-updated feed

Latest AI safety research

Recent papers and discussions on AI governance, LLM security, prompt injection, red teaming, and alignment — refreshed daily from arXiv and HackerNews.

Last updated: 2026-05-08T07:39:57.720276+00:00
HackerNews 2026-05-08 prompt injection

Show HN: When the LLM Accidentally

2 points, 0 comments

lucid-dev
HackerNews 2026-05-08 red team

Show HN: NyaayWatch – Observability layer for the Indian judiciary

2 points, 0 comments

Rudraksh06
HackerNews 2026-05-08 red team

Hard SiFi Author'Search for Readers

2 points, 0 comments

dougcol
HackerNews 2026-05-08 hallucination

Show HN: An OTel exporter that posts the cause to your incident channel

3 points, 0 comments

ahmedmostafa16
HackerNews 2026-05-07 alignment

Anthropic donates Petri open-source alignment tool

2 points, 0 comments

dragonstyle
HackerNews 2026-05-07 hallucination

Two Home Affairs officials suspended after AI 'hallucinations' found

82 points, 19 comments

jruohonen
HackerNews 2026-05-07 LLM security

Show HN: Airlock – self-upgrading compiled AI agents

4 points, 0 comments

cyberteaborg
HackerNews 2026-05-07 hallucination

Giga Launches Realtime Hallucination Correction

2 points, 0 comments

varunvummadi
HackerNews 2026-05-07 LLM security

LLM-driven security reports disrupt coordinated disclosure

2 points, 0 comments

jwilk
HackerNews 2026-05-07 prompt injection

I got prompt-injected asking Claude on iOS to recommend a cycling route app

2 points, 0 comments

menno-sh
HackerNews 2026-05-06 jailbreak

Unmonitored Agents and a Local AI

2 points, 0 comments

jay_kyburz
HackerNews 2026-05-06 alignment

Model Spec Midtraining: Improving How Alignment Training Generalizes

2 points, 0 comments

bearseascape
HackerNews 2026-05-06 AI governance

ArcKit – The Agentic AI Architecture Governance for Governments

1 point, 0 comments

TractorJuice
HackerNews 2026-05-06 LLM security

We ran OWASP attacks on 8 LLMs. Optimized small models beat frontier defaults

4 points, 0 comments

ChanwooSong
HackerNews 2026-05-06 prompt injection

Show HN: Costanza – an autonomous AI agent that can't be turned off

5 points, 3 comments

aruss
HackerNews 2026-05-06 AI compliance

Show HN: Recursant – service mesh for governing AI agents

2 points, 0 comments

hestefisk
HackerNews 2026-05-06 AI governance

Show HN: Arden – Runtime policy enforcement and governance for AI agents

7 points, 5 comments

rishabtandon
HackerNews 2026-05-05 AI compliance

Show HN: Rival AI – AI compliance agents and regulatory corpus

2 points, 0 comments

estradanicolas
HackerNews 2026-05-05 AI safety

US to safety test new AI models from Google, Microsoft, xAI

6 points, 1 comment

devonnull
HackerNews 2026-05-05 jailbreak

When innocent tools form dangerous chains to jailbreak LLM agents

2 points, 0 comments

leecoursey
HackerNews 2026-05-05 AI safety

US Government Expands Vetting of Frontier AI Models for Security Risks

5 points, 2 comments

r3trohack3r
HackerNews 2026-05-05 AI safety

U.S. ramps up frontier AI testing as White House pivots toward safety

3 points, 2 comments

gmays
HackerNews 2026-05-05 adversarial

Why ChatGPT answers instead of saying "I don't know"

5 points, 0 comments

BoundaryTester
HackerNews 2026-05-04 AI governance

SQL access to crypto market data, not just JSON

5 points, 0 comments

knazim
HackerNews 2026-05-04 alignment

Perfectly Aligning AI's Values with Humanity's Is Impossible

2 points, 1 comment

oldnetguy
HackerNews 2026-05-03 AI compliance

The Algebra of Hallucination

3 points, 0 comments

invariantjason
HackerNews 2026-05-03 adversarial

My favorite adversarial review prompt

3 points, 0 comments

tie-in
HackerNews 2026-04-30 adversarial

Show HN: Spec27 – Spec-driven validation for AI agents

13 points, 9 comments

njyx