AI Safety Forum Australia
Talk · Technical safety & evaluation · Reliability challenges (incl. agents)

Incoherent Values? Probing LLM Preferences Through Parametric Variation

7 July 2026 · 03:30 pm–03:55 pm · Sutherland

The International AI Safety Report 2025 defines ‘reliability’ as an AI system's consistent ability to perform its intended function. A persistent obstacle to that reliability is ‘weak common sense’ reasoning. Normative competence, the ability to identify and reason over moral, legal, social, and professional norms, offers one of the most promising frameworks for diagnosing and closing this gap. One key dimension of normative competence is coherence: the degree to which an agent's judgements fit together. Coherence is especially crucial for agents with whom humans need to cooperate in the real world, since incoherent agents are unpredictable, and for agents that will face novel situations outside the distribution on which they were trained. Measuring model coherence under these conditions requires both technical and philosophical innovation. This session presents our recent work on testing LLMs' evaluative coherence and shares findings from 15 different LLMs. It demonstrates our approach to LLM evaluation and testing by focusing on eliciting LLMs' moral reasoning, a dimension that has not yet been widely explored.

Speaker