AI Safety Forum Australia
talk

From Outputs to Systems: Evaluating Agentic AI in Practice

7 July 2026 · 11:30 am–11:55 am · Sutherland

Most AI evaluation still focuses on single outputs: a prompt goes in, a response comes out, and the response is scored. That is useful, but it breaks down once systems act over time and interact with tools, memory, and other systems. Agentic systems carry context forward and trigger downstream effects. The risk is not only in what they say, but in how behaviour unfolds across steps. This talk introduces a system-level approach to evaluating agentic AI in deployment. It shifts the focus from outputs to trajectories: how behaviour develops, where it drifts, how errors propagate, and what organisations can start measuring in practice. The aim is to make evaluation a usable layer of governance for systems that act, not just respond.

Speaker