Workshop · Governance, law & policy · Risk management practices

From Outputs to Systems: Evaluating Agentic AI in Practice

7 July 2026 · 11:00 am–11:40 am · Sutherland

Most AI evaluation still focuses on single outputs: a prompt goes in, a response comes out, and the response is scored. That is useful, but it breaks down once systems act over time and interact with tools, memory, and other systems. Agentic systems carry context forward and trigger downstream effects. The risk is not only in what they say, but in how behaviour unfolds across steps. This talk introduces a system-level approach to evaluating agentic AI in deployment. It shifts the focus from outputs to trajectories: how behaviour develops, where it drifts, how errors propagate, and what organisations can start measuring in practice. The aim is to make evaluation a usable layer of governance for systems that act, not just respond.

Speaker

Rebecca Johnson
Researcher, The University of Sydney
Researcher and advisor specialising in the evaluation and governance of generative and agentic AI systems. Holds a PhD in AI Ethics from the University of Sydney. Founder of Ethics Gen AI. Formerly with Google Research, working on frontier models. Advises organisations on AI risk, deployment, and oversight, and convenes cross-sector forums across industry, government, and academia.
