Formal Approach to Multi-Agent Safety

As autonomous agents increasingly act without human review, the choice of communication protocol becomes a safety decision, not just an engineering one. Today's LLM agents talk in natural language or loosely structured formats like MCP. A classical result in computer science, Rice's theorem, implies that once messages can express arbitrary computation, no filter can reliably distinguish safe from adversarial inputs; prompt injection and OpenCLAW's security blunders are the predictable result. With my Sydney-based co-author Hugo O'Connor, I'd like to present CBCL (Common Business Communication Language), an agent communication framework whose safety properties are machine-checked by a theorem prover, and use it to ground a broader argument: formally constraining what agents can say to each other is necessary, but not sufficient, for meaningful human oversight of multi-agent systems. CBCL also lets agents teach each other new vocabulary at runtime, so structural safety doesn't come at the cost of flexibility. The work is currently under review at NeurIPS, with further empirical results expected between now and July. Estimated duration: 10 minutes including Q&A; it could also work as a regular presentation.
