A fresh set of benchmarks is needed to assess artificial intelligence's understanding of the real world.
Artificial intelligence (AI) models have shown impressive performance on law exams, answering multiple-choice, short-answer, and essay questions as well as human test takers do [1].
However, they struggle with real-world legal tasks, as some lawyers have learned the hard way after being fined for filing AI-generated court briefs that misrepresented principles of law and cited nonexistent cases.
AI models can perform as well as humans on law exams, but they struggle to perform real-world legal tasks.
Author: Chaudhri, principal scientist at Knowledge Systems Research in Sunnyvale, California.
Author summary: New benchmarks are needed to understand AI's real-world knowledge.