Date: 2024-09-16

A team of technology experts issued a global call on Monday, seeking the toughest questions to challenge artificial intelligence systems, which have been acing popular tests with ease.

Termed 'Humanity's Last Exam,' the initiative aims to determine when expert-level AI has arrived, and to stay relevant as the technology advances. The project is led by the non-profit Center for AI Safety (CAIS) and the startup Scale AI. The call comes in the wake of o1, the new model from ChatGPT maker OpenAI, which outperformed popular benchmarks, according to Dan Hendrycks, executive director of CAIS and an advisor to Elon Musk's startup xAI.

Hendrycks co-authored influential papers in 2021 that proposed tests for AI systems, and those tests are now widely used. One covers undergraduate-level topics, while another assesses reasoning through complex math problems. AI has improved vastly since then: Anthropic's Claude models, for example, raised their benchmark scores from 77% to 89% in a year.

The initiative highlights the need for tougher evaluations: while AI systems ace popular benchmarks, they have performed poorly on lesser-known tests involving plan formulation and visual pattern recognition. 'Humanity's Last Exam' aims to include 1,000 crowd-sourced, peer-reviewed questions, with top submissions rewarded in November. The effort seeks to better measure AI's rapid advancement and to ensure the integrity of its testing.

(With inputs from agencies.)