Tests that once challenged advanced AI models are now being solved with ease, making it harder for researchers to pinpoint what current systems are actually capable of.
Researchers debut "Humanity’s Last Exam," a benchmark of 2,500 expert-level questions that current AI models are failing.
Forbes contributors publish independent expert analyses and insights. I write about relationships, personality, and everyday psychology. For decades, intelligence has often been reduced to a number ...