Large Language Model Benchmarks

Hosted on MSN

New study challenges accuracy of AI benchmark testing

A Nature-published study by an international research team has found that current AI benchmarks fail to accurately measure large language models’ core capabilities. Existing tests often mix skills ...

DeepSeek open-sources V4 large language model series

Chinese artificial intelligence developer DeepSeek today released a new series of open-source large language models. V4, as ...

3d on MSN

DeepSeek previews new AI model that ‘closes the gap’ with frontier models

DeepSeek says both models are more efficient and performant than DeepSeek V3.2 due to architectural improvements, and have ...

iAfrica

Egyptian Startup Releases Open-Source AI Model That Outperforms Larger Global Rivals on Key Benchmarks

A Cairo-based artificial intelligence startup has released Horus 1.0-4B, a fully open-source large language model built in Egypt that outperforms several ...

STAT

OpenAI leaps into health care with AI benchmark to evaluate models

OpenAI on Monday released a large dataset for evaluating how well large language models answer questions related to health care. Experts lauded the open-source data and detailed evaluation rubrics, ...

Hosted on MSN

Open-weight AI models from China challenge industry leaders

Chinese AI labs are releasing open-weight large language models that rival or surpass leading proprietary systems on key coding benchmarks. Models like Z.ai’s GLM-5.1 and Moonshot AI’s Kimi K2.6 are ...

Moonshot AI releases Kimi-K2.6 model with 1T parameters, attention optimizations

Moonshot AI today released Kimi-K2.6, the latest addition to its popular Kimi series of open-source large language models.

VentureBeat

Researchers warn of 'catastrophic overtraining' in LLMs

A new academic study challenges a core assumption in developing large language models (LLMs), warning that more pre-training data may not always lead to better models. Researchers from some of the ...

Bloomberg L.P.

Introducing BloombergGPT, Bloomberg’s 50-billion parameter large language model, purpose-built from scratch for finance

NEW YORK – Bloomberg today released a research paper detailing the development of BloombergGPT™, a new large-scale generative artificial intelligence (AI) model. This large language model (LLM) has ...
