New benchmarks show semantic code graphs helping coding agents find change locations faster and complete updates more ...
B, a 3-billion-parameter AI model, is challenging OpenAI, Google and DeepSeek on math and coding benchmarks while reigniting ...
Embodied AI world models drew $6 billion in Q1 2026 alone, but new analysis from Fusion Fund investors argues the LLM scaling ...
Leaderboards tell you which model is best in general. I needed to know which model is best for my system, right now, in five minutes. The Vellum LLM Leaderboard tracks every frontier model across GPQA ...
The IndyCar Series races today at Road America in Elkhart Lake, Wis.: Leaderboard, highlights, crashes.
Microsoft released MAI-Code, a model designed to convert plain-English descriptions into functional application code, pushing the company deeper into the race to build AI agents that can handle real ...
Speedrun.com's official API (aka APIv1) is not actively maintained; it lacks a large number of modern features (including various social connections on user profiles) and has several unaddressed ...
This project provides a script tool and a leaderboard for evaluating the SQL capabilities of Large Language Models (LLMs). It aims to assess LLMs' proficiency in SQL understanding, dialect conversion, ...
Miyu Yamashita topped the Meijer LPGA leaderboard on Sunday, climbing the rankings after Lottie Woad lipped out a playoff ...
Their GLM 5.2 model now holds the top spot on the Design Arena HTML leaderboard with an Elo score of 1,360, slightly ahead of Anthropic's Claude Fable 5 at 1,350. This ...
We used the HumanEval leaderboard to filter the best performing models at the time our research started, which you can see in Figure 3. Note that this project began in February of 2024 and was first ...