
Meta claims popular AI benchmark flawed, models 'cheating' via GitHub
09 Sep 2025
Meta researchers have flagged potential flaws in a widely-used benchmark for evaluating artificial intelligence (AI) models.
The warning was issued by Jacob Kahn, manager at Meta's AI research lab Fair, in a GitHub post last week.
The issue raises fresh concerns over the accuracy of assessments conducted on major AI systems using this benchmark, called SWE-bench Verified.
In light of these findings, Kahn said, "We're still assessing [the] broader impact on evaluations and understanding trajectories for sources of leakage."
SWE-bench Verified assesses AI models' coding skills
Benchmark details
SWE-bench Verified, a human-validated subset of the larger SWE-bench benchmark for large language models, assesses AI models on their ability to resolve hundreds of real-world software problems sourced from GitHub, a Microsoft subsidiary.
However, Fair's post claims that certain models evaluated with SWE-bench Verified simply looked up known solutions available on GitHub and presented them as their own instead of using their inherent coding skills to solve these problems.
Major AI models 'cheated' on SWE-bench Verified
Cheating allegations
Fair's post highlighted that several leading AI models, including Anthropic's Claude and Alibaba Cloud's Qwen, had "cheated" on the SWE-bench Verified benchmark.
These models were said to have directly searched for known solutions shared elsewhere on GitHub and passed them off as their own.
The models named in the post were Anthropic's Claude 4 Sonnet, Z.ai's GLM-4.5, and Alibaba Cloud's Qwen3-Coder-30B-A3B, which have official SWE-bench Verified scores of 70.4%, 64.2%, and 51.6%, respectively.