AI coding benchmark MirrorCode published its full results June 26, showing Claude Opus 4.7 autonomously rebuilt a 60,000-line interpreter and scored 56% overall — completing tasks that take human ...
For example, a score of 900 means that the mouse spent exactly 50% of its time on each of the two beddings, whereas a score of 1,800 means that it spent the full 30 min in the bedding that would be ...
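The two worked values above are consistent with the score being the number of seconds, out of a 30 min (1,800 s) test, that the mouse spent in one of the two beddings. The sketch below converts a score into a fraction of test time under that assumption. The name `preference_fraction` and the constant `TEST_DURATION_S` are hypothetical and serve only as illustration; they do not come from the source.

```python
# Assumption: the score is time in seconds spent in one bedding,
# measured over a 30 min test (30 * 60 = 1,800 s).
TEST_DURATION_S = 30 * 60


def preference_fraction(score_s: float, total_s: float = TEST_DURATION_S) -> float:
    """Return the fraction of the test spent in the scored bedding."""
    if not 0 <= score_s <= total_s:
        raise ValueError(f"score must lie between 0 and {total_s} seconds")
    return score_s / total_s


print(preference_fraction(900))   # 0.5 -> equal time on both beddings
print(preference_fraction(1800))  # 1.0 -> the full 30 min in one bedding
```

Under this reading, a fraction of 0.5 indicates no preference, and values approaching 1.0 indicate a strong preference for the scored bedding.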