AI coding benchmark MirrorCode published its full results June 26, showing Claude Opus 4.7 autonomously rebuilt a 60,000-line interpreter and scored 56% overall — completing tasks that take human ...
For example, a score of 900 means that the mouse spent exactly 50% of its time on each of the two beddings, whereas a score of 1,800 means that it spent the full 30 min in the bedding that would be ...
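The two anchor values fit a score equal to the number of seconds the mouse spent on one particular bedding, since the 30 min session lasts 1,800 s and half of that is 900 s. The sketch below assumes that reading. The snippet does not state the scoring rule, and the names `SESSION_SECONDS` and `preference_fraction` are hypothetical.

```python
# Hypothetical helper. It assumes the preference score is the number of seconds
# spent on one particular bedding during a 30 min (1,800 s) session.
SESSION_SECONDS = 30 * 60  # 1,800 s

def preference_fraction(score: float) -> float:
    """Return the fraction of the session spent on the scored bedding."""
    if not 0 <= score <= SESSION_SECONDS:
        raise ValueError("score must be between 0 and 1,800")
    return score / SESSION_SECONDS

for s in (0, 900, 1800):
    print(s, preference_fraction(s))  # 900 maps to 0.5 and 1800 maps to 1.0
```

Under this assumption, a score of 0 would mean the mouse spent the whole session on the other bedding.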