Reasoning Checkpoint · arcadia
Elo score benchmark reflects expected similarity
Using 1500 as the benchmark for random performance lets each species' Elo score be interpreted relative to that baseline: a score above 1500 reflects greater similarity to humans than expected, and a score below 1500 reflects lesser similarity.
Confidence: 70%
◑ partial · active
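
To make the role of the 1500 benchmark concrete, here is a minimal sketch of how a score might be read against it. It assumes the conventional Elo expected-score formula with a 400-point scale, which the claim above does not confirm; the names `expected_score`, `relative_to_baseline`, and `BASELINE` are hypothetical and chosen only for illustration.

```python
# Assumption: conventional Elo expected-score formula (400-point scale).
# The source states only that 1500 is the random-performance benchmark.
BASELINE = 1500.0


def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected outcome (0 to 1) for A against B under standard Elo."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def relative_to_baseline(rating: float, baseline: float = BASELINE) -> str:
    """Label a score as greater, lesser, or equal relative to the benchmark."""
    if rating > baseline:
        return "greater"
    if rating < baseline:
        return "lesser"
    return "at baseline"


if __name__ == "__main__":
    # Illustrative ratings only, not values from the source.
    for rating in (1400.0, 1500.0, 1600.0):
        print(rating, relative_to_baseline(rating), round(expected_score(rating, BASELINE), 3))
```

Under this assumption, a score exactly at the benchmark corresponds to an expected outcome of 0.5 against the baseline, which is what makes 1500 a natural reference point for "greater" versus "lesser" similarity.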