Reasoning Checkpoint · Arcadia
Custom attention models are less performant than canonical transformers
Custom attention architectures, such as that used by Rijal et al. (2025), lack elements of the canonical transformer design, limiting their performance on the yeast dataset.
Confidence
70%
◑ Partial · Active