Sparse autoencoders find interpretable features

ReasoningCheckpoint·arcadia

Sparse autoencoders identify highly interpretable features in both language models and biological models; in the latter, these features help elucidate underlying biological signals.
