A downloadable project

How would a transformer know when to use ‘an’ rather than ‘a’? An autoregressive transformer predicts only one token at a time, yet the choice between the two articles depends on the word that follows. Using a prompt inspired by the Indirect Object Identification (IOI) task together with activation patching, we isolate a single MLP neuron in GPT-2 Large that is largely responsible for predicting the token ‘ an’. By patching the activation of each neuron in turn, we identify which one increases the logit of this token. We present our method and further evidence in the paper below.
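As a rough illustration of the neuron-level patching described above, here is a minimal sketch using the TransformerLens library. The prompts, the layer number, and the helper name are illustrative assumptions, not taken from the paper or the released notebook; it caches activations from a clean run (where ‘ an’ is correct), then reruns a corrupted prompt while restoring one MLP neuron at a time and reading off the ‘ an’ logit.

```python
# A minimal sketch of per-neuron activation patching with TransformerLens.
# Prompts, layer choice, and helper name are illustrative assumptions.
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2-large")

# Clean prompt: the correct next article is ' an'; the corrupted prompt
# swaps in a noun that takes ' a' instead.
clean_prompt = "I climbed up the pear tree and picked a pear. I climbed up the apple tree and picked"
corrupt_prompt = "I climbed up the pear tree and picked a pear. I climbed up the lemon tree and picked"
an_token = model.to_single_token(" an")

# Cache activations from the clean run once.
_, clean_cache = model.run_with_cache(clean_prompt)

def patch_neuron_an_logit(layer: int, neuron: int) -> float:
    """Run the corrupted prompt, restoring one MLP neuron to its clean
    value at the final position (where the prediction is made), and
    return the resulting logit of ' an'."""
    def hook(value, hook):
        # value has shape [batch, pos, d_mlp]
        value[:, -1, neuron] = clean_cache[hook.name][:, -1, neuron]
        return value

    logits = model.run_with_hooks(
        corrupt_prompt,
        fwd_hooks=[(f"blocks.{layer}.mlp.hook_post", hook)],
    )
    return logits[0, -1, an_token].item()

# Sweep every neuron in one layer and rank by how much each restores
# the ' an' logit (one forward pass per neuron, so this is slow).
layer = 31  # an illustrative layer, not the one reported in the paper
scores = torch.tensor(
    [patch_neuron_an_logit(layer, n) for n in range(model.cfg.d_mlp)]
)
print(scores.topk(5))
```

The neurons whose restored clean value most raises the ‘ an’ logit on the corrupted prompt are the candidates for the behavior described above.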

Download

a(n)-2.pdf
a_an_investigation.ipynb (1 MB)
