What Makes a Great Language Model?
*Isn’t it surprising…?*
…that training a language model on the web’s data works? Isn’t there a lot of rubbish on the web? How does the model know what information to focus on?
“Correct” or “high quality” data has a tendency to repeat itself. There might be some data saying that the sun moves around the earth, but there is much more saying that the earth moves around the sun. This makes it possible to train language models on large datasets even when they contain some, or even a lot of, weak information or arguments. Weak arguments and information tend to differ from one another, whereas stronger arguments and information tend to be articulated, transmitted and replicated more coherently…
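Here is a toy sketch of that frequency argument, not of how a language model is actually trained: an invented miniature corpus in which the correct claim is repeated often and consistently, while the incorrect claims are rarer and worded inconsistently. The relative frequencies stand in for the signal the model can pick up.

```python
from collections import Counter

# Invented toy corpus (illustrative only): the correct claim repeats
# often and consistently; the wrong claims are rare and inconsistent.
corpus = (
    ["the earth moves around the sun"] * 90
    + ["the sun moves around the earth"] * 6
    + ["the sun circles the earth daily"] * 2
    + ["the earth is fixed and the sun revolves"] * 2
)

# A crude stand-in for what training picks up: estimate each claim's
# weight from its relative frequency in the data.
counts = Counter(corpus)
total = sum(counts.values())
for claim, n in counts.most_common():
    print(f"{n / total:.2f}  {claim}")

# The consistently worded correct claim dominates (0.90), while the
# wrong claims split their remaining mass across inconsistent phrasings.
```

The point of the sketch is only that consistent repetition concentrates probability mass on the stronger claim, whereas the weaker claims dilute themselves by disagreeing with each other.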