Skip to content

DL notes

Next Sentence Prediction

Next Sentence Prediction¶

Say you are trying to predict using next word prediction in a wiki article about a person X.
Now you will need to know about when person born, what they did, and so on.
For learning relationship between sentences, using sentences from same document and different documents

Masked Language Model¶

Mask k% of input words This is a [MASK] sentence. This is a bad [MASK].
Issue
- Low Number of predictions per sentence
- Too little masking: too expensive to train
- Too much masking: not enough context

Masked Token Prediction:¶

Lower number of predictions compared to next word prediction

Next word Prediction¶

Only uses left context(or right context) but Language understanding is bidirectional
Issue: words can "see themselves" in a bidirectional context
Masked LM takes longer to converge but does much better pretty fast
At the very beginning, left-to-right does better, probably because more data