BERT
Embeddings¶
- Word2Vec: static (context-free) word embeddings, i.e. one fixed vector per word regardless of the sentence it appears in (see the sketch below)
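Below is a minimal sketch of looking up static embeddings with gensim. The pretrained model name is just one option available from gensim's downloader (the classic Word2Vec release is "word2vec-google-news-300", but it is a much larger download); any static embedding behaves the same way for this point.

```python
import gensim.downloader as api

# Load a small pretrained static-embedding model (assumption: GloVe vectors
# stand in for Word2Vec here; the lookup behaviour is the same).
wv = api.load("glove-wiki-gigaword-50")

vec = wv["stick"]                        # one fixed 50-d vector per word,
print(vec.shape)                         # (50,)
print(wv.most_similar("stick", topn=3))  # regardless of the surrounding sentence
```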
ELMo: Context Matters¶
- Words have different meanings depending on the context
- "stick": "let's stick to the plan" vs. "I used a walking stick"
- Uses a bidirectional LSTM to look at the whole context (see the sketch after this list)
- Trained on language modeling: predicting the next word (and, in the backward direction, the previous word)
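The sketch below is not ELMo itself, just a toy illustration of the idea with an untrained bidirectional LSTM in PyTorch: the static lookup gives "stick" the same vector in both sentences, while the biLSTM output for "stick" depends on the surrounding words.

```python
import torch
import torch.nn as nn

# Toy vocabulary and the two "stick" sentences from above.
vocab = {"<pad>": 0, "let's": 1, "stick": 2, "to": 3, "the": 4, "plan": 5,
         "i": 6, "used": 7, "a": 8, "walking": 9}
sent_a = torch.tensor([[1, 2, 3, 4, 5]])   # "let's stick to the plan"
sent_b = torch.tensor([[6, 7, 8, 9, 2]])   # "i used a walking stick"

emb = nn.Embedding(len(vocab), 32)                           # static lookup table
lstm = nn.LSTM(32, 64, bidirectional=True, batch_first=True)

out_a, _ = lstm(emb(sent_a))   # (1, 5, 128): forward and backward states concatenated
out_b, _ = lstm(emb(sent_b))

# The static embedding of "stick" is identical in both sentences ...
print(torch.equal(emb(sent_a)[0, 1], emb(sent_b)[0, 4]))    # True
# ... but its contextual (biLSTM) representation differs with the context.
print(torch.allclose(out_a[0, 1], out_b[0, 4]))             # False
```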
ULM-FiT¶
- Utilizes much more of what the model learns during pretraining than just the embeddings
- Introduced an effective way to do transfer learning in NLP: pretrain a language model, then fine-tune it on the downstream task
OpenAI Transformer¶
- Decoder-only model for language modeling
- Stack 12 decoders and throw ~7,000 books at them; books are good training data because they provide long, coherent context
- Trained with next-word prediction, i.e. a forward-only language model
- Issue: the model only gets context from one side
- The next tokens must be masked in self-attention so that information about the word being predicted does not leak in (see the sketch below)
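A minimal sketch of that causal (look-ahead) masking in plain PyTorch, using random numbers as stand-in attention scores; it only illustrates the mechanism, not the OpenAI Transformer's actual implementation.

```python
import torch

seq_len = 5
# Causal mask: position i may only attend to positions <= i, so the
# prediction of the next word never sees the word it is predicting.
mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

scores = torch.randn(seq_len, seq_len)            # stand-in attention scores
scores = scores.masked_fill(mask, float("-inf"))  # block all future positions
attn = torch.softmax(scores, dim=-1)              # each row only weights past/current tokens
print(attn)
```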
BERT¶
Encoder model for language modeling: a stack of Transformer encoder blocks rather than decoders, so every position can attend to context on both sides
Masked Language Model¶
- Randomly mask a fraction (~15%) of the input tokens and train the model to predict them; this lets BERT use context from both directions without letting the answer leak in (see the sketch below)
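A minimal sketch of the masked-language-model objective at inference time, using the Hugging Face fill-mask pipeline; the checkpoint name is just the standard pretrained BERT.

```python
from transformers import pipeline

# Predict the most likely fillers for the masked position.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for pred in unmasker("I used a walking [MASK]."):
    print(f"{pred['token_str']:>10}  {pred['score']:.3f}")
```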
Two-Sentence Task¶
Given two sentences, the model predicts whether the second sentence actually follows the first in the original text (next sentence prediction).
From this task, it is assumed that the BERT model learns to encapsulate the information of an entire sentence in the [CLS] token.
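A minimal sketch of scoring a sentence pair with the pretrained next-sentence-prediction head in Hugging Face transformers; the example sentences are made up, and the index convention (0 = "B follows A") is the one documented for this head.

```python
import torch
from transformers import BertForNextSentencePrediction, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

sent_a = "Let's stick to the plan."
sent_b = "We will meet at noon as agreed."      # a plausible next sentence
encoding = tokenizer(sent_a, sent_b, return_tensors="pt")

with torch.no_grad():
    logits = model(**encoding).logits           # shape (1, 2)

# Index 0 = "sent_b follows sent_a", index 1 = "sent_b is a random sentence".
print(torch.softmax(logits, dim=-1))
```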
BERT on Different Tasks¶
- Fine-tuned for downstream tasks (sentence-pair classification, single-sentence classification, question answering, sequence tagging) by adding a small task-specific head on top of the pretrained model
BERT as a feature extractor¶
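Instead of fine-tuning, BERT's contextual outputs can also be used as fixed features for a separate model. Below is a minimal sketch with Hugging Face transformers that extracts the [CLS] vector for each sentence; the checkpoint name and the choice of the [CLS] vector as the sentence feature are assumptions following the common feature-based recipe.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()                                       # frozen feature extractor

sentences = ["let's stick to the plan", "i used a walking stick"]
batch = tokenizer(sentences, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state      # (batch, seq_len, 768)

cls_features = hidden[:, 0, :]                     # the [CLS] token's embedding
print(cls_features.shape)                          # torch.Size([2, 768])
# These vectors can be fed to any downstream classifier
# (e.g. logistic regression) without fine-tuning BERT itself.
```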