A mash-up of semi-supervised learning and transfer learning
We have:
Small dataset: ImageNet with labels
Much larger dataset: unlabeled images from the Internet
We wish to somehow use the much larger dataset.
The distinctive approach: instead of training on the unlabeled dataset first and then transferring, we start from the labeled dataset and a teacher model
Teacher: train a model on labeled ImageNet and use it to pseudo-label a large amount of unlabeled data
Student: train a new model on this pseudo-labeled data
One would expect the student not to be able to outperform the teacher, but here the student does outperform it
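A minimal sketch of the pipeline in PyTorch (names like `teacher`, `student`, `unlabeled_images` are placeholders, not from the paper); the refinements below (noising, filtering, balancing) are omitted here:

    import torch
    import torch.nn.functional as F

    def generate_pseudo_labels(teacher, unlabeled_images, batch_size=256):
        """Run the trained teacher over the unlabeled images and collect its predictions."""
        teacher.eval()
        pseudo = []
        with torch.no_grad():
            for i in range(0, len(unlabeled_images), batch_size):
                logits = teacher(unlabeled_images[i:i + batch_size])
                pseudo.append(F.softmax(logits, dim=1))   # soft pseudo-labels
        return torch.cat(pseudo)

    # 1. Train the teacher on the labeled set (standard supervised training, not shown).
    # 2. Use it to pseudo-label the unlabeled images:
    #    pseudo_labels = generate_pseudo_labels(teacher, unlabeled_images)
    # 3. Train the student on the labeled data plus (unlabeled_images, pseudo_labels).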
Teacher labels can be hard or soft pseudo-labels (soft works better)
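The two label types differ only in how the teacher output is turned into a training target. A sketch (assumes `teacher_logits` and `student_logits` are [N, num_classes] tensors; the names are mine):

    import torch
    import torch.nn.functional as F

    soft_targets = F.softmax(teacher_logits, dim=1)    # full probability distribution (works better)
    hard_targets = teacher_logits.argmax(dim=1)        # single class index per image

    # matching student losses for each variant
    loss_soft = torch.sum(-soft_targets * F.log_softmax(student_logits, dim=1), dim=1).mean()
    loss_hard = F.cross_entropy(student_logits, hard_targets)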
Larger teacher -> better results
A large amount of unlabeled data is required
Increase the batch size of the student
Teacher: assigns pseudo-labels to un-augmented (clean) images; Student: trained on augmented (noised) versions of those images
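In other words, the teacher sees a clean view of each image when producing the pseudo-label, while the student is trained on a noised view of the same image. A sketch with torchvision transforms (the exact augmentations here are placeholders; the paper additionally noises the student with dropout and stochastic depth):

    import torch
    import torch.nn.functional as F
    from torchvision import transforms

    clean = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])
    noised = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.RandAugment(),                      # strong input noise for the student
        transforms.ToTensor(),
    ])

    # img is a PIL image from the unlabeled set
    with torch.no_grad():
        target = F.softmax(teacher(clean(img).unsqueeze(0)), dim=1)   # teacher: clean view
    student_logits = student(noised(img).unsqueeze(0))                # student: noised view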
Filter out images on which the teacher model has low confidence (they are probably out-of-domain images)
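Confidence filtering is just a threshold on the teacher's top predicted probability; a sketch (the threshold value here is illustrative):

    import torch

    def filter_by_confidence(images, soft_labels, threshold=0.3):
        """Drop unlabeled images whose top teacher probability is below the threshold."""
        confidence, _ = soft_labels.max(dim=1)
        keep = confidence >= threshold
        return images[keep], soft_labels[keep]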
Balance the unlabeled images class-wise so they match the original class distribution (upsampling: duplicate images; downsampling: keep only the images with the highest confidence)
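A rough sketch of that balancing step (the helper name and the per-class target are mine):

    import torch

    def balance_classes(images, soft_labels, per_class):
        """Give every class exactly `per_class` pseudo-labeled images."""
        confidence, pred = soft_labels.max(dim=1)
        kept = []
        for c in pred.unique():
            idx = (pred == c).nonzero(as_tuple=True)[0]
            if len(idx) >= per_class:
                # downsample: keep only the most confident images
                order = confidence[idx].argsort(descending=True)
                idx = idx[order[:per_class]]
            else:
                # upsample: duplicate images until the class is full
                repeats = -(-per_class // len(idx))    # ceil division
                idx = idx.repeat(repeats)[:per_class]
            kept.append(idx)
        kept = torch.cat(kept)
        return images[kept], soft_labels[kept]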
Iterative training: the trained student becomes the new teacher, and the process is repeated
Joint training on labeled and pseudo-labeled data
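Putting the last two points together: each round, the student trains jointly on labeled and pseudo-labeled batches and is then promoted to teacher for the next round. A sketch of that outer loop, reusing the helpers above plus two hypothetical ones (`make_student`, `train_jointly`):

    def noisy_student_rounds(teacher, make_student, labeled_data, unlabeled_images, rounds=3):
        """Iterative self-training: each trained student becomes the next teacher."""
        for _ in range(rounds):
            pseudo = generate_pseudo_labels(teacher, unlabeled_images)
            images, pseudo = filter_by_confidence(unlabeled_images, pseudo)
            images, pseudo = balance_classes(images, pseudo, per_class=1300)  # ~ImageNet images per class
            student = make_student()                   # fresh, often larger, model (trained from scratch)
            train_jointly(student, labeled_data, (images, pseudo))            # mixed labeled + pseudo batches
            teacher = student                          # the student becomes the new teacher
        return teacher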
Training the student from scratch may be better (rather than initializing it from the teacher)
All of this makes the model more robust to perturbations of the input images and better able to handle adversarial inputs