# Obsidian Notes
Hello there, we have some notes here to get you started with ViT.
## Topics
Learning resources for research interns.

### Architectures
- CNNs
- RNNs (mostly LSTMs)
  - Why have they been almost entirely replaced by transformers?
  - In which cases do RNNs still work better?
- Transformers
  - Encoder-only variants (the ones we mainly use)
  - Some recent advances in decoder-only and encoder-decoder models
  - ViT and its variants such as Swin (still transformers, but carried over from NLP to CV); see the sketch after this list
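To make the ViT idea concrete, here is a minimal PyTorch sketch of the core pipeline (image, fixed-size patches, linear embedding, transformer encoder, classification from the class token). The `TinyViT` name and all hyperparameters are illustrative placeholders, not the configuration of the original paper or of any of our models.

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Bare-bones ViT: patchify -> embed -> transformer encoder -> classify."""

    def __init__(self, image_size=224, patch_size=16, dim=192, depth=4,
                 heads=3, num_classes=10):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # Patch embedding as a strided conv: each 16x16 patch becomes one vector.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=dim * 4,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                       # x: (B, 3, H, W)
        x = self.patch_embed(x)                 # (B, dim, H/16, W/16)
        x = x.flatten(2).transpose(1, 2)        # (B, num_patches, dim)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])               # classify from the [CLS] token

logits = TinyViT()(torch.randn(2, 3, 224, 224))  # -> shape (2, 10)
```

Swin departs from this mainly by restricting self-attention to shifted local windows and merging patches between stages, which yields hierarchical feature maps better suited to dense tasks such as segmentation.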
### Learning Paradigms

- Supervised
- Semi-supervised
- Self-supervised
- Unsupervised
- Differences and use cases for each (see the sketch after this list)

We mainly work on supervised learning, but some upcoming work requires knowledge of semi-supervised and self-supervised learning.
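As a rough illustration of how the paradigms differ in practice, the sketch below contrasts a supervised training step (the loss needs human-annotated labels) with a self-supervised one (the label is manufactured from the data itself, here via a rotation-prediction pretext task). `backbone`, `cls_head`, and `rot_head` are hypothetical modules used only for illustration.

```python
import torch
import torch.nn.functional as F

def supervised_step(backbone, cls_head, images, labels):
    # Supervised: the loss requires human-annotated labels.
    return F.cross_entropy(cls_head(backbone(images)), labels)

def self_supervised_step(backbone, rot_head, images):
    # Self-supervised (rotation-prediction pretext task): the "label" is
    # generated from the data itself, so no annotation is needed.
    # rot_head must output 4 logits (0/90/180/270 degrees).
    k = torch.randint(0, 4, (images.size(0),))
    rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                           for img, r in zip(images, k)])
    return F.cross_entropy(rot_head(backbone(rotated)), k)
```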
### Domains

#### NLP

- Language modelling: next-word prediction vs. masked-token prediction
- NER
  - Subword tokenization challenge: how do we combine per-subword predictions into word-level labels? (see the sketch after this list)
- POS tagging
- Sentiment analysis (esp. target-aspect-based sentiment analysis, TABSA; this needs some effort)
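On the subword question: one common, simple strategy is to map every subword back to its source word with a fast tokenizer's `word_ids()` and keep only the prediction made for each word's first subword. A sketch using Hugging Face `transformers`; the `bert-base-cased` checkpoint and the dummy `pred_ids` are assumptions for illustration, not our actual model.

```python
from transformers import AutoTokenizer

# Any fast tokenizer works; bert-base-cased is just an example checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
words = ["interns", "study", "tokenization", "here"]
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

# Pretend these are per-subword label ids, e.g. the argmax of a
# token-classification head's logits (dummy values here).
pred_ids = list(range(enc["input_ids"].shape[1]))

word_level, prev = [], None
for token_idx, word_idx in enumerate(enc.word_ids()):
    if word_idx is None or word_idx == prev:   # skip specials and later subwords
        continue
    word_level.append((words[word_idx], pred_ids[token_idx]))
    prev = word_idx
print(word_level)  # one (word, predicted label id) pair per original word
```

Other aggregation rules (last subword, majority vote, averaging logits over subwords) follow the same alignment pattern.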
#### Computer Vision (esp. Medical Imaging)

- Segmentation
  - ViT
  - Swin
- Object detection
- Use pretrained models for each of the above tasks (see the sketch after this list)
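As a starting point for the pretrained-model item, torchvision ships weights for classification (ViT, Swin), semantic segmentation, and object detection. A rough sketch, assuming a recent torchvision (0.13 or newer, where string weight names are accepted); medical-imaging work will usually still need fine-tuning on domain data.

```python
import torch
import torchvision

# Classification backbones (ViT / Swin), pretrained on ImageNet-1k.
vit = torchvision.models.vit_b_16(weights="DEFAULT").eval()
swin = torchvision.models.swin_t(weights="DEFAULT").eval()

# Semantic segmentation and object detection models with pretrained weights.
seg = torchvision.models.segmentation.deeplabv3_resnet50(weights="DEFAULT").eval()
det = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    print(vit(x).shape)            # (1, 1000) class logits
    print(swin(x).shape)           # (1, 1000) class logits
    print(seg(x)["out"].shape)     # (1, 21, H, W) per-pixel class logits
    print(det([x[0]])[0].keys())   # boxes, labels, scores for one image
```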
#### Multimodal Learning (Vision and Language; VLMs, VLSMs)

- CLIP and its variants (OpenCLIP, MetaCLIP) (VLMs); see the zero-shot sketch after this list
- CRIS
- VLSMs
  - CLIPSeg
  - ZegCLIP
  - Include others
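For CLIP itself, zero-shot image-text matching takes only a few lines with Hugging Face `transformers`. The checkpoint name, the prompts, and the `scan.png` path below are illustrative placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("scan.png")                  # placeholder path
texts = ["an MRI slice of a brain", "a chest X-ray", "a photo of a cat"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# Higher probability = better image-text match; this is the zero-shot matching
# idea that CLIPSeg, ZegCLIP, CRIS, etc. extend to dense (per-pixel) prediction.
print(out.logits_per_image.softmax(dim=-1))
```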