# PoPE-ViT

A Vision Transformer pipeline for **brain-tumor MRI classification** (glioma, meningioma, pituitary, no-tumor), built to compare **Polar Coordinate Positional Embeddings (PoPE)** against the **Rotary baseline (RoPE)**, and against strong pretrained and convolutional references.

## Why

Positional encoding decides how a transformer reasons about *where* a patch is. PoPE explicitly decouples content (magnitude) from position (angle); RoPE entangles them. We ask whether that clean separation actually helps a clinically meaningful task: distinguishing four tumor classes from 2D MRI slices.

## How

- **Patch embedding**: 224×224 scans split into 16×16 patches, linearly projected (768 → 512) with a prepended CLS token
- **Positional encoding**: PoPE (softplus-zeroed initial angles) vs. RoPE (position-scaled rotation), swappable within the same backbone
- **Transformer**: 6 blocks (LayerNorm → attention → MLP, residual), mean-pooled to a 4-class head
- **Training**: class-weighted cross-entropy (handles the no-tumor minority), AdamW, cosine schedule with linear warm-up, early stopping
- **Evaluation**: per-class one-vs-rest AUROC, accuracy, confusion matrices, plus a hyperparameter grid over learning rate, dropout, and patch size
- **Baselines**: DeiT-Small (pretrained, and pretrained + PoPE) and a from-scratch ResNet18

## Results

ImageNet pretraining dominates at this data scale: ResNet18 (0.999 AUROC) and pretrained DeiT-Small (0.986) lead, while PoPE-ViT and RoPE-ViT come out nearly identical (0.9644 vs. 0.9645). The PoPE advantage doesn't surface when training from scratch on ~3.3k images.

## Stack

- **Models**: PyTorch, custom PoPE/RoPE attention, einops tensor ops
- **Backbones**: timm (DeiT-Small), torchvision (ResNet18, transforms)
- **Data**: Brain Tumor MRI (Kaggle), stratified 80/10/10 split, RandomAffine + flip augmentation
- **Eval**: scikit-learn (AUROC, confusion matrices), matplotlib
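
## Example: warm-up + cosine schedule

The Training step above mentions a cosine schedule with linear warm-up. The sketch below shows one common way to compute such a schedule in plain Python. It is an illustration only, not the repository's actual code: the function name `lr_at_step`, its parameters, and the numeric settings (`total_steps=100`, `warmup_steps=10`, `base_lr=3e-4`) are all assumptions chosen for the example.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate at a 0-indexed optimizer step (hypothetical helper).

    Linear warm-up for the first `warmup_steps` steps, then cosine decay
    from `base_lr` down to `min_lr` by `total_steps`.
    """
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warm-up phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine factor falls from 1 (start of decay) to 0 (end of training).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


# Illustrative settings only; the project's real values are not stated here.
schedule = [
    lr_at_step(s, total_steps=100, warmup_steps=10, base_lr=3e-4)
    for s in range(100)
]
```

In a PyTorch training loop, the same shape could be obtained by passing a function that returns `lr_at_step(...) / base_lr` to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's initial learning rate by that factor at each step.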
