9/9/2024
Introduction to and implementation of LoRA in PyTorch.
9/10/2024
Math + implementation of RoPE in PyTorch
10/1/2024
Math explained, from scaled dot-product attention to RWKV v7
12/8/2024
Math and code for RWKV 5.2. More to come.
1/24/2025
Forward pass for Flash Attention
1/27/2025
SoftMax Kernel in CUDA beats PyTorch
2/4/2025
Introduction to and implementation of an optimized RoPE kernel in CUDA.