
Blog Posts


Creating Custom LLMs with Hugging Face

4/2/2025

Creating our own LLM and integrating it with Hugging Face.

PyTorch, HuggingFace, LLM

RoPE Kernel Optimization in CUDA

2/4/2025

An introduction to optimizing the RoPE kernel in CUDA, with an implementation.

CUDA, RoPE, Optimization

CUDA Softmax Kernel

1/27/2025

A softmax kernel in CUDA that beats PyTorch.

CUDA, SoftMax, Optimization

CUDA Flash Attention Kernel

1/24/2025

The forward pass of Flash Attention.

CUDA, Attention, Optimization

RWKV Models Explained

12/8/2024

Math and code for RWKV 5.2. More to come.

RWKV, Model Explanation, Math

Tags