Featured posts

Research 04.27.25
The RoPE Compatibility Problem in DeepSeek's Multi Head Latent Attention

Analysis 01.01.25
Analysis of Matrix Multiplications in Transformer Architectures

Less readworthy posts

Balancing Memory & Compute: Strategies to Manage KV Cache in LLMs
May 27, 2024

Layer Normalization as a Projection: The Complete Geometric Interpretation
May 17, 2025