A deep dive into why naively combining Rotary Position Embeddings with compressed KV caches fails mathematically, and how DeepSeek's decoupled RoPE strategy elegantly solves the non-commutativity problem by separating content and position paths.
varmology
I write about my work and research, though I post infrequently. If you find these posts useful and want to connect, reach out on X or LinkedIn.
- April 27, 2025
- January 1, 2025
  Performance analysis of transformer computations on the H100 architecture, with a roofline-model examination and optimization strategies.
Less Readworthy Posts
- May 27, 2024
  Strategies to reduce KV cache memory in LLMs.
- May 17, 2025
  Layer normalization as geometric projections.