About
At the moment, I work in Model Research and Engineering at Dheyo AI.
I make neural networks run fast on hardware. I like problems at the intersection of mathematics, algorithms, models and hardware.
I did my Bachelor's in Electrical and Electronics Engineering. During the COVID-19 pandemic, I was looking for opportunities and found myself at MLIR Systems (thanks to Raju Datla and Sree Reddy). The company was later renamed to Deepvision Inc, then Kinara AI, and was eventually acquired by NXP.
At Kinara, my work grew in layers. I started at the bottom: writing optimized kernels for neural network operators on the ARA1 processor, Kinara's first edge AI chip. I covered most of the major NN operators used in CNN-based architectures (this was before OpenAI released GPT-3). As kernel work became repetitive, I moved up the stack. I began exploring quantization (how models are represented and compressed) and, driven by an interest in math, applied numerical methods like piecewise approximations and Newton-Raphson iterations to improve operator performance. To tie these pieces together, I worked on the AI compiler for Kinara's edge processors under Abhilash Ghanore and Lava Kumar Bokam, which brought kernels, quantization, and optimizations into a single pipeline from model to hardware. During this period, I also enrolled in a Master's program at BITS Pilani.
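To give a flavor of that kind of numerical work, here is a minimal, illustrative sketch in Python (my own example, not Kinara's kernel code): a Newton-Raphson iteration can compute a reciprocal using only multiplies and subtracts, which maps nicely onto accelerators that lack a fast divider.

```python
import math

def reciprocal_newton(x: float, iterations: int = 4) -> float:
    """Approximate 1/x (for x > 0) with Newton-Raphson on f(y) = 1/y - x.

    Each update y <- y * (2 - x * y) uses only multiplies and a subtract,
    which is why iterations like this show up in NN kernels on hardware
    without a fast divider. Illustrative sketch, not production code.
    """
    # Seed with a power of two so that x * y0 lands in (0.5, 1]; the
    # error then shrinks quadratically with every iteration.
    y = 2.0 ** -math.ceil(math.log2(x))
    for _ in range(iterations):
        y = y * (2.0 - x * y)
    return y

print(reciprocal_newton(7.0), 1.0 / 7.0)  # the two values agree closely
```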
When LLMs arrived in 2023, Kinara was taping out ARA2, its second-generation chip. I took on the problem of running LLMs at the edge, working with Wajahat Quadeer and Rajashekar Reddy on novel quantization, compression, and pruning for SD1.5, SDXL, and LLaMA.
During this work, I invented a new mathematical approximation for the inverse square root that outperformed existing methods in both speed and accuracy (NXP, which acquired Kinara AI, now holds the IP for it). Since the inverse square root sits at the heart of LayerNorm and RMSNorm, the optimization had a meaningful impact on ARA2's performance for LLMs and diffusion models.
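The approximation itself is not public, but to show where the inverse square root enters: RMSNorm scales every activation vector by 1/sqrt(mean(x^2) + eps), and that term is evaluated for every token in every layer, so a faster inverse square root pays off directly. A minimal NumPy sketch (my own naming, not any particular library's API):

```python
import numpy as np

def rmsnorm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Minimal RMSNorm: x * (1 / sqrt(mean(x^2) + eps)) * weight.

    The 1/sqrt term below is the inverse square root that a fast
    approximation would replace. Illustrative sketch only.
    """
    inv_rms = 1.0 / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x * inv_rms * weight

x = np.random.randn(4, 8).astype(np.float32)
w = np.ones(8, dtype=np.float32)
print(rmsnorm(x, w).shape)  # (4, 8)
```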
Later, I had the chance to work with Raja Koduri at Oxmiq Labs, a startup he founded. The team was building OxPython, a PyTorch compiler targeting arbitrary hardware, decoupled from the CUDA dependency. I had already built an AI compiler at Kinara, and building another one did not hold the same interest for me. I left Oxmiq to join Dheyo AI as part of the founding team. Dheyo AI was founded by Abhilash Ghanore and Lava Kumar Bokam, who mentored me at Kinara. Here I work in Model Research and Engineering, on problems ranging from determinism in Rectified Flows to world models for frameworks that help robots scale their training in simulated and generated environments.
Outside of work: Running, Cycling, Design.