Pavani Majety Link to heading
Hi! I work as a Senior Deep Learning Engineer for Inference Frameworks and Compilers at NVIDIA. I primarily contribute to the inference framework vLLM and the newly released compiler MLIR-TensorRT. I previously worked at MathWorks as a Compiler Engineer on the Embedded Coder team, and did a short stint at Trane as a Software Engineer a long time ago.
I work on libraries that ensure LLMs like GPTs, Llamas, and Mixtrals perform at their best when you run them on NVIDIA GPUs from any inference framework. I use techniques like mixed-precision post-training quantization, integrate highly optimized attention implementations like FlashAttention and FlashInfer, and write some CUDA code when required! A major part of ensuring great performance is measuring it: I use tools like lm_eval, MLPerf, Nsight Systems, and NVBench to measure various aspects of a given model's performance. For less mature spaces like DL compilers, I also develop and use in-house tools for micro-benchmarking.
I also enjoy developing compiler-based solutions for compute- and memory-bound problems in deep learning inference, alleviating the pain of hand-optimizing GPU kernels and writing application-specific code on the framework side. With established compiler frameworks like MLIR, it has never been easier to tap into performance acceleration while giving equal importance to scalability and portability.
I also like experimenting with Natural Language Processing and Graph Machine Learning on problems ranging from silly to hardcore engineering.
I love Taekwondo, and it has become an integral part of who I am and how I present myself. I strive to abide by the tenets of Taekwondo — Courtesy (Ye Ui), Integrity (Yom Chi), Perseverance (In Nae), Self-Control (Guk Gi), and Indomitable Spirit (Baekjul Bulgul) — in all endeavors of my life, inside and outside the dojang.
Check out my GitHub contributions and projects!
Last Updated: Nov 24, 2024