I design and build end-to-end Machine Learning systems, combining distributed model training with high-throughput, low-latency semantic retrieval pipelines.
I bridge the gap between academic Machine Learning research and high-performance product deployments.
My core expertise lies in designing scalable LLM pipelines, distributed model training configurations, and low-latency retrieval infrastructure. Whether optimizing model serving configurations to reduce per-token latency or designing highly scalable contrastive embedding adapters, I focus on building reliable systems with concrete metric targets.
Practical metric baselines achieved across real-world model training and low-latency serving pipelines.
CineSeek combines LLM-based query expansion and rewriting, FAISS-based high-performance ANN retrieval, and an agentic cross-encoder reranker. It was built specifically to solve complex, long-tail queries without sacrificing production latency bounds.
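As a rough illustration of the three stages named above (expand, retrieve, rerank), here is a minimal Python sketch. It is not CineSeek's code: the corpus, the `EXPANSIONS` table, and every function name are hypothetical. A synonym lookup stands in for LLM-based query expansion, brute-force cosine similarity over bag-of-words vectors stands in for FAISS ANN search, and a token-overlap score stands in for the cross-encoder reranker.

```python
import math
from collections import Counter

# Hypothetical toy corpus (illustrative only, not CineSeek data).
CORPUS = [
    "A heist crew plants an idea inside a dream",
    "A robot learns to love on an abandoned Earth",
    "A detective hunts a killer in a rainy city",
    "Astronauts travel through a wormhole to save humanity",
]

# Hypothetical synonym table standing in for LLM-based query expansion.
EXPANSIONS = {"space": ["astronauts", "wormhole"], "robbery": ["heist"]}


def tokenize(text):
    return text.lower().split()


def expand_query(query):
    """Stage 1: append related terms to the query (stand-in for an LLM rewrite)."""
    tokens = tokenize(query)
    extra = [word for tok in tokens for word in EXPANSIONS.get(tok, [])]
    return " ".join(tokens + extra)


def embed(text):
    """Bag-of-words vector (stand-in for a learned dense embedding)."""
    return Counter(tokenize(text))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, k):
    """Stage 2: exhaustive nearest-neighbor search (stand-in for an ANN index)."""
    q = embed(query)
    ranked = sorted(
        range(len(CORPUS)),
        key=lambda i: cosine(q, embed(CORPUS[i])),
        reverse=True,
    )
    return ranked[:k]


def rerank(query, candidates):
    """Stage 3: rescore each (query, document) pair jointly.

    Jaccard token overlap is a stand-in for a cross-encoder score.
    """
    q = set(tokenize(query))

    def score(i):
        d = set(tokenize(CORPUS[i]))
        return len(q & d) / len(q | d)

    return sorted(candidates, key=score, reverse=True)


def search(query, k=3, top_n=1):
    expanded = expand_query(query)
    candidates = retrieve(expanded, k)
    return [CORPUS[i] for i in rerank(expanded, candidates)[:top_n]]
```

The shape is the usual one for this kind of pipeline: the cheap retrieval stage narrows the corpus to `k` candidates, so the more expensive pairwise reranker only has to score a handful of documents per query.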
First-author papers in TACL, NAACL, AAAI, and Expert Systems with Applications, bridging representation learning with information retrieval.
Explore the full dataset & citations on Google Scholar