Experience

Saksham Rathi | CSE, IIT Bombay

Internships

Citadel - Software Engineer Internship

Summer 2025, Commodities, London
- Built a robust listener and parser for trade messages, boosting accuracy via async LLM integration
- Developed a real-time health monitor that validates curve markers, tracks price-source heartbeats and uptime, ensures snapshots, and detects cushion/crossed states, sending Slack alerts across all markets
- Conducted regression on prelim vs close prices to detect curve/source discrepancies and improve reliability
- Extended the trade upload tool with trade-date and risk-date support for accurate tracking and downstream use

Tower Research Capital - Quantitative Developer Internship

Winter 2025, North Moore Trading Team, Gurugram
- Analyzed trade data feeds received from a crypto exchange across multiple IP addresses, identifying and prioritizing better-performing connections to the firm's quote server
- Improved connection reliability by evaluating and comparing latency and stability across exchange IPs, routing traffic through the most consistent endpoints for real-time market data
- Integrated hardware timestamping into OpenSSL connections to achieve higher timing precision for incoming market data packets

BharatGen - Applied Scientist Internship

Summer 2026, Mumbai

Amazon - Applied Scientist Internship

Summer 2024, Bangalore
- Worked on Olympus, Amazon's large language model, and improved its instruction-following ability
- Implemented classifier-free guidance to sharpen the model's focus on key parts of user queries and system prompts, using a hyperparameter to balance the conditional and unconditional probabilities
- Evaluated the performance of Olympus and several open-source models on a range of single-turn and multi-turn datasets

Research Experience

Bachelor's Thesis Project - Requests of a Feather Must Flock Together: Batch Size vs. Prefix Homogeneity in LLM Inference

2025-26, Prof. Mythili Vutukuru

[arXiv] | [Report] | [Presentation]
- Designed Feather, a prefix-aware LLM inference scheduler that uses reinforcement learning to balance batch size and prefix homogeneity; integrated into both vLLM and SGLang, it achieves 2-10x higher throughput than these state-of-the-art baselines
- Built the Chunked Hash Tree (CHT), a lightweight data structure replacing expensive radix-tree traversals for prefix detection, reducing CPU scheduling overhead by up to 1000x while keeping it under 1% of GPU execution time
- Showed through experiments that smaller, prefix-homogeneous batches outperform larger heterogeneous ones, and that even a single divergent request can cut decode throughput by ~2x, motivating the core design of Feather

Research and Development Project - Compressive Lognormal Regression

2024-26, Prof. Ajit Rajwade
- Enhanced viral load estimation in pooled RT-PCR via Bayesian inference and compressed sensing
- Developed a superior Orthogonal Matching Pursuit (OMP) variant using combinatorial group testing
- Optimized sparse recovery using block gradient descent, LASSO, NNLS and proximal algorithms