I study practical machine learning: making models behave, scale, and hold up in the wild.
ReCollect: Learned Garbage Collection for Python
A reinforcement learning approach that replaces static GC heuristics in Python runtimes with learned control policies, achieving up to 200% throughput improvement on amenable workloads.
Optimizations for Matching Algorithms in Generative Graph Models
Improving the scalability of Graph VAEs by replacing brute-force matching with binning strategies based on node degree and community structure, achieving a 20x speedup.
Mitigating Reward Hacking With Vision-Language Models as Rewards
Detecting reward hacking via Jensen-Shannon divergence and applying VLM-based regularization to steer RL policies toward human-aligned objectives on MuJoCo robotic control tasks.
Differentiable Permutation Self-Consistency
Mitigating positional bias in LLM-based re-rankers via a differentiable relaxation of permutation self-consistency, enabling training-time optimization for information retrieval.
ToT-UI & OneToT: Practical Tools for Tree-of-Thought Reasoning
A web interface for visualizing and debugging Tree-of-Thought prompting workflows, alongside a single-pass variant of ToT that reduces token usage by up to 79% without significant accuracy loss.