I build machine learning systems that know when to stop guessing. In production — medical imaging, defense, content moderation, agentic LLM stacks — a model that answers confidently and wrong is worse than a model that says “not this one, route somewhere else.” Most ML systems have no mechanism for that. My work adds one.
I am a PhD candidate at Ohio State, graduating August 2026, advised by Jim Davis. I am first author on all five of my publications. The technical through-line is selective prediction and abstention: I started with per-class reject thresholds for image classifiers (ISVC 2022 Best Paper, MVA 2025 journal extension), carried the idea into multimodal machine translation to test whether MT systems actually use visual evidence or just treat images as a regularizer (WMT 2024), and moved into LLM-based QA, where what counts as “uncertain” depends on context, phrasing, and whether retrieval can rescue the question.
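To make the per-class reject idea concrete, here is a minimal sketch in the spirit of that line of work. This is my own illustration, not code from the papers: the function names, the precision-targeting rule, and the `-1` abstain convention are all assumptions for the example.

```python
import numpy as np

def fit_per_class_thresholds(probs, labels, target_precision=0.95):
    """For each class, pick the smallest confidence cutoff whose accepted
    predictions reach the target precision on held-out data.
    (Illustrative tuning rule, not the papers' exact procedure.)"""
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    n_classes = probs.shape[1]
    thresholds = np.ones(n_classes)  # default: reject everything for this class
    for c in range(n_classes):
        mask = preds == c
        if not mask.any():
            continue
        # candidate cutoffs: the observed confidences for this class, ascending
        for t in np.sort(conf[mask]):
            accepted = mask & (conf >= t)
            precision = (labels[accepted] == c).mean()
            if precision >= target_precision:
                thresholds[c] = t
                break
    return thresholds

def predict_with_reject(probs, thresholds):
    """Return the predicted class, or -1 (abstain) when the model's
    confidence falls below that class's own cutoff."""
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    return np.where(conf >= thresholds[preds], preds, -1)
```

The point of the per-class version is that a single global cutoff over-rejects easy classes and under-rejects hard ones; each class gets the cutoff its own error profile demands.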
My EMNLP 2026 submission is about that retrieve-versus-abstain boundary. Across 41k QA instances in a fixed model–retriever–corpus stack, a small class-weighted, question-only controller reaches 0.68 Recoverability AUC with 0.49 Reject Recall. Cheap log-prob baselines top out at 0.55 AUC and never reject. The paper's point is not a new model — it is that recoverability and answer confidence are genuinely different signals, and routing systems should learn them separately.
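The "two separate signals" claim can be sketched as a three-way router. This is a hypothetical illustration, not the submission's code: the threshold names and values are placeholders, and the key structural point is only that the two scores come from independently learned estimators rather than one confidence number doing double duty.

```python
def route(confidence, recoverability, tau_answer=0.7, tau_retrieve=0.5):
    """Three-way routing decision from two separately learned signals.

    confidence:     the model's estimate that it can answer directly.
    recoverability: a separate controller's estimate that retrieval
                    would rescue the question.
    Thresholds are illustrative placeholders, not tuned values.
    """
    if confidence >= tau_answer:
        return "answer"          # model is likely right on its own
    if recoverability >= tau_retrieve:
        return "retrieve"        # uncertain, but retrieval can likely fix it
    return "abstain"             # uncertain and unrecoverable: don't guess
```

A single-signal router collapses the last two branches: low confidence with high recoverability and low confidence with low recoverability look identical to it, which is exactly the distinction the paper argues should be learned.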
Day to day I write Python and PyTorch, run experiments on HPC clusters, build evaluation harnesses, and ship the tooling that holds research together. I have spent a lot of the last year on agentic LLM workflows and retrieval-augmented systems, which turn out to be a natural fit for the routing and abstention problems I already think about.
I am looking for applied ML or ML engineering roles starting fall 2026, after an August defense. Research-adjacent roles are welcome. I am based in Columbus, OH, and open to relocation and remote. I am a U.S. citizen with five summers of AFRL CUI experience, and I am comfortable in DoD environments.