My research is on selective prediction, calibration, and abstention: decision rules that determine when a model should return a prediction, request more evidence, or withhold an answer. I am interested in settings where average accuracy is not enough because the system also needs a calibrated policy for coverage, routing, and failure modes.
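The core decision rule can be sketched in a few lines. This is an illustrative toy, not the method from my papers: a single global confidence threshold decides answer-versus-abstain, and coverage and selective accuracy are measured on the answered subset.

```python
# Toy sketch of selective prediction: answer only when confidence clears a
# threshold, otherwise abstain. Data and threshold values are illustrative.

def selective_predict(confidence: float, threshold: float) -> str:
    """Return 'answer' if confidence clears the threshold, else 'abstain'."""
    return "answer" if confidence >= threshold else "abstain"

def coverage_and_accuracy(preds, labels, confs, threshold):
    """Coverage = fraction of inputs answered;
    selective accuracy = accuracy restricted to the answered subset."""
    answered = [(p, y) for p, y, c in zip(preds, labels, confs) if c >= threshold]
    coverage = len(answered) / len(preds)
    accuracy = sum(p == y for p, y in answered) / len(answered) if answered else 0.0
    return coverage, accuracy
```

Sweeping the threshold traces out the coverage/selective-accuracy curve, which is the object a calibrated abstention policy has to manage.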
I am a PhD candidate at Ohio State, graduating in August 2026, advised by Jim Davis. I am first author on all of my public research papers. The technical thread starts with per-class reject thresholds for image classifiers (ISVC 2022 Best Paper; MVA 2025 journal extension), extends to contrastive evaluation for multimodal machine translation (WMT 2024), and now focuses on adaptive LLM question answering, where uncertainty depends on both the question and the capabilities of the model–retriever–corpus stack.
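The per-class reject idea can be illustrated with a minimal sketch, assuming a softmax-style score per class; the class names and cutoffs below are hypothetical, not values from the ISVC/MVA work. The point is that each class gets its own reject cutoff, so abstention can be tightened exactly where the classifier is weak.

```python
# Hypothetical per-class reject sketch: one confidence cutoff per class
# instead of a single global threshold. All values are illustrative.

def predict_or_reject(probs: dict, thresholds: dict):
    """probs: class -> score; thresholds: class -> per-class reject cutoff.
    Returns the argmax class, or None to signal rejection."""
    cls = max(probs, key=probs.get)
    return cls if probs[cls] >= thresholds[cls] else None
```

A global threshold is the special case where every class shares one cutoff; per-class thresholds let a weak class reject at 0.9 while a reliable class answers at 0.5.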
My current submission studies the retrieve-versus-abstain boundary after direct answering has already been ruled out. The result suggests that answer confidence and stack-relative recoverability should be estimated as separate routing signals.
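The two-signal routing view can be sketched as follows. This is a hedged toy, not the submitted method: `answer_conf` and `recoverability` stand in for the two separately estimated signals, and the threshold values are arbitrary assumptions.

```python
# Toy routing sketch: answer confidence and stack-relative recoverability
# are estimated separately and drive different actions. Thresholds are
# illustrative assumptions, not the submitted method's values.

def route(answer_conf: float, recoverability: float,
          conf_thresh: float = 0.7, recov_thresh: float = 0.5) -> str:
    if answer_conf >= conf_thresh:
        return "answer"    # confident enough to answer directly
    if recoverability >= recov_thresh:
        return "retrieve"  # the model-retriever-corpus stack can likely recover it
    return "abstain"       # neither answering nor retrieval is promising
```

Collapsing the two signals into one score would conflate "I don't know" with "retrieval won't help", which is exactly the boundary the submission studies.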
Implementation work includes Python/PyTorch training code, evaluation harnesses, calibration dashboards, and distributed experiment runs on Slurm/Singularity clusters. Recent work has focused on retrieval-augmented QA, abstention-augmented LLM policies, and evaluation pipelines that expose coverage, recoverability, and out-of-distribution behavior.
I am looking for Applied Scientist and ML Engineer roles starting August 2026, after my defense. I am strongest on teams working on LLM evaluation, calibration, retrieval-augmented systems, selective prediction, or safety/reliability infrastructure. Based in Columbus, OH; open to relocation and remote work. U.S. citizen with five summers of cleared experience at AFRL; cleared and federal roles are welcome.