🐍 Python 🔥 PyTorch 👁️ Computer Vision 🤖 ML/AI 📊 Uncertainty Quantification 🤗 HuggingFace 🐧 Linux/HPC

Hi, I'm Nick 👋

I train AI systems to know when they don't know

Language models make things up, and most of the time they have no idea they're doing it. My research at Ohio State trains models to either resolve their uncertainty or back out entirely, as fast as possible, instead of confidently guessing.

On the market: Research Scientist & ML Engineering roles · PhD May 2026 · Columbus, OH · Will relocate

What's new
  1. Preparing EMNLP 2026 submission on structured LLM abstention and failure diagnosis.
  2. Joined DCS Corp (AFRL) as Technical Analyst II leading LLM reject-option training and evaluation.
  3. Delivered LLM reject-option training for AFRL, improving out-of-distribution utility by 8× over competing approaches.
Research highlights

Recent work

A few projects I'm proud of — each with paper, code, and data.

Experience

Research & industry
May 2025 — Present

Technical Analyst II — DCS Corp (sponsored by Air Force Research Laboratory)

Dayton, OH

  • Train and evaluate instruction-tuned LLMs with reject-option heads for analyst workflows, improving out-of-distribution utility by **8×** over competing approaches (a minimal sketch of such a head follows this list).
  • Build evaluation harnesses and calibration dashboards that connect LLM policies to existing command-and-control tooling.
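For the concrete picture, here is a minimal sketch of what a reject-option head on a decoder LLM can look like, assuming a pooled final hidden state. The layer sizes, pooling choice, and 0.5 cutoff are illustrative placeholders, not the deployed system:

```python
import torch
import torch.nn as nn

class RejectOptionHead(nn.Module):
    """Illustrative reject-option head: maps a decoder's pooled final
    hidden state to an abstain probability. Sizes are placeholders."""

    def __init__(self, hidden_size: int = 4096):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(hidden_size, 256),
            nn.GELU(),
            nn.Linear(256, 1),
        )

    def forward(self, last_hidden: torch.Tensor) -> torch.Tensor:
        # last_hidden: (batch, hidden_size) pooled prompt representation
        return torch.sigmoid(self.scorer(last_hidden)).squeeze(-1)

# Usage: answer only when the abstain probability is below a tuned threshold.
head = RejectOptionHead()
p_abstain = head(torch.randn(2, 4096))
answerable = p_abstain < 0.5  # abstentions get routed to a human instead
```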
Aug 2021 — Present

Graduate Research Associate — Computer Vision Lab

Ohio State University · Columbus, OH

  • Lead the lab’s uncertainty-aware multimodal modeling portfolio under Prof. Jim Davis.
  • Designed imagery-aware contrastive metrics for **multimodal machine translation** (WMT 2024), showing that state-of-the-art models depend on visual evidence rather than treating images as regularizers.
  • Developed binomial per-class **reject-option training** for ImageNet, remote sensing, and long-tailed datasets (ISVC 2022 Best Paper; MVA 2025 extension), improving selective accuracy of vision transformers by **+0.4%** and coverage by **+1.3%**. A toy version of the threshold search is sketched after this list.
  • Integrated these methods into open-source toolkits and analyst-facing evaluation pipelines.
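As a rough illustration of the per-class binomial idea (a sketch of the flavor, not the published algorithm): for one class, sweep candidate thresholds and keep the lowest one whose accepted predictions are significantly more accurate than a target rate under a one-sided binomial test. The `target` and `alpha` defaults are illustrative.

```python
import numpy as np
from scipy.stats import binomtest

def per_class_threshold(conf, correct, target=0.95, alpha=0.05):
    """Toy per-class threshold search: accept the top-k most confident
    predictions only while a one-sided binomial test says their accuracy
    significantly exceeds `target`. A sketch, not the paper's procedure."""
    order = np.argsort(-conf)              # most confident first
    conf, correct = conf[order], correct[order]
    best = 1.0                             # default: reject everything
    for k in range(1, len(conf) + 1):
        successes = int(correct[:k].sum())
        if binomtest(successes, k, target, alternative="greater").pvalue < alpha:
            best = conf[k - 1]             # lowest passing threshold so far
    return best

# Usage on synthetic scores for one class:
rng = np.random.default_rng(0)
conf = rng.random(500)
correct = rng.random(500) < 0.5 + 0.5 * conf  # accuracy grows with confidence
print(per_class_threshold(conf, correct))
```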
Aug 2023 — Present

Graduate Teaching Associate — Machine Learning & NLP

Ohio State University · Columbus, OH

  • Support **80+ students** per offering in machine learning, computer vision, and natural language processing courses.
  • Run recitations, office hours, and targeted study plans, and maintain auto-graded labs (including introductory LLM labs) with an emphasis on calibration, safety, and responsible deployment.
Summers 2022–2024

Graduate Research Intern — Air Force Research Laboratory (U.S. CUI)

Dayton, OH

  • Summer 2024: Adapted and trained **JEPA and MAE transformers** in a distributed Slurm/Singularity setup for multimodal EO/SAR representation learning, outperforming supervised baselines in low-data regimes.
  • Summer 2023: Developed **Reject Option Beam Search** to improve machine translation quality at large beam widths.
  • Summer 2022: Pioneered an end-to-end training algorithm for Naturally Constrained Reject Option Classification.
Summers 2020–2021

Undergraduate Research Intern — Air Force Research Laboratory (U.S. CUI)

Dayton, OH

  • Summer 2021: Devised an **ensemble distillation** method to improve model performance on ambiguous instances.
  • Summer 2020: Constructed a semi-automated system for **temporal satellite imagery collection** (ICCV 2021 workshop), later released as the Construction-Site-Satellite-Imagery dataset.
2020 — 2021

Undergraduate Research Associate — Computer Vision Lab

Ohio State University · Columbus, OH

  • Engineered semi-automatic labeling workflows for remote sensing change detection, creating Python tooling that bootstrapped datasets for uncertainty-aware modeling.
Summer 2019

Summer Research Intern — Sii Canada / Concordia University

Montreal, QC

  • Built anomaly detection dashboards that translated large-scale behavioral telemetry into prioritized experiments, highlighting early lessons on uncertainty estimation.
2018 — 2019

Undergraduate Teaching Associate — Discrete Structures & Algorithms

Ohio State University · Columbus, OH

  • Mentored discrete structures and algorithms cohorts through recitations, office hours, and targeted study plans emphasizing analytical rigor.

Skills

Core Research

Selective prediction · LLM self-assessment · Confidence calibration · Calibration metrics · RL fine-tuning · Multimodal learning

Models & Architectures

Vision transformers · VLMs · EO/SAR

Tools & Infrastructure

Python · PyTorch · HuggingFace · vLLM · DeepSpeed · Slurm · Singularity · Linux · HPC · Git

Domains

Defense & remote sensing · Analyst-facing tooling · Multimodal MT

Service

CVPR 2022–23 · ICCV 2023 · ECCV 2022 · HackOHI/O

Publications

All with runnable code

Machine Vision and Applications · 2025

Naturally Constrained Reject Option Classification

N. Kashani Motlagh, J. Davis, T. Anderson, J. Gwinnup

vision · reject option · calibration · ImageNet · remote sensing

Invited journal extension of ISVC 2022 Best Paper. Per-class binomial thresholds scale to ImageNet, remote sensing, and long-tailed splits with stronger selective accuracy.

ISVC 2022 · 2022

Best Paper

Learning When to Say "I Don't Know"

N. Kashani Motlagh, J. Davis, T. Anderson, J. Gwinnup

vision · reject option · selective accuracy · ImageNet

Binomial modeling of per-class reject thresholds that boost selective accuracy while keeping abstentions calibrated. Extended in MVA 2025 journal version.

Open Source

Tools & datasets

Evaluation harness for multimodal MT, selective LLM routing, and visual-text calibration experiments. A toy contrastive probe in the spirit of the harness is sketched below.

Run imagery-aware contrastive... · Benchmark LLM reject-option...
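To make the "depends on visual evidence" claim concrete, here is a toy contrastive probe in the same spirit. `model.score` is a hypothetical log-likelihood API standing in for whatever the harness actually calls; nothing here is the released interface:

```python
import torch

@torch.no_grad()
def visual_reliance_gap(model, src_text, image, reference):
    """Toy contrastive probe: score the reference translation with the true
    image and with a blanked image. A large positive gap suggests the model
    genuinely uses visual evidence rather than treating it as a regularizer.
    `model.score` is a hypothetical API, not the harness's real interface."""
    with_image = model.score(src_text, image, reference)
    without_image = model.score(src_text, torch.zeros_like(image), reference)
    return with_image - without_image
```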

Semi-automatic satellite data ingestion plus labeling UI for monitoring changing regions.

Generate OpenStreetMap-guided scrape... · Label construction phases...

PyTorch toolkit for per-class reject-option training with binomial threshold search, dashboards, and CLI. The coverage/accuracy export is sketched below.

Tune per-class thresholds... · Export coverage/accuracy curves...
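As a sketch of what a coverage/accuracy export computes (illustrative names, not the toolkit's actual CLI): sort by confidence and sweep a global acceptance threshold.

```python
import numpy as np

def coverage_accuracy_curve(conf, correct):
    """Illustrative coverage/accuracy sweep: accept the top-k most confident
    predictions for every k and record (coverage, selective accuracy) pairs.
    A sketch of the idea, not the toolkit's real export code."""
    order = np.argsort(-conf)                    # most confident first
    hits = correct[order].astype(float)
    n = len(hits)
    coverage = np.arange(1, n + 1) / n           # fraction of inputs accepted
    selective_acc = np.cumsum(hits) / np.arange(1, n + 1)
    return coverage, selective_acc

# Usage: dump as CSV or plot as a risk-coverage curve for the dashboard.
rng = np.random.default_rng(1)
cov, acc = coverage_accuracy_curve(rng.random(1000), rng.random(1000) < 0.8)
```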

About

I am a PhD candidate at Ohio State, finishing in May 2026, advised by Jim Davis. My research is about making machine learning systems that know when to stop guessing. The question first came up in 2021, when Jim and I were staring at t-SNE plots of image classifiers. The models were making confused, unreliable predictions in regions where class clusters overlapped — but scattered among the noise were pockets of clean, well-separated examples where the model was consistently right. That pattern stuck with me. I wanted to know whether we could learn which parts of the decision space are actually trustworthy, and build systems that act accordingly.

The simplest way I can explain my work is this: I teach AI to stop making things up. Everyone who has used ChatGPT has seen it confidently produce something false. In a casual conversation that is annoying. In production — medical imaging, defense systems, content moderation — a model that guesses wrong with high confidence is worse than one that gives no answer at all. The core problem is that most ML systems are trained to always produce output, with no mechanism to say “I am not sure about this.” My research gives them that mechanism.

The technical framing is selective prediction and abstention. I study how models can recognize when they have landed in a dirty region of the decision space — where the data is ambiguous, overlapping, or out of distribution — and either find the action that gets them to a clean state or back out and abstain as fast as possible. When a model abstains, the question gets routed to a human or a more capable system. A model that says “I don’t know” and defers is more reliable than one that forces an answer it cannot support.
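In its simplest form, that mechanism is just a confidence gate. A minimal sketch, assuming a softmax classifier and a single global threshold (real systems use learned or per-class thresholds instead):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def predict_or_abstain(model, x, threshold=0.9):
    """Minimal selective prediction: answer only when softmax confidence
    clears `threshold`; -1 marks abstentions to route elsewhere.
    The model and threshold here are illustrative stand-ins."""
    probs = F.softmax(model(x), dim=-1)
    conf, pred = probs.max(dim=-1)
    return torch.where(conf >= threshold, pred, torch.full_like(pred, -1))

# Usage with a stand-in linear classifier:
clf = torch.nn.Linear(16, 10)
print(predict_or_abstain(clf, torch.randn(4, 16)))  # -1 == "I don't know"
```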

The problem becomes more interesting in large language models. In image classification, a dirty region is relatively static — class boundaries overlap and that is that. In language, what counts as dirty depends on context, phrasing, and the specific knowledge required. A question that is unanswerable given one prompt can become straightforward with a small amount of additional reasoning or retrieval. My current work trains models to distinguish between these cases: questions they could resolve with more computation, questions that need external context, and questions that are genuinely beyond reach. The goal is structured uncertainty — not a single “I don’t know” reflex, but a diagnosis of why the model is uncertain and a routing decision for what should happen next.
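A sketch of what that routing decision could look like in code, using the diagnosis categories from the paragraph above; the `diagnose` classifier is the hypothetical learned component:

```python
from enum import Enum

class UncertaintySource(Enum):
    RESOLVABLE = "needs more computation"    # solvable with extended reasoning
    MISSING_CONTEXT = "needs retrieval"      # solvable with external context
    UNANSWERABLE = "beyond reach"            # genuinely unanswerable

def route(question, diagnose):
    """Structured abstention as routing: diagnose *why* the model is
    uncertain, then decide what happens next. `diagnose` is a hypothetical
    learned classifier returning an UncertaintySource."""
    source = diagnose(question)
    if source is UncertaintySource.RESOLVABLE:
        return "spend more inference compute, then answer"
    if source is UncertaintySource.MISSING_CONTEXT:
        return "call retrieval, then retry"
    return "abstain and defer to a human"
```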

On any given day I write Python and PyTorch, run experiments on HPC clusters, and build the tooling that holds research together. I have recently been spending a lot of time on agentic AI workflows, which turn out to be a natural fit for the routing and abstention problems I already think about. I am looking for research scientist or applied ML engineering roles after graduation. I am a U.S. citizen and comfortable working with CUI and DoD requirements.