🐍 Python 🔥 PyTorch 👁️ Computer Vision 🤖 ML/AI 📊 Uncertainty Quantification 🤗 HuggingFace 🐧 Linux/HPC

Hi, I'm Nick 👋

I train AI systems to know when they don't know

Language models make things up, and most of the time they have no idea they're doing it. My research at Ohio State trains models to either resolve their uncertainty or back out entirely, as fast as possible, instead of confidently guessing.

On the market: Research Scientist & ML Engineering roles · PhD defense 2026 · Columbus, OH · Will relocate

Feb 1, 2026: Preparing EMNLP 2026 submission on typed abstention for LLM agents.
May 20, 2025: Joined DCS Corp (AFRL) as Technical Analyst II leading LLM reject-option training and evaluation.
May 15, 2025: Completed LLM reject-option training and evaluation work at DCS Corp / AFRL.

Research highlights

Recent work

A few projects I'm proud of — each with paper, code, and data.

Machine Vision and Applications · 2025

Naturally Constrained Reject Option Classification

Invited journal extension of ISVC 2022 Best Paper. Per-class binomial thresholds scale to ImageNet, remote sensing, and long-tailed splits with stronger selective accuracy.

+1.3% coverage

Paper Code

WMT 2024 · 2024

Assessing the Role of Imagery in Multimodal Machine Translation

Contrastive evaluation shows SOTA multimodal MT models leverage pixels beyond a regularization effect.

+7% image-grounding score

Paper Code Data

Experience

May 2025 — Present

Technical Analyst II — DCS Corp (sponsored by Air Force Research Laboratory)

Dayton, OH

Trained abstention-augmented LLMs to optimize downstream utility, achieving an **8× improvement** over competing approaches in out-of-distribution settings.
Build evaluation harnesses and calibration dashboards that connect LLM policies to existing command-and-control tooling.

Aug 2021 — Present

Graduate Research Associate — Computer Vision Lab

Ohio State University · Columbus, OH

Lead the lab's uncertainty-aware multimodal modeling portfolio under Prof. Jim Davis.
Designed imagery-aware contrastive metrics for **multimodal machine translation** (WMT 2024), showing that state-of-the-art models depend on visual evidence rather than treating images as regularizers.
Developed binomial per-class **reject-option training** for ImageNet, remote sensing, and long-tailed datasets (ISVC 2022 Best Paper; MVA 2025 extension), improving selective accuracy of vision transformers by **+0.4%** and coverage by **+1.3%**.
Integrated these methods into open-source toolkits and analyst-facing evaluation pipelines.

Aug 2023 — Present

Graduate Teaching Associate — Machine Learning & NLP

Ohio State University · Columbus, OH

Support **80+ students** per offering in machine learning, computer vision, and natural language processing courses.
Run recitations, office hours, and targeted study plans, and maintain auto-graded labs (including introductory LLM labs) with an emphasis on calibration, safety, and responsible deployment.

Summers 2022–2024

Graduate Research Intern — Air Force Research Laboratory (U.S. CUI)

Dayton, OH

Summer 2024: Adapted and trained **JEPA and MAE transformers** in a distributed Slurm/Singularity setup for multimodal EO/SAR representation learning, outperforming supervised methods in low-data settings.
Summer 2023: Developed **Reject Option Beam Search** to improve machine translation quality at large beam widths.
Summer 2022: Pioneered an end-to-end training algorithm for Naturally Constrained Reject Option Classification.

Summers 2020–2021

Undergraduate Research Intern — Air Force Research Laboratory (U.S. CUI)

Dayton, OH

Summer 2021: Devised an **ensemble distillation** method to improve model performance on ambiguous instances.
Summer 2020: Constructed a semi-automated system for **temporal satellite imagery collection** (ICCV 2021 workshop), later released as the Construction-Site-Satellite-Imagery dataset.

2020 — 2021

Undergraduate Research Associate — Computer Vision Lab

Ohio State University · Columbus, OH

Engineered semi-automatic labeling workflows for remote sensing change detection, creating Python tooling that bootstrapped datasets for uncertainty-aware modeling.

Summer 2019

Summer Research Intern — Sii Canada / Concordia University

Montreal, QC

Built anomaly detection dashboards that translated large-scale behavioral telemetry into prioritized experiments, highlighting early lessons on uncertainty estimation.

2018 — 2019

Undergraduate Teaching Associate — Discrete Structures & Algorithms

Ohio State University · Columbus, OH

Mentored discrete structures and algorithms cohorts through recitations, office hours, and targeted study plans emphasizing analytical rigor.

Skills

Core Research

Selective prediction · LLM self-assessment · Confidence calibration · Calibration metrics · RL fine-tuning · GRPO · Multimodal learning

Models & Architectures

Vision transformers · VLMs · EO/SAR

Tools & Infrastructure

Python · PyTorch · HuggingFace · vLLM · Unsloth · DeepSpeed · Slurm · Singularity · Linux HPC · Git

Domains

Defense & remote sensing · Analyst-facing tooling · Multimodal MT

Service

CVPR 2022–23 · ICCV 2023 · ECCV 2022 · HackOHI/O

Publications

Machine Vision and Applications · 2025

Naturally Constrained Reject Option Classification

N. Kashani Motlagh, J. Davis, T. Anderson, J. Gwinnup

vision · reject option · calibration · ImageNet · remote sensing

Invited journal extension of ISVC 2022 Best Paper. Per-class binomial thresholds scale to ImageNet, remote sensing, and long-tailed splits with stronger selective accuracy.

Paper ↗ Code ↗

Artifacts: learning-idk

WMT 2024 · 2024

Assessing the Role of Imagery in Multimodal Machine Translation

N. Kashani Motlagh, J. Davis, T. Anderson, J. Gwinnup, G. Erdmann

multimodal MT · vision-language models · evaluation · WMT

Contrastive evaluation shows SOTA multimodal MT models leverage pixels beyond a regularization effect.

Paper ↗ Code ↗ Data ↗

Artifacts: calibration

ISVC 2022 · 2022

Best Paper

Learning When to Say "I Don't Know"

N. Kashani Motlagh, J. Davis, T. Anderson, J. Gwinnup

vision · reject option · selective accuracy · ImageNet

Binomial modeling of per-class reject thresholds that boost selective accuracy while keeping abstentions calibrated. Extended in MVA 2025 journal version.

Paper ↗ arXiv ↗ Code ↗

Artifacts: learning-idk

ICCV 2021 Workshop on LUAI · 2021

A Framework for Semi-automatic Collection of Temporal Satellite Imagery for Analysis of Dynamic Regions

N. Kashani Motlagh, A. Radhakrishnan, J. Davis, R. Ilin

remote sensing · data collection · labeling pipelines · change detection

Semi-automated scraping plus OpenStreetMap cues to assemble temporal satellite datasets that feed downstream change detection.

Paper ↗ Code ↗

Artifacts: Construction-Site-Satellite-Imagery

Browse all publications →

Open Source

calibration

Evaluation harness for multimodal MT, selective LLM routing, and visual-text calibration experiments.

Run imagery-aware contrastive... · Log structured reports...

Supports: Assessing the Role of Imagery in Multimodal Machine Translation

Construction-Site-Satellite-Imagery

Semi-automatic satellite data ingestion plus labeling UI for monitoring changing regions.

Generate OpenStreetMap-guided scrape... · Label construction phases...

Supports: A Framework for Semi-automatic Collection of Temporal Satellite Imagery for Analysis of Dynamic Regions

learning-idk

PyTorch toolkit for per-class reject-option training with binomial threshold search, dashboards, and CLI.

Tune per-class thresholds... · Export coverage/accuracy curves...

Supports: Learning When to Say "I Don't Know"

View all artifacts →

About

I am a PhD candidate at Ohio State, defending in 2026, advised by Jim Davis. My research is about making machine learning systems that know when to stop guessing. The question first came up in 2021, when Jim and I were staring at t-SNE plots of image classifiers. The models were making confused, unreliable predictions in regions where class clusters overlapped — but scattered among the noise were pockets of clean, well-separated examples where the model was consistently right. That pattern stuck with me. I wanted to know whether we could learn which parts of the decision space are actually trustworthy, and build systems that act accordingly.

The simplest way I can explain my work is this: I teach AI to stop making things up. Everyone who has used ChatGPT has seen it confidently produce something false. In a casual conversation that is annoying. In production — medical imaging, defense systems, content moderation — a model that guesses wrong with high confidence is worse than one that gives no answer at all. The core problem is that most ML systems are trained to always produce output, with no mechanism to say “I am not sure about this.” My research gives them that mechanism.

The technical framing is selective prediction and abstention. I started with reject-option classification in vision — learning per-class thresholds that let image classifiers back out of ambiguous regions instead of forcing a guess (ISVC 2022, MVA 2025). That line of work showed the value of calibrated abstention, so I carried the idea into multimodal machine translation for WMT 2024, asking whether MT models actually use visual evidence or just treat images as a regularizer. The natural next step was language models themselves, where the problem gets much harder and much more consequential.
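
As a rough illustration of what selective prediction with per-class thresholds looks like, here is a minimal sketch. It is not the binomial threshold search from the papers: the thresholds are fixed by hand, and the function names (`selective_predict`, `coverage_and_selective_accuracy`) and the toy data are hypothetical. It only shows how a classifier can decline to predict, and how coverage and selective accuracy are computed from the accepted examples.

```python
from typing import List, Optional, Sequence, Tuple


def selective_predict(probs: Sequence[float],
                      thresholds: Sequence[float]) -> Optional[int]:
    """Return the top class index, or None to abstain.

    The top class is accepted only if its probability reaches that
    class's own threshold (hand-picked here, purely for illustration).
    """
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best if probs[best] >= thresholds[best] else None


def coverage_and_selective_accuracy(
    all_probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    thresholds: Sequence[float],
) -> Tuple[float, float]:
    """Coverage = fraction of inputs answered; selective accuracy =
    accuracy measured only on the answered inputs."""
    preds: List[Optional[int]] = [
        selective_predict(p, thresholds) for p in all_probs
    ]
    accepted = [(p, y) for p, y in zip(preds, labels) if p is not None]
    coverage = len(accepted) / len(labels)
    sel_acc = (
        sum(p == y for p, y in accepted) / len(accepted) if accepted else 0.0
    )
    return coverage, sel_acc


# Toy two-class example: two confident predictions, two ambiguous ones.
toy_probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.4, 0.6]]
toy_labels = [0, 1, 1, 0]
toy_thresholds = [0.7, 0.75]  # one threshold per class (assumed values)

coverage, sel_acc = coverage_and_selective_accuracy(
    toy_probs, toy_labels, toy_thresholds
)
print(coverage, sel_acc)  # 0.5 1.0
```

In this toy case the model answers only half of the inputs but is right on all of them, which is the trade-off that reject-option training tries to make favorable.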

In image classification, a “dirty” region of the decision space, one where class clusters overlap and predictions are unreliable, is relatively static: the boundaries overlap and that is that. In language, what counts as dirty depends on context, phrasing, and the specific knowledge required. A question that is unanswerable given one prompt can become straightforward with a small amount of additional reasoning or retrieval. My current research trains LLM agents to make typed abstention decisions: not just a single “I don’t know” reflex, but a diagnosis of why the model is uncertain. The model learns to distinguish between cases where more reasoning would help, cases where external knowledge is needed, and cases that are genuinely unanswerable. I am exploring reinforcement learning approaches, including GRPO, to train these routing policies so that the deferral decision itself is learned rather than hand-coded.
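
To make the idea of typed abstention concrete, the sketch below shows one possible action space and how a group of sampled rollouts could be scored for GRPO-style training. Everything specific here is an assumption for illustration: the `Action` names, the reward values in `reward`, and the toy rollouts are hypothetical and are not taken from the research described above. The only standard ingredient is the group-relative advantage, which normalizes each rollout's reward by the mean and standard deviation of its group.

```python
from enum import Enum
from statistics import mean, pstdev
from typing import List


class Action(Enum):
    """Hypothetical typed decisions an agent could make for a question."""
    ANSWER = "answer"            # commit to an answer now
    REASON_MORE = "reason_more"  # more reasoning would help
    RETRIEVE = "retrieve"        # external knowledge is needed
    ABSTAIN = "abstain"          # genuinely unanswerable


def reward(action: Action, ideal: Action, answer_correct: bool = False) -> float:
    """Illustrative reward table (all values are assumptions).

    Answering is scored on correctness; a deferral is scored on whether
    it names the right reason for being uncertain.
    """
    if action is Action.ANSWER:
        return 1.0 if answer_correct else -1.0
    return 0.5 if action is ideal else -0.25


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against zero spread
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled rollouts for one question that needs external knowledge.
ideal_action = Action.RETRIEVE
rollouts = [
    (Action.ANSWER, False),    # confident guess, wrong
    (Action.RETRIEVE, False),  # correct diagnosis
    (Action.ABSTAIN, False),   # deferred, but for the wrong reason
    (Action.ANSWER, False),    # another wrong guess
]

rewards = [reward(a, ideal_action, ok) for a, ok in rollouts]
advantages = group_relative_advantages(rewards)
print(rewards)     # [-1.0, 0.5, -0.25, -1.0]
print(advantages)  # the RETRIEVE rollout gets the largest advantage
```

Under these assumed rewards, the rollout that correctly asks for retrieval is pushed up relative to its group, while the confident wrong guesses are pushed down, so the deferral type itself becomes something the policy is trained on.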

This work is building toward my dissertation on typed deferral and abstention for uncertainty-aware question answering, with a defense expected in 2026. The through line from vision to translation to LLMs is that the same principle — teach the model where its own decision space is trustworthy, and route everything else — scales across modalities and architectures when you give the abstention mechanism enough structure.

On a given day I write Python and PyTorch, run experiments on HPC clusters, and build the tooling that holds research together. I have recently been spending a lot of time on agentic AI workflows, which turn out to be a natural fit for the routing and abstention problems I already think about. I am looking for research scientist or applied ML engineering roles after graduation. I am a U.S. citizen and comfortable working with CUI and DoD requirements.
