Applied Scientist / ML Engineer · available August 2026

Nick Kashani Motlagh

I work on selective prediction, calibration, and abstention policies for ML systems.

PhD candidate at Ohio State’s Computer Vision Lab, graduating August 2026. My research studies when fixed model pipelines should predict, defer to retrieval, or abstain, with work spanning per-class reject thresholds, multimodal MT evaluation, and lightweight controllers for adaptive QA.

Now: graduating August 2026 · applied scientist and ML engineer roles · LLM eval, abstention, safety.

Notebook

Latest writing

  1. Why LLMs Need Reject Options

    Selective prediction gives LLM systems a way to trade coverage for reliability instead of treating confidence as one scalar.

Built

Projects and open source.

Training code, evaluation harnesses, calibration utilities, and datasets tied to the publications.

Repo

learning-idk

Companion code for ISVC 2022 / MVA 2025: per-class reject-option classification with binomial threshold search.

  • Python
  • PyTorch
  • selective prediction
  • Tune per-class thresholds on ImageNet, iNaturalist, or custom datasets.
  • Export coverage/accuracy curves and selective-accuracy plots.
  • Apply learned reject thresholds to existing PyTorch classifier outputs.
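As a rough illustration of the last bullet, applying per-class reject thresholds to classifier outputs can be sketched as below. This is not the learning-idk API: the function names, the `reject_label` convention, and the threshold values are hypothetical, the binomial threshold search itself is not shown, and plain Python stands in for PyTorch tensors to keep the sketch dependency-free.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_reject(logits, thresholds, reject_label=-1):
    # Predict the argmax class, but reject when its confidence falls
    # below the threshold learned for that specific class.
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    return pred if probs[pred] >= thresholds[pred] else reject_label

# Hypothetical per-class thresholds (one entry per class).
thresholds = [0.60, 0.90, 0.75]
print(predict_with_reject([2.0, 0.5, 0.1], thresholds))  # confident class 0 is accepted
print(predict_with_reject([0.2, 1.0, 0.9], thresholds))  # weak class 1 is rejected
```

A global threshold is the special case where every entry of `thresholds` is the same value.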

Repo

calibration

PyTorch calibration utilities for histogram binning, global temperature scaling, and class-wise temperature scaling.

  • Python
  • PyTorch
  • calibration
  • Run global and class-wise temperature scaling on classifier logits.
  • Produce reliability diagrams and ECE / class-wise ECE reports.
  • Supply the calibration utilities used by the imagery-aware contrastive MT evaluation code.
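For readers unfamiliar with the terms, here is a minimal sketch of global temperature scaling and ECE. It is not the calibration repo's code: the function names are hypothetical, a grid search stands in for the usual gradient-based fit of the temperature, and ECE is computed with equal-width confidence bins.

```python
import math

def log_softmax(logits, temperature=1.0):
    # Log-probabilities after dividing the logits by a single temperature.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_z for s in scaled]

def mean_nll(logits_batch, labels, temperature):
    total = sum(-log_softmax(l, temperature)[y] for l, y in zip(logits_batch, labels))
    return total / len(labels)

def fit_temperature(logits_batch, labels, grid=None):
    # Assumption: pick the temperature on a fixed grid that minimizes
    # validation negative log-likelihood.
    if grid is None:
        grid = [0.25 + 0.05 * i for i in range(96)]  # 0.25 to 5.0
    return min(grid, key=lambda t: mean_nll(logits_batch, labels, t))

def expected_calibration_error(confidences, correct, n_bins=10):
    # Weighted average gap between mean confidence and accuracy per bin.
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        bins[min(int(c * n_bins), n_bins - 1)].append((c, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            acc = sum(ok for _, ok in b) / len(b)
            ece += len(b) / n * abs(avg_conf - acc)
    return ece

# Toy validation set: the model is very confident but right only 3 times out of 4,
# so the fitted temperature should come out above 1 (softening the probabilities).
val_logits = [[4.0, 0.0]] * 4
val_labels = [0, 0, 0, 1]
print(fit_temperature(val_logits, val_labels))
```

Class-wise temperature scaling would fit one temperature per class instead of a single shared value.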

Repo

construction-site-satellite-imagery-collection

Companion code for OpenStreetMap-guided temporal satellite imagery collection and annotation.

  • Python
  • OpenStreetMap
  • remote sensing
  • Generate OpenStreetMap-guided download manifests for temporal imagery.
  • Label construction phases with the bundled annotation app.
  • Export train/val/test splits for change-detection baselines.
Selected work

Publications.

Each paper links out to the PDF, code, or data where available.

ISVC 2022 / 2022

Learning When to Say "I Don't Know"

N. Kashani Motlagh, J. Davis, T. Anderson, J. Gwinnup

Per-class reject thresholds estimated from validation statistics, improving selective accuracy and coverage over global thresholding.

  • Best Paper at ISVC 2022; later extended in the MVA 2025 journal version.
  • Reported +0.4% selective-accuracy and +1.3% coverage gains on ImageNet over global thresholding.
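The two metrics named above follow their standard definitions: coverage is the fraction of inputs the model accepts, and selective accuracy is accuracy measured only on accepted inputs. A minimal sketch, with a hypothetical function name and a hypothetical `-1` reject marker:

```python
def coverage_and_selective_accuracy(predictions, labels, reject_label=-1):
    # Keep only the examples the model chose to answer.
    accepted = [(p, y) for p, y in zip(predictions, labels) if p != reject_label]
    coverage = len(accepted) / len(predictions)
    if not accepted:
        return coverage, None  # selective accuracy is undefined at zero coverage
    selective_accuracy = sum(p == y for p, y in accepted) / len(accepted)
    return coverage, selective_accuracy

# Five inputs, two rejected; two of the three accepted predictions are correct.
preds = [0, -1, 2, 1, -1]
labels = [0, 1, 2, 0, 2]
print(coverage_and_selective_accuracy(preds, labels))
```

Raising a reject threshold typically lowers coverage and raises selective accuracy, which is the tradeoff the per-class thresholds are tuned against.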

Why it matters

Best Paper

  • vision
  • reject option
  • selective accuracy
  • ImageNet

Machine Vision and Applications / 2025

Naturally Constrained Reject Option Classification

N. Kashani Motlagh, J. Davis, T. Anderson, J. Gwinnup

Journal extension of the ISVC 2022 Best Paper, evaluating per-class binomial reject thresholds on ImageNet and remote-sensing datasets.

  • Invited journal extension of the ISVC Best Paper with additional datasets, analysis, and calibration experiments.
  • Per-class binomial thresholds outperform global thresholding on ImageNet and remote-sensing splits.

Why it matters

+1.3% coverage

  • vision
  • reject option
  • calibration
  • ImageNet
  • remote sensing

WMT 2024 / 2024

Assessing the Role of Imagery in Multimodal Machine Translation

N. Kashani Motlagh, J. Davis, T. Anderson, J. Gwinnup, G. Erdmann

Contrastive evaluation of WMT 2024 multimodal MT systems shows measurable dependence on paired visual context.

  • Introduced imagery-aware contrastive probes for testing whether translations change under mismatched visual context.
  • Benchmarked nine multimodal MT systems across evaluation splits and found high variance in image sensitivity.
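The idea behind a contrastive probe can be sketched as follows: translate each source sentence once with its paired image and once with a mismatched image, then count how often the output changes. This is an illustration only, not the paper's metric: `image_sensitivity`, `toy_translate`, and the file names are hypothetical, and exact string comparison is the simplest possible notion of "changed".

```python
def image_sensitivity(translate, examples, mismatched_images):
    # Fraction of examples whose translation changes when the paired
    # image is swapped for a mismatched one.
    changed = 0
    for (source, image), wrong_image in zip(examples, mismatched_images):
        if translate(source, image) != translate(source, wrong_image):
            changed += 1
    return changed / len(examples)

def toy_translate(source, image):
    # Stand-in for a real multimodal MT system (English to French):
    # only the ambiguous word "bat" consults the image.
    if source == "bat":
        return "chauve-souris" if image == "animal.jpg" else "batte"
    return source.upper()

examples = [("bat", "animal.jpg"), ("hello", "street.jpg")]
mismatched = ["stadium.jpg", "animal.jpg"]
print(image_sensitivity(toy_translate, examples, mismatched))  # 0.5
```

A system that ignores its image input entirely would score 0.0 under this probe.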

Why it matters

+7% image-grounding score

  • multimodal MT
  • vision-language models
  • evaluation
  • WMT

ICCV 2021 Workshop on LUAI / 2021

A Framework for Semi-automatic Collection of Temporal Satellite Imagery for Analysis of Dynamic Regions

N. Kashani Motlagh, A. Radhakrishnan, J. Davis, R. Ilin

OpenStreetMap-guided imagery collection and labeling tools for building temporal satellite datasets for dynamic-region analysis.

  • Combined imagery download, polygon filtering, temporal organization, and annotation UI in a reproducible Python pipeline.
  • Reduced manual setup for construction-site monitoring datasets and downstream change-detection experiments.

  • remote sensing
  • data collection
  • labeling pipelines
  • change detection
Experience

Roles.

Research, teaching, internships, and applied evaluation work.

Role

Technical Analyst II — DCS Corp (sponsored by Air Force Research Laboratory)

Dayton, OH / May 2025 — Present

  • Train and evaluate abstention-augmented LLM policies optimized for downstream utility, with an 8× out-of-distribution (OOD) utility improvement in current experiments relative to competing routing baselines.
  • Build evaluation harnesses and calibration dashboards for comparing LLM policy variants across coverage, utility, and out-of-distribution behavior.

Role

Graduate Research Associate — Computer Vision Lab

Ohio State University · Columbus, OH / Aug 2021 — Present

  • Conduct research on uncertainty-aware vision, multimodal, and language systems under Prof. Jim Davis.
  • Designed imagery-aware contrastive metrics for multimodal machine translation (WMT 2024), measuring whether translations depend on paired visual context.

Role

Graduate Teaching Associate — Machine Learning & NLP

Ohio State University · Columbus, OH / Aug 2023 — Present

  • Support 80+ students per offering in machine learning, computer vision, and natural language processing courses.
  • Run recitations and office hours, grade technical assignments, and maintain auto-graded labs covering ML fundamentals, computer vision, NLP, and introductory LLM workflows.

Role

Graduate Research Intern — Air Force Research Laboratory (U.S. CUI)

Dayton, OH / Summers 2022–2024

  • Summer 2024: Adapted and trained JEPA and MAE transformers in a distributed Slurm/Singularity setup for multimodal EO/SAR representation learning, improving low-data performance relative to supervised baselines.
  • Summer 2023: Developed Reject Option Beam Search to improve machine translation quality at large beam widths.
About

How the research has developed.

From selective prediction in vision to adaptive question answering with LLMs, with stops along the way in multimodal translation and remote sensing.

My research is on selective prediction, calibration, and abstention: decision rules that determine when a model should return a prediction, request more evidence, or withhold an answer. I am interested in settings where average accuracy is not enough because the system also needs a calibrated policy for coverage, routing, and failure modes.

I am a PhD candidate at Ohio State, graduating August 2026, advised by Jim Davis. I am first author on all four of my public research papers. The technical thread starts with per-class reject thresholds for image classifiers (ISVC 2022 Best Paper, MVA 2025 journal extension), extends to contrastive evaluation for multimodal machine translation (WMT 2024), and now focuses on adaptive LLM question answering, where uncertainty depends on both the question and the capabilities of the model–retriever–corpus stack.

My current submission studies the retrieve-versus-abstain boundary after direct answering has already been ruled out. The result suggests that answer confidence and stack-relative recoverability should be estimated as separate routing signals.
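To make the two-signal idea concrete, here is a minimal sketch of a three-way router that treats answer confidence and recoverability as separate inputs. It is not the method in the submission: the function name, the two scores, and both threshold values are hypothetical placeholders.

```python
def route(answer_confidence, recoverability,
          answer_threshold=0.8, retrieve_threshold=0.5):
    # Signal 1: is the model confident enough to answer directly?
    if answer_confidence >= answer_threshold:
        return "answer"
    # Signal 2 (only consulted once direct answering is ruled out):
    # can this model, retriever, and corpus recover the answer?
    if recoverability >= retrieve_threshold:
        return "retrieve"
    return "abstain"

print(route(0.9, 0.1))  # answer
print(route(0.3, 0.7))  # retrieve
print(route(0.3, 0.2))  # abstain
```

Keeping the signals separate means a low-confidence question is not automatically sent to retrieval; it is retrieved only when the stack is likely to recover the answer.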

Implementation work includes Python/PyTorch training code, evaluation harnesses, calibration dashboards, and distributed experiment runs on Slurm/Singularity clusters. Recent work has focused on retrieval-augmented QA, abstention-augmented LLM policies, and evaluation pipelines that expose coverage, recoverability, and out-of-distribution behavior.

I am looking for Applied Scientist and ML Engineer roles starting August 2026, after my defense. I am strongest on teams working on LLM evaluation, calibration, retrieval-augmented systems, selective prediction, or safety/reliability infrastructure. Based in Columbus, OH, and open to relocation or remote work. U.S. citizen with five summers of cleared AFRL experience; cleared and federal roles are welcome.

  1. 2021–24

    Selective prediction for vision

    Class-conditional reject thresholds for image classifiers, estimated from validation statistics and evaluated with coverage/selective-accuracy tradeoffs.

    ISVC 2022 Best Paper; MVA 2025 journal extension.

  2. 2024

    Multimodal machine translation

    Contrastive evaluation for measuring whether multimodal MT systems use paired image evidence rather than benefiting only from image-conditioned training.

    WMT 2024.

  3. 2025–26

    Adaptive QA routing

    Three-way routing for QA pipelines: direct answer, retrieve-then-answer, or abstain, with recoverability estimated separately from answer confidence.

    Current submission under review.

Work with me

Available August 2026 for Applied Scientist and ML Engineer roles.

Best fit: teams working on LLM evaluation, calibration, selective prediction, retrieval-augmented QA, or reliability infrastructure. I am comfortable with experiment design, PyTorch training code, distributed cluster runs, evaluation harnesses, and metrics/reporting layers. U.S. citizen with five summers of cleared AFRL experience; cleared and federal roles welcome.

At a glance

  • PhD, The Ohio State University — graduating August 2026
  • Applied Scientist / ML Engineer · selective prediction, calibration, LLM evaluation
  • First author on 4 public papers · ISVC 2022 Best Paper
  • 8× OOD utility gain in abstention-augmented LLM experiments (DCS Corp / AFRL)
  • Python · PyTorch · HuggingFace · Slurm/Singularity · RAG evaluation
  • U.S. citizen · five summers of cleared work · Columbus, OH, open to relocation / remote

Recent roles

  1. Technical Analyst II — DCS Corp (sponsored by Air Force Research Laboratory)

    Dayton, OH / May 2025 — Present
  2. Graduate Research Associate — Computer Vision Lab

    Ohio State University · Columbus, OH / Aug 2021 — Present
  3. Graduate Teaching Associate — Machine Learning & NLP

    Ohio State University · Columbus, OH / Aug 2023 — Present
  4. Graduate Research Intern — Air Force Research Laboratory (U.S. CUI)

    Dayton, OH / Summers 2022–2024