Evaluation harness for multimodal MT, selective LLM routing, and visual-text calibration experiments.
- Run imagery-aware contrastive probes against WMT-style checkpoints.
- Benchmark LLM reject-option heads on held-out OOD prompts.
- Log structured reports (HTML/Markdown) for rapid model comparisons.
- Used to benchmark LLM reject-option heads and multimodal MT models for AFRL and academic deployments.
Supports:
Assessing the Role of Imagery in Multimodal Machine Translation, Selective LLM Training with Reject Options
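As a rough illustration of what an imagery-aware contrastive probe can measure, the sketch below compares a model's score for each translation when paired with the correct image against its score when paired with a mismatched image. The function name `image_awareness` and the score lists are hypothetical; this is a minimal sketch of the general idea, not the harness's actual API or the exact protocol of the papers listed above.

```python
def image_awareness(congruent_scores, incongruent_scores):
    """Summarize how much a model's scores depend on the paired image.

    Both arguments are per-example model scores (for instance, log-probabilities
    of the reference translation). Higher means the model prefers the output.
    Returns (win_rate, mean_delta), where win_rate is the fraction of examples
    scored strictly higher with the congruent image.
    """
    if len(congruent_scores) != len(incongruent_scores):
        raise ValueError("score lists must have the same length")
    if not congruent_scores:
        return 0.0, 0.0
    # Positive delta: the correct image helped on this example.
    deltas = [c - i for c, i in zip(congruent_scores, incongruent_scores)]
    win_rate = sum(1 for d in deltas if d > 0) / len(deltas)
    mean_delta = sum(deltas) / len(deltas)
    return win_rate, mean_delta


# Toy scores, made up for illustration only.
win_rate, mean_delta = image_awareness(
    [-1.0, -2.0, -0.5, -3.0],
    [-1.5, -2.0, -1.0, -2.5],
)
```

A win rate near 0.5 with a mean delta near zero would suggest the model largely ignores the image; ties count as non-wins here, which is a design choice rather than a requirement.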
Semi-automatic satellite data ingestion plus labeling UI for monitoring changing regions.
- Generate OpenStreetMap-guided scrape manifests for temporal imagery.
- Label construction phases with the included lightweight annotation app.
- Export train/val/test splits for change-detection baselines.
- Demonstrates end-to-end dataset design, labeling tooling, and export pipelines for remote sensing change detection.
Supports:
A Framework for Semi-automatic Collection of Temporal Satellite Imagery for Analysis of Dynamic Regions
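For change-detection data, one common way to export train/val/test splits is to assign whole regions to a split, so that images of the same site at different dates never straddle train and test. The sketch below does this with a stable hash of a region identifier. The function `assign_split`, the region ID format, and the default fractions are assumptions for illustration, not the repository's actual export logic.

```python
import hashlib


def assign_split(region_id, val_frac=0.1, test_frac=0.1):
    """Deterministically map a region ID to 'train', 'val', or 'test'.

    Hashing the ID (rather than shuffling) keeps the assignment stable
    across runs and across newly scraped imagery for the same region.
    """
    digest = hashlib.sha256(region_id.encode("utf-8")).hexdigest()
    # Map the first 32 bits of the hash to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


# Hypothetical region IDs; every image of a region inherits its split.
splits = {rid: assign_split(rid) for rid in ["region-001", "region-002", "region-003"]}
```

Because the split depends only on the ID, adding new acquisition dates for an existing region does not move it between splits.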
PyTorch toolkit for per-class reject-option training with binomial threshold search, dashboards, and CLI.
- Tune per-class thresholds on ImageNet, iNat, or custom datasets with one command.
- Export coverage/accuracy curves and selective accuracy plots for reports.
- Integrate abstention policies into existing Torch models via lightweight hooks.
- Production-ready toolkit that underpins selective-prediction guardrails across vision, text, and 2-D data.
Supports:
Learning When to Say "I Don't Know", Naturally Constrained Reject Option Classification
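To make the coverage/accuracy trade-off concrete, the sketch below applies per-class confidence thresholds and reports coverage (the fraction of inputs the model answers) and selective accuracy (accuracy on the answered inputs). The function `selective_metrics` and its arguments are hypothetical and written in plain Python; it does not reproduce the toolkit's binomial threshold search or its Torch hooks.

```python
def selective_metrics(confidences, predictions, labels, thresholds, default_threshold=0.5):
    """Return (coverage, selective_accuracy) under per-class thresholds.

    A prediction is accepted when its confidence is at least the threshold
    of its predicted class; otherwise the model abstains.
    """
    if not (len(confidences) == len(predictions) == len(labels)):
        raise ValueError("inputs must have the same length")
    accepted = 0
    correct = 0
    for conf, pred, label in zip(confidences, predictions, labels):
        if conf >= thresholds.get(pred, default_threshold):
            accepted += 1
            correct += int(pred == label)
    coverage = accepted / len(labels) if labels else 0.0
    # With no accepted predictions, accuracy is reported as 0.0 by convention.
    accuracy = correct / accepted if accepted else 0.0
    return coverage, accuracy


# Toy example: class 0 needs high confidence, class 1 is more permissive.
coverage, accuracy = selective_metrics(
    confidences=[0.9, 0.6, 0.4, 0.8],
    predictions=[0, 1, 1, 0],
    labels=[0, 1, 0, 1],
    thresholds={0: 0.85, 1: 0.5},
)
```

Evaluating this function over a range of threshold settings yields the points of a coverage/accuracy curve.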