WMT 2024 / 2024

Assessing the Role of Imagery in Multimodal Machine Translation

N. Kashani Motlagh, J. Davis, T. Anderson, J. Gwinnup, G. Erdmann

We design imagery-sensitive contrastive metrics for multimodal machine translation and apply them to systems evaluated at WMT 2024. The experiments compare translations under matched and mismatched visual context, giving a direct test of whether image inputs influence model behavior beyond training-time regularization. We release the evaluation harness and curated counterfactual splits used for the analysis.
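The matched-versus-mismatched comparison described above can be pictured with a minimal sketch. This is an illustration under stated assumptions, not the released evaluation harness: the function names `rotate_images` and `image_sensitivity` are hypothetical, the per-example quality scores are assumed to come from some external translation metric, and the mismatch strategy (rotating the image list) is only one simple way to build a counterfactual pairing.

```python
from typing import List, Sequence


def rotate_images(image_ids: Sequence[str], offset: int = 1) -> List[str]:
    """Build a mismatched pairing by shifting every image by `offset` positions.

    Hypothetical helper: assumes the image ids are distinct, so that no
    source sentence keeps its original image after the shift.
    """
    n = len(image_ids)
    if n < 2:
        raise ValueError("need at least two examples to form a mismatch")
    if offset % n == 0:
        raise ValueError("offset must not map every image back to itself")
    return [image_ids[(i + offset) % n] for i in range(n)]


def image_sensitivity(
    matched_scores: Sequence[float],
    mismatched_scores: Sequence[float],
) -> float:
    """Mean per-example quality drop when the image is swapped.

    Assumes higher scores mean better translations. A value near zero
    suggests the system's output barely depends on the image; a clearly
    positive value suggests the visual context is being used.
    """
    if len(matched_scores) != len(mismatched_scores):
        raise ValueError("score lists must have the same length")
    if not matched_scores:
        raise ValueError("score lists must not be empty")
    deltas = [m - x for m, x in zip(matched_scores, mismatched_scores)]
    return sum(deltas) / len(deltas)
```

In use, a system would be run once with the original images and once with the output of `rotate_images`, each set of translations would be scored with the chosen metric, and the two score lists would be passed to `image_sensitivity`.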