Reject or Refine? Separating Retrievable from Unrecoverable Uncertainty in Adaptive QA

Modern adaptive QA systems must choose among three actions (answer directly, retrieve and then answer, or abstain), yet most prior work studies only one decision boundary between them at a time. This paper asks a narrower question: under a fixed model–retriever–corpus stack, is the signal that says “retrieve” the same signal that says “abstain”?

The short answer is no. Over 41,145 evaluation instances, cheap logprob baselines top out at Recoverability AUC .518–.553 and have Reject Recall = 0: they never reject. The strongest retriever-side scalar improves AUC but still collapses at the calibrated operating point that gives it the best end-to-end (E2E) performance. In contrast, a small class-weighted, question-only controller reaches Recoverability AUC .678 ± .005 with Reject Recall .487 ± .059, and at matched coverage it beats cumulative-logprob thresholds on Refine Recall by +.252 [.233, .272].
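The metrics quoted above are not defined in this summary, so the following is only a minimal sketch under stated assumptions, not the paper's evaluation code. It assumes Recoverability AUC is ordinary ROC AUC of a scalar score against a binary recoverable (1) / unrecoverable (0) label, that Reject Recall and Refine Recall are per-class recall over routed labels, and that "coverage" means the fraction of questions not rejected. The label strings `"answer"`, `"refine"` (read here as retrieve-then-answer), `"reject"` and all function names are hypothetical.

```python
from typing import Sequence

# Hypothetical helpers; the paper's exact metric definitions may differ.


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney statistic: probability that a random
    positive (label 1, e.g. recoverable) outscores a random negative, ties = 1/2.
    The O(P*N) pairwise loop is fine for illustration, slow for ~41k items."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def class_recall(pred: Sequence[str], gold: Sequence[str], cls: str) -> float:
    """Share of gold instances of `cls` that the router also sent to `cls`."""
    hits = [p == cls for p, g in zip(pred, gold) if g == cls]
    return sum(hits) / len(hits) if hits else float("nan")


def threshold_at_coverage(scores: Sequence[float], coverage: float) -> float:
    """Threshold t such that roughly `coverage` of items have score >= t
    (more if scores are tied). Assumption: coverage = fraction NOT rejected,
    so two scorers are compared at matched coverage by thresholding each this way."""
    ranked = sorted(scores, reverse=True)
    k = max(1, round(coverage * len(ranked)))
    return ranked[k - 1]


if __name__ == "__main__":
    gold = ["answer", "refine", "reject", "reject"]
    never_reject = ["answer", "refine", "refine", "answer"]
    # A router that never emits "reject" cannot recall any reject instance.
    print(class_recall(never_reject, gold, "reject"))  # 0.0
```

The toy call illustrates why a baseline that never emits the reject action scores Reject Recall = 0 no matter how well it orders the other questions; for tens of thousands of items a sort-based AUC would replace the pairwise loop.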

Operationally, the three-way routing problem is a two-stage control problem: answer confidence decides whether to leave direct answering, and stack-relative recoverability then decides whether retrieval can rescue the question or whether it should be rejected. The experiments identify class weighting as the mechanism that preserves a non-trivial retrieve action when the unrecoverable majority would otherwise swamp the minority retrieve class.
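The two-stage decomposition can be pictured as a cascade of two decisions. The sketch below is illustrative only: it assumes two scalar scores in [0, 1] (`answer_conf` and `recoverability`) and hypothetical fixed thresholds, whereas the paper's controller is a learned question-only model. Class weighting is shown with the common inverse-frequency ("balanced") heuristic, the same formula scikit-learn uses for `class_weight="balanced"`; whether the authors use this exact scheme is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Sequence


def balanced_class_weights(labels: Sequence[str]) -> Dict[str, float]:
    """Inverse-frequency weights n / (k * count_c), so every class contributes
    the same total weight to a weighted loss regardless of its frequency.
    Assumption: this is one common scheme, not necessarily the paper's."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


@dataclass
class TwoStageRouter:
    # Hypothetical thresholds; in practice they would be tuned on held-out data.
    answer_threshold: float   # stage 1: minimum answer confidence to answer directly
    recover_threshold: float  # stage 2: minimum recoverability to retrieve

    def route(self, answer_conf: float, recoverability: float) -> str:
        # Stage 1: confident enough to answer without retrieval.
        if answer_conf >= self.answer_threshold:
            return "answer"
        # Stage 2: retrieval is expected to rescue the question.
        if recoverability >= self.recover_threshold:
            return "refine"
        # Otherwise treat the question as unrecoverable for this stack.
        return "reject"


if __name__ == "__main__":
    router = TwoStageRouter(answer_threshold=0.8, recover_threshold=0.5)
    print(router.route(0.3, 0.7))  # refine
    weights = balanced_class_weights(["reject"] * 6 + ["refine"] * 2)
    print(weights["refine"])  # 2.0
```

With six unrecoverable and two recoverable training questions, each refine example gets weight 2.0 and each reject example 2/3, so both classes contribute a total weight of 4 to the loss; without such weighting, a loss-minimizing controller can favor the majority class.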

Under review at EMNLP 2026. Code release will follow acceptance.