Bias Fitting to Mitigate Length Bias of Reward Model in RLHF
in Association for Computational Linguistics, 2026
To model the intricate nature of length bias more accurately and mitigate it more effectively, this paper proposes FiMi-RM (Bias Fitting to Mitigate Length Bias of Reward Model in RLHF), a framework that autonomously learns and corrects underlying bias patterns.
