Multi-Level Aware Preference Learning: Enhancing RLHF for Complex Multi-Instruction Tasks
Under review at Neural Information Processing Systems, 2025
The paper identifies two oversights in existing preference learning techniques. First, they predominantly compare responses while neglecting the latent signals embedded in prompt inputs. Second, they consider preference disparities only at the intra-sample level and ignore the inter-sample preference differentials that exist across preference data. To exploit these neglected signals, the paper proposes Multi-level Aware Preference Learning (MAPL), a framework designed to improve performance on multi-instruction tasks.
Recommended citation: Sun Ruopei, Cai Jianfeng. (2025). "Multi-Level Aware Preference Learning: Enhancing RLHF for Complex Multi-Instruction Tasks." arXiv preprint arXiv:2505.12845. https://arxiv.org/abs/2505.12845