Publications

Bias Fitting to Mitigate Length Bias of Reward Model in RLHF

in Association for Computational Linguistics, 2026

To model the intricate nature of length bias accurately and mitigate it more effectively, this paper proposes FiMi-RM (Bias Fitting to Mitigate Length Bias of Reward Model in RLHF), a framework that autonomously learns and corrects underlying bias patterns.

Download here

Disentangling Length Bias In Preference Learning Via Response-Conditioned Modeling

in International Conference on Learning Representations, 2026

This paper introduces a Response-conditioned Bradley-Terry (Rc-BT) model that, by training on an augmented dataset, improves the model’s ability to mitigate length bias and to follow length instructions. It further proposes the Rc-RM and Rc-DPO algorithms, which leverage the Rc-BT model for reward modeling and direct preference optimization (DPO) of LLMs.

Download here

CodeContests-O: Powering LLMs via Feedback-Driven Iterative Test Case Generation

in Association for Computational Linguistics, 2026

The rise of reasoning models necessitates large-scale verifiable data, for which programming tasks serve as an ideal source. To meet this need, this paper proposes a Feedback-Driven Iterative Framework for comprehensive test case construction and releases CodeContests-O.

Download here

Multi-Level Aware Preference Learning: Enhancing RLHF for Complex Multi-Instruction Tasks

under review in Neural Information Processing Systems, 2025

This research identifies two oversights in existing techniques. First, they predominantly compare responses while neglecting the latent signals embedded in prompt inputs. Second, they consider preference disparities only at the intra-sample level and ignore the inter-sample preference differences that exist across preference data. To leverage these neglected signals, it proposes a Multi-level Aware Preference Learning (MAPL) framework that enhances multi-instruction capabilities.

Download here

Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering

in Neural Information Processing Systems, 2025

This work is the first to systematically investigate the effectiveness and underlying mechanisms of activation engineering for mitigating hallucinations in VideoLLMs. It also proposes a temporal-aware activation engineering framework for VideoLLMs that adaptively identifies and manipulates hallucination-sensitive modules based on temporal variation characteristics, substantially mitigating hallucinations without additional LLM fine-tuning.

Download here

Heterogeneous Network Based Contrastive Learning Method for PolSAR Land Cover Classification

in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2023

This paper proposes a Heterogeneous Network based on Contrastive Learning (HCLNet). HCLNet aims to learn high-level representations from unlabeled PolSAR data for few-shot classification based on multiple features.

Download here

Flowmind2Digital: The First Comprehensive Flowmind Recognition and Conversion Approach

in arXiv, 2023

This paper proposes the Flowmind2digital method and the hdFlowmind dataset to address the conversion of hand-drawn flowcharts and mind maps.

Download here

Cai-Jianfeng
