Page Not Found
Page not found. Your pixels are in another canvas.
A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.
This is a page not in the main menu.
Stay tuned 🤪
This blog post walks through the concrete details of RLHF training frameworks (DeepSpeedChat, OpenRLHF, verl): each framework's overall architecture and the details of every part within it (both the logic and the code). (It is recommended to first read my earlier RLHF post, The Basic Knowledge of RLHF (Reinforce Learning with Human Feedback).)
This blog post explains the full PyTorch model-training pipeline in detail: how the computation graph is built during the forward pass, how gradients are computed and stored during backpropagation, and how the optimizer updates the model parameters from those gradients. (It is recommended to first read my earlier post on torch.autograd, The Basic Knowledge of PyTorch Autograd.)
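As a miniature of that pipeline, here is a pure-Python sketch of one forward/backward/update cycle for the toy model y = w * x with squared loss (all names are illustrative; PyTorch records the forward ops in a computation graph and derives the backward pass automatically):

```python
# One forward/backward/update cycle, written out by hand for the toy
# model y = w * x with squared loss.

def train_step(w, x, t, lr):
    # Forward pass: prediction and loss.
    y = w * x
    loss = (y - t) ** 2
    # Backward pass: chain rule, d(loss)/dw = d(loss)/dy * dy/dw.
    dloss_dy = 2 * (y - t)
    dy_dw = x
    grad_w = dloss_dy * dy_dw      # what w.grad would hold
    # Optimizer step: plain SGD, w <- w - lr * grad.
    w = w - lr * grad_w
    return w, loss

w = 0.0
for _ in range(100):
    w, loss = train_step(w, x=2.0, t=6.0, lr=0.05)
```

After enough steps w converges to 3.0, the value that makes the loss zero; an autograd engine does exactly this bookkeeping for arbitrarily deep graphs.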
This blog post covers the basic knowledge of RLHF and a concrete (simplified) code implementation for training LLMs.
This blog post explains how to install a macOS virtual machine in VMware Workstation Pro.
This blog post explains the principle of combining multiple models with a Mixture of Experts (MoE).
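To make the combination idea concrete, here is a minimal, self-contained sketch (toy experts and gate, not the post's code): a softmax gate scores each expert from the input, and the output is the gate-weighted sum of the expert outputs.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of scores.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Two toy scalar "experts".
experts = [lambda x: 2.0 * x, lambda x: x + 1.0]

def moe_forward(x, gate_w, gate_b):
    # Gating network: one linear score per expert, then softmax.
    gates = softmax([w * x + b for w, b in zip(gate_w, gate_b)])
    # Output: gate-weighted combination of the expert outputs.
    y = sum(g * expert(x) for g, expert in zip(gates, experts))
    return y, gates

y, gates = moe_forward(2.0, gate_w=[1.0, -1.0], gate_b=[0.0, 0.0])
```

Because the gates sum to one, the output always lies in the convex hull of the expert outputs; sparse MoE layers in LLMs additionally keep only the top-k gates to save compute.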
This blog post explains the principle and concrete implementation of using TorchScript to convert Python model code into other languages (such as C++).
This blog post explains the principle and concrete implementation of using automatic mixed precision (AMP) to reduce a model's memory footprint.
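One piece of the AMP story can be shown with the standard library alone: values below float16's smallest subnormal (about 6e-8) underflow to zero, which is why AMP-style training scales the loss before the half-precision backward pass and unscales the gradients in full precision afterwards. A stdlib-only sketch (the scale value is illustrative):

```python
import struct

def to_half(x):
    # Round a Python float through IEEE-754 half precision ('e' format).
    return struct.unpack('e', struct.pack('e', x))[0]

tiny_grad = 1e-8           # a gradient too small for float16
scale = 1024.0             # GradScaler-style loss scale

underflowed = to_half(tiny_grad)        # becomes 0.0 in half precision
scaled = to_half(tiny_grad * scale)     # survives as a half-precision value
recovered = scaled / scale              # unscaled back in full precision
```

Without scaling the gradient is lost entirely; with it, the unscaled value recovers the original gradient to within half-precision rounding error.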
This blog post explains the mathematical principle and concrete implementation of using a gradient penalty as a regularization term to aid model learning.
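As a toy illustration of the idea (not the post's code): for a linear critic D(x) = sum_i w_i * x_i the gradient with respect to the input is w itself, so the WGAN-GP-style penalty lam * (||grad_x D||_2 - 1)^2 has a closed form. In practice the gradient is taken by autograd at points interpolated between real and generated samples.

```python
import math

def gradient_penalty(w, lam=10.0):
    # For a linear critic, grad_x D(x) = w for every x, so the penalty
    # depends only on the weight vector's L2 norm.
    grad_norm = math.sqrt(sum(wi * wi for wi in w))  # ||grad_x D||_2
    return lam * (grad_norm - 1.0) ** 2

zero_pen = gradient_penalty([0.6, 0.8])   # unit gradient norm: no penalty
big_pen = gradient_penalty([3.0, 4.0])    # norm 5: heavily penalized
```

The penalty is zero exactly when the critic has unit gradient norm, which is how it softly enforces the 1-Lipschitz constraint.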
This blog post introduces PyTorch's autograd mechanism and how it is implemented.
This blog post introduces distributed parallel training methods for LLMs, focusing on how to implement DDP in PyTorch code.
torch.backends.cudnn.deterministic: makes cuDNN return a deterministic convolution algorithm (the default one) on every call, instead of a potentially different algorithm each run.
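A common reproducibility recipe built around this flag looks like the following (assuming PyTorch; a sketch of the usual seeding steps, not an exhaustive guarantee of determinism):

```python
import random

import numpy as np
import torch

# Seed every RNG in play (Python, NumPy, CPU, and all CUDA devices).
seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

# Make cuDNN return the same (deterministic) convolution algorithm on
# every run, and disable the benchmark autotuner, which could otherwise
# pick different algorithms depending on runtime conditions.
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```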
This blog post introduces the basics of computer hardware. (P.S. I strongly recommend the hardware-science videos by 硬件茶谈 on Bilibili; they are excellent 🙂, although he runs a lot of sponsored content these days 😥.)
This blog post introduces the basics of Large Language Models, including common LLMs, fine-tuning methods, and more.
Paper title: Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
This blog post, drawing on 通俗理解EM算法, derives the Expectation Maximization (EM) algorithm in detail.
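The E-step/M-step alternation can be sketched in a few lines with the classic two-coins example (illustrative data and initialization; which coin produced each session is the latent variable):

```python
# Minimal EM for a mixture of two biased coins. Each entry in `heads`
# is the number of heads observed in n tosses of one of two coins with
# unknown biases theta_a and theta_b.

def em_two_coins(heads, n, theta_a, theta_b, iters=50):
    for _ in range(iters):
        ha = ta = hb = tb = 0.0
        for h in heads:
            # E-step: responsibility of coin A via the likelihood ratio
            # (the binomial coefficient cancels out).
            la = theta_a ** h * (1 - theta_a) ** (n - h)
            lb = theta_b ** h * (1 - theta_b) ** (n - h)
            ra = la / (la + lb)
            rb = 1.0 - ra
            # Accumulate expected head/tail counts for each coin.
            ha += ra * h
            ta += ra * (n - h)
            hb += rb * h
            tb += rb * (n - h)
        # M-step: re-estimate each bias from the expected counts.
        theta_a = ha / (ha + ta)
        theta_b = hb / (hb + tb)
    return theta_a, theta_b

theta_a, theta_b = em_two_coins([9, 8, 2, 1, 9], n=10,
                                theta_a=0.6, theta_b=0.4)
```

With this data the estimates separate cleanly: the high-heads sessions are attributed to one coin and the low-heads sessions to the other, and each M-step is just a weighted maximum-likelihood estimate.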
This blog post introduces the basics of NLP tasks, including evaluation metrics and tokenization algorithms.
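As one concrete metric example (a common token-level F1 variant, not necessarily the post's exact definition, with whitespace tokenization for simplicity): precision and recall are computed over the multiset of tokens shared by prediction and reference.

```python
from collections import Counter

def token_f1(prediction, reference):
    pred_counts = Counter(prediction.split())
    ref_counts = Counter(reference.split())
    # Overlap = size of the multiset intersection of the two token bags.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

score = token_f1("the cat sat", "the cat sat down")
```

Here precision is 1.0 (every predicted token appears in the reference) but recall is 0.75, so the harmonic mean lands at 6/7.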
This blog post, following What are Diffusion Models?, continues with a detailed treatment of the mathematical principles, derivations, and implementation of improvements to the recently popular DM models. (P.S. For DM basics, see The Basic Knowledge of Diffusion Model (DM).)
This post reviews three recent Meta papers on visual processing (the Emu series), in order of release: first, the SOTA text-to-image generation model Emu; next, using Emu as a baseline, the image-editing follow-up Emu Edit, a unified image-editing model that essentially sweeps all the mainstream image-domain tasks; and finally Emu Video, which builds on Emu to improve text-to-video generation and also achieves SOTA. (P.S. My guess is that video editing is next 🙂.)
This blog post, drawing on Generative Modeling by Estimating Gradients of the Data Distribution, gives a detailed account of another way to understand and derive the recently popular Diffusion Model: the mathematics and implementation of Score-based Generative Models. (P.S. I recommend reading the Generative Modeling by Estimating Gradients of the Data Distribution blog post first; although it is entirely in English, it is very detailed, easy to follow, and genuinely excellent.)
Paper title: Consistency Models
Paper title: Improving Sign Language Translation with Monolingual Data by Sign Back-Translation
This blog post, drawing on a DDPM tutorial (DDPM讲解), details the mathematical derivation and implementation of the recently popular DM models. (P.S. I strongly recommend Lil's blog; it is excellent 🙂.)
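For orientation, the core DDPM equations such a derivation covers can be summarized as follows (standard DDPM notation with noise schedule $\beta_t$, $\alpha_t = 1-\beta_t$, and $\bar{\alpha}_t = \prod_{s=1}^{t}\alpha_s$; a summary, not the post's own text):

```latex
% Forward process: add Gaussian noise at each step t
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I}\right)

% Closed form for sampling x_t directly from x_0
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t)\, \mathbf{I}\right)

% Simplified training objective: predict the injected noise \epsilon
L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}\!\left[\left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\ t\right) \right\|^2\right]
```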
Paper title: Is Context all you Need? Scaling Neural Sign Language Translation to Large Domains of Discourse
This blog post explains a simple way to expand storage on a laptop that has only one SSD slot, without reinstalling the operating system.
Paper title: SLTUNET: A Simple Unified Model for Sign Language Translation
Paper title: Scaling Back-Translation with Domain Text Generation for Sign Language Gloss Translation
Paper title: Transcribing Natural Languages for the Deaf via Neural Editing Programs
This blog post continues The Basic Knowledge Of Latex, extending coverage of basic LaTeX usage.
This blog post records the problems I have run into while using LaTeX and how I solved them. (Note: for some problems I may not know the underlying cause myself, but every solution has been personally verified to work.)
This blog post records basic LaTeX knowledge and usage, aimed at complete beginners who have just heard of the LaTeX typesetting tool.
Paper title: Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Paper title: iTransformer: Inverted Transformers Are Effective for Time Series Forecasting
Paper title: Multipattern Mining Using Pattern-Level Contrastive Learning and Multipattern Activation Map
in Arxiv, 2023
This paper proposes the Flowmind2digital method and the hdFlowmind dataset to address the conversion of hand-drawn flowcharts and mindmaps.
Download here
in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2023
This paper proposes a Heterogeneous Network based on Contrastive Learning (HCLNet). HCLNet aims to learn high-level representation from unlabeled PolSAR data for few-shot classification according to multi-features.
Download here
in Neural Information Processing Systems, 2025
This work is the first to systematically investigate the effectiveness and underlying mechanisms of activation engineering for mitigating hallucinations in VideoLLMs. It also proposes a temporal-aware activation engineering framework for VideoLLMs that adaptively identifies and manipulates hallucination-sensitive modules based on temporal variation characteristics, substantially mitigating hallucinations without additional LLM fine-tuning.
Download here
under review in Neural Information Processing Systems, 2025
To accurately model the intricate nature of length bias and facilitate more effective bias mitigation, it proposes FiMi-RM (Bias Fitting to Mitigate Length Bias of Reward Model in RLHF), a framework that autonomously learns and corrects underlying bias patterns.
Download here
under review in Neural Information Processing Systems, 2025
The research identifies a critical oversight in existing techniques: they predominantly focus on comparing responses while neglecting valuable latent signals embedded in prompt inputs, and they consider preference disparities only at the intra-sample level, neglecting the inter-sample preference differentials that exist among preference data. To leverage these previously neglected indicators, it proposes a novel Multi-level Aware Preference Learning (MAPL) framework capable of enhancing multi-instruction capabilities.
Download here
under review in Association for Computational Linguistics, 2026
The rise of reasoning models necessitates large-scale verifiable data, for which programming tasks serve as an ideal source. To address this, we propose a Feedback-Driven Iterative Framework for comprehensive test case construction and release CodeContests-O.
Download here
in International Conference on Learning Representations, 2026
It introduces a Response-conditioned Bradley-Terry (Rc-BT) model that, through training on the augmented dataset, improves the model's ability to mitigate length bias and follow length instructions. Furthermore, it proposes the Rc-RM and Rc-DPO algorithms, which leverage the Rc-BT model for reward modeling and direct policy optimization (DPO) of LLMs.
Download here
Undergraduate, Artificial Intelligence Turing Class, School of Artificial Intelligence, Xidian University, 2020
I completed my undergraduate studies at Xidian University from 2020 to 2024.