
Cai-Jianfeng
First-year graduate student, School of Information Science and Technology, USTC
The Basic Knowledge of RLHF Training Pipeline
64 minute read
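This post walks through the details of three RLHF training frameworks (DeepSpeedChat, OpenRLHF, and verl), covering each framework's overall architecture and the details of each component within it, both the logic and the code. (I recommend first reading my earlier post on RLHF, The Basic Knowledge of RLHF (Reinforce Learning with Human Feedback).)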
The Basic Knowledge of Torch Train Pipeline
14 minute read
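This post walks through the details of the full PyTorch model-training pipeline: how the computation graph is built during the forward pass, how gradients are computed and stored during backpropagation, and how the optimizer updates the model parameters from those gradients. (I recommend first reading my earlier post on torch.autograd, The Basic Knowledge of PyTorch Autograd.)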
The Basic Knowledge of RLHF (Reinforce Learning with Human Feedback)
37 minute read
This post covers the basics of RLHF and a concrete (simplified) code implementation for training an LLM.
Installing a macOS Virtual Machine on VMware Workstation Pro
17 minute read
This post explains how to install a macOS virtual machine in VMware Workstation Pro.