trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

Stars	4353
Forks	464
Language	Python
Last Updated	May 26, 2024

Similar Repos

Repo	Language	Stars	Description	Updated At
rewardmodeling	Python	4	Train reward models for reinforcement learning from human feedback (RLHF).	Aug 28, 2023
alpaca-rlhf	Python	5	Finetuning alpaca with RLHF (Reinforcement Learning with Human Feedback)	Apr 25, 2023
instructGOOSE	Jupyter Notebook	87	Implementation of Reinforcement Learning from Human Feedback (RLHF)	Mar 29, 2023
safe-rlhf	Python	49	Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback	May 16, 2023
safe-rlhf	Python	2	Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback	Oct 22, 2023
awesome-RLHF-language-models	None	26	Curated list of resources for Reinforcement Learning from Human Feedback and Language Models	Apr 24, 2023
PaLM-rlhf-pytorch	Python	74	Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically …	Dec 12, 2022
PaLM-rlhf-pytorch	Python	2	Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically …	May 21, 2023
PaLM-rlhf-pytorch	None	2	Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically …	Nov 30, 2023
hh-rlhf	None	774	Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human …	Apr 25, 2023
PaLM-rlhf-jax	Python	6	Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically …	Apr 21, 2023
trl	Python	3167	Train transformer language models with reinforcement learning.	Apr 24, 2023
awesome-RLHF	None	972	A curated list of reinforcement learning with human feedback resources (continually updated)	Apr 24, 2023
Alpaca-LoRA-RLHF-PyTorch	Python	7	A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation …	Apr 23, 2023
ChatGLM-LoRA-RLHF-PyTorch	Python	15	A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation …	Apr 24, 2023
Vicuna-LoRA-RLHF-PyTorch	Python	9	A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation …	Apr 24, 2023
my-alpaca	Jupyter Notebook	27	Try original alpaca. The multi-turn version is at [multi-turn-alpaca](https://github.com/l294265421/multi-turn-alpaca) and the version further trained with …	Apr 25, 2023
PARL	Python	2716	A high-performance distributed training framework for Reinforcement Learning	Sep 01, 2022
distributed_reinforcement_learning	Python	51	implementation of distributed reinforcement learning with distributed tensorflow	Jul 15, 2022
multi-agent-reinforcement-learning-for-emergent-communication	Python	2	Learning Grounded Language via Split Screen Communication Learning via Deep Multi-Agent Reinforcement Learning	Sep 17, 2021
Stable-Alignment	Python	201	Multi-agent Social Simulation + Efficient, Effective, and Stable alternative of RLHF. Code for the paper …	Jul 16, 2023
osim-rl	Python	23	Reinforcement learning environments with musculoskeletal models	Nov 22, 2021
NIPS-2017-Learning-to-Run	Python	12	Reinforcement learning environments with musculoskeletal models	May 15, 2021
osim-rl	Python	848	Reinforcement learning environments with musculoskeletal models	Apr 21, 2023
deep-scheduler	Python	11	Learning to schedule distributed resources with deep reinforcement learning.	Mar 24, 2023
minChatGPT	Python	112	A minimum example of aligning language models with RLHF similar to ChatGPT	Apr 09, 2023
AdaPlanner	HTML	62	AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback	Jan 24, 2024
bioimitation-gym	Python	18	An OpenAI-Gym Package for Training and Testing Reinforcement Learning algorithms with OpenSim Models	Mar 18, 2023
ddrl	Jupyter Notebook	5	Distributed Deep Reinforcement Learning framework	May 13, 2023
neurips2018_rl_challenge	Python	2	Reinforcement learning for human walking motion with prosthetic leg	Sep 26, 2019
SELFormer	Python	12	SELFormer: Molecular Representation Learning via SELFIES Language Models	May 01, 2023
SELFormer_back	Python	4	SELFormer: Molecular Representation Learning via SELFIES Language Models	Apr 11, 2023
DialogRPT	None	3	EMNLP 2020: "Dialogue Response Ranking Training with Large-Scale Human Feedback Data"	May 27, 2022
scaRLa	Scala	4	A distributed reinforcement learning (RL) framework built with Akka	Jan 28, 2023
AlignLLMHumanSurvey	None	525	Aligning Large Language Models with Human: A Survey	Jan 16, 2024
SRL-NLC	None	4	Safe Reinforcement Learning with Natural Language Constraints	Feb 17, 2022
causal_rl	Python	2	Causal models for reinforcement learning	Feb 24, 2023
T2I-HumanFeedback	None	12	Implementations of Baseline Methods for Aligning Text2Img Diffusion Models with Human FeedBack	Apr 24, 2023
Trayne	Go	2	Trayne is a distributed machine learning platform for training models at scale	Jan 17, 2023
rl_markets	C++	266	Market Making via Reinforcement Learning	May 10, 2023
human_marl	Python	6	Cooperative Multi Agent Reinforcement Learning with Human in the Loop	Apr 24, 2023
ChatGPT-Decoded-GPT2-FAQ-Bot-RLHF-PPO	Jupyter Notebook	30	A Practical Guide to Developing a Reliable FAQ Chatbot with Reinforcement Learning and Human Feedback …	Mar 13, 2023
ngsim_env	Jupyter Notebook	133	Learning human driver models from NGSIM data with imitation learning.	Aug 22, 2022
summarize-from-feedback	Python	398	Code for "Learning to summarize from human feedback"	Aug 12, 2022
RRHF	None	4	RRHF: Aligning Language Models with Human Preferences without tears	Apr 11, 2023
distributed_rl	Python	67	PyTorch implementation of distributed deep reinforcement learning	Aug 19, 2022
sheeprl	Python	15	Distributed Reinforcement Learning accelerated by Lightning Fabric	May 19, 2023
haiku-scalable-example	Python	55	Scalable distributed reinforcement learning agents on kubernetes	Sep 29, 2022
human_drone_SC	Python	5	[ICDE 2022] Human-Drone Collaborative Spatial Crowdsourcing by Memory-Augmented Distributed Multi-Agent Deep Reinforcement Learning	May 16, 2022
sagemaker-horovod-distributed-training	Jupyter Notebook	34	Distributed training with SageMaker's script mode using Horovod distributed deep learning framework	Oct 30, 2021