Vicuna-LoRA-RLHF-PyTorch

A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with …

Stars: 196

Forks: 18

Language: Python

Last Updated: May 14, 2024

Similar Repos

Repo	Language	Stars	Description	Updated At
Alpaca-LoRA-RLHF-PyTorch	Python	7	A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation …	Apr 23, 2023
ChatGLM-LoRA-RLHF-PyTorch	Python	15	A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation …	Apr 24, 2023
PaLM-rlhf-pytorch	Python	74	Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically …	Dec 12, 2022
PaLM-rlhf-pytorch	Python	2	Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically …	May 21, 2023
PaLM-rlhf-pytorch	None	2	Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically …	Nov 30, 2023
alpaca-rlhf	Python	5	Finetuning alpaca with RLHF (Reinforcement Learning with Human Feedback)	Apr 25, 2023
instructGOOSE	Jupyter Notebook	87	Implementation of Reinforcement Learning from Human Feedback (RLHF)	Mar 29, 2023
PaLM-rlhf-jax	Python	6	Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically …	Apr 21, 2023
rewardmodeling	Python	4	Train reward models for reinforcement learning from human feedback (RLHF).	Aug 28, 2023
baichuan_sft_lora	Python	51	Baichuan LLM supervised fine-tuning with LoRA	Jan 15, 2024
safe-rlhf	Python	49	Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback	May 16, 2023
safe-rlhf	Python	2	Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback	Oct 22, 2023
trlx	Python	3061	A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)	Apr 25, 2023
finetune_stable_diffusion	Jupyter Notebook	14	Finetune Stable Diffusion with Dreambooth, LoRA, and ControlNet	Jun 30, 2023
my-alpaca	Jupyter Notebook	27	Try original alpaca. The multi-turn version is at [multi-turn-alpaca](https://github.com/l294265421/multi-turn-alpaca) and the version further trained with …	Apr 25, 2023
awesome-llm-human-preference-datasets	None	27	A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.	May 09, 2023
Bloom-Lora	Python	18	Finetune the Bloom large language model with the LoRA method	Apr 19, 2023
Alpacaman	Python	2	Alpaca-lora Chatbot with Infinite Memory by Finetune	Apr 23, 2023
LLaMA-MOSS-RLHF-LoRA	Python	5	Training LLaMA or MOSS with RLHF, optionally with LoRA	Jun 12, 2023
awesome-RLHF	None	972	A curated list of reinforcement learning with human feedback resources (continually updated)	Apr 24, 2023
hh-rlhf	None	774	Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human …	Apr 25, 2023
horace	Python	23	LLM chatbot server with ChatGPT plugins	Apr 11, 2023
LLM-RLHF-Tuning	Python	285	LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)	Jan 19, 2024
awesome-RLHF-language-models	None	26	Curated list of resources for Reinforcement Learning from Human Feedback and Language Models	Apr 24, 2023
minChatGPT	Python	112	A minimum example of aligning language models with RLHF similar to ChatGPT	Apr 09, 2023
trubrics-sdk	Python	52	Enhance your ML workflows with human feedback	May 24, 2023
ChatLLM-Web	JavaScript	53	🗣️ Chat with LLM like Vicuna totally in your browser with WebGPU, safely, privately, and …	May 11, 2023
langchain-streamlit-demo	Python	5	langchain-streamlit demo with streaming llm, memory, and langsmith feedback	Oct 16, 2023
ChatGPT-Decoded-GPT2-FAQ-Bot-RLHF-PPO	Jupyter Notebook	30	A Practical Guide to Developing a Reliable FAQ Chatbot with Reinforcement Learning and Human Feedback …	Mar 13, 2023
chat-llama-discord-bot	Python	36	A Discord Bot for chatting with LLaMA, Vicuna, Alpaca, or any other LLM supported by …	Apr 09, 2023
neurips2018_rl_challenge	Python	2	Reinforcement learning for human walking motion with prosthetic leg	Sep 26, 2019
human_marl	Python	6	Cooperative Multi Agent Reinforcement Learning with Human in the Loop	Apr 24, 2023
chat-llama-discord-bot	Python	6	A Discord Bot for chatting with LLaMA, Vicuna, Alpaca, or any other LLM supported by …	May 08, 2023
chatgpt_telegram_bot	Python	7	ChatGPT re-created with GPT-3.5 LLM as Telegram Bot. Light-weight fork.	Apr 14, 2023
T2I-HumanFeedback	None	12	Implementations of Baseline Methods for Aligning Text2Img Diffusion Models with Human FeedBack	Apr 24, 2023
DialogRPT	None	3	EMNLP 2020: "Dialogue Response Ranking Training with Large-Scale Human Feedback Data"	May 27, 2022
Prioritized-Human-in-the-loop-End-to-end-Autonomous-Driving	Python	22	(TNNLS) Prioritized Experience-Based Reinforcement Learning with Human Guidance for Autonomous Driving	Apr 24, 2023
Prioritized-Human-in-the-loop-End-to-end-Autonomous-Driving	Python	2	(TNNLS) Prioritized Experience-Based Reinforcement Learning with Human Guidance for Autonomous Driving	Aug 05, 2023
llm-primer	None	8	A primer on large language models (LLM) as of Jan 2023, with bonus ChatGPT topic	Mar 22, 2023
CURVS	Python	2	Connects nodes (tanks, cars, anything basically) together to be controlled by a centralised environment with …	Dec 21, 2021
awesome-oss-llm-ift-rlhf	None	14	Collection of open source implementations of LLMs with IFT and RLHF that are striving to …	Apr 09, 2023
miti	Go	144	miti is a musical instrument textual interface. Basically, it's MIDI, but with human-readable text.	Jul 29, 2022
Driving-IRL-NGSIM	Python	102	(T-ITS) Driving Behavior Modeling using Naturalistic Human Driving Data with Inverse Reinforcement Learning	May 04, 2023
PyMAF	Python	2	[ICCV21, Oral] PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop	Dec 07, 2022
PyMAF	Python	2	[ICCV21, Oral] PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop	Feb 28, 2023
EditorGPT	TypeScript	2	A code editor integrated with ChatGPT to provide realtime analysis and feedback. Made using T3	Apr 23, 2023
LoRa-IoT-Project-with-Arduino-ESP8266-control-Relay	C++	3	In this Lora IoT project tutorial, I have shown how to make the LoRa Arduino …	Oct 04, 2022
PyMAF	Python	374	[ICCV 2021, Oral] PyMAF: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback …	Aug 17, 2022
Anki_FlashCard_Generator	Python	38	Automatically generate Anki Flashcards from your PDF files with LLM (ChatGPT in this case) to …	Jun 17, 2023
appimagelint	Python	27	Check AppImages for compatibility, best practices etc. Powerful functionality combined with simple usage and human-friendly …	Sep 17, 2022