Alpaca-LoRA-RLHF-PyTorch

A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with …

Stars

53

Forks

6

Language

Python

Last Updated

Apr 23, 2024

Similar Repos

Repo	Language	Stars	Description	Updated At
ChatGLM-LoRA-RLHF-PyTorch	Python	15	A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation …	Apr 24, 2023
Vicuna-LoRA-RLHF-PyTorch	Python	9	A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation …	Apr 24, 2023
alpaca-rlhf	Python	5	Finetuning alpaca with RLHF (Reinforcement Learning with Human Feedback)	Apr 25, 2023
PaLM-rlhf-pytorch	Python	74	Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically …	Dec 12, 2022
PaLM-rlhf-pytorch	Python	2	Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically …	May 21, 2023
PaLM-rlhf-pytorch	None	2	Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically …	Nov 30, 2023
instructGOOSE	Jupyter Notebook	87	Implementation of Reinforcement Learning from Human Feedback (RLHF)	Mar 29, 2023
my-alpaca	Jupyter Notebook	27	Try original alpaca. The multi-turn version is at [multi-turn-alpaca](https://github.com/l294265421/multi-turn-alpaca) and the version further trained with …	Apr 25, 2023
PaLM-rlhf-jax	Python	6	Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically …	Apr 21, 2023
Alpacaman	Python	2	Alpaca-lora Chatbot with Infinite Memory by Finetune	Apr 23, 2023
rewardmodeling	Python	4	Train reward models for reinforcement learning from human feedback (RLHF).	Aug 28, 2023
baichuan_sft_lora	Python	51	Baichuan LLM supervised finetuning with LoRA	Jan 15, 2024
safe-rlhf	Python	49	Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback	May 16, 2023
safe-rlhf	Python	2	Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback	Oct 22, 2023
trlx	Python	3061	A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)	Apr 25, 2023
finetune_stable_diffusion	Jupyter Notebook	14	finetune stable diffusion with Dreambooth, LoRA, ControlNet	Jun 30, 2023
alpaca-lora	Jupyter Notebook	5	Finetuning InstructLLaMA on consumer hardware (copy from https://github.com/tloen/alpaca-lora)	Apr 17, 2023
check-with-gpt	Python	6	An automatic validator of the Alpaca dataset for finetuning alpaca-lora or any other LLM accepting …	Mar 31, 2023
awesome-llm-human-preference-datasets	None	27	A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.	May 09, 2023
Bloom-Lora	Python	18	Finetune Bloom big language model with Lora method	Apr 19, 2023
LLaMA-MOSS-RLHF-LoRA	Python	5	Training LLaMA or MOSS with RLHF [LoRA]	Jun 12, 2023
alpaca_llama_index	Python	13	Example of Alpaca-LoRA with llama index.	Apr 08, 2023
awesome-RLHF	None	972	A curated list of reinforcement learning with human feedback resources (continually updated)	Apr 24, 2023
hh-rlhf	None	774	Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human …	Apr 25, 2023
horace	Python	23	LLM chatbot server with ChatGPT plugins	Apr 11, 2023
LLM-RLHF-Tuning	Python	285	LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA)	Jan 19, 2024
awesome-RLHF-language-models	None	26	Curated list of resources for Reinforcement Learning from Human Feedback and Language Models	Apr 24, 2023
minChatGPT	Python	112	A minimum example of aligning language models with RLHF similar to ChatGPT	Apr 09, 2023
trubrics-sdk	Python	52	Enhance your ML workflows with human feedback	May 24, 2023
alpaca-weight	Python	32	Train llama with lora on one 4090 and merge weight of lora to work as …	Mar 30, 2023
AlpacaChat	C	104	A Swift library that runs Alpaca-LoRA prediction locally to implement ChatGPT like app on Apple …	Apr 21, 2023
langchain-streamlit-demo	Python	5	langchain-streamlit demo with streaming llm, memory, and langsmith feedback	Oct 16, 2023
cerebras-lora-alpaca	Jupyter Notebook	53	LoRA weights for Cerebras-GPT-2.7b finetuned on Alpaca dataset with shorter prompt	Apr 20, 2023
AlpacaTrainingData-EduTuned	None	2	Stanford Alpaca LLM Training Data, modified with prompts and training data from educational sources	May 25, 2023
ChatGPT-Decoded-GPT2-FAQ-Bot-RLHF-PPO	Jupyter Notebook	30	A Practical Guide to Developing a Reliable FAQ Chatbot with Reinforcement Learning and Human Feedback …	Mar 13, 2023
neurips2018_rl_challenge	Python	2	Reinforcement learning for human walking motion with prosthetic leg	Sep 26, 2019
human_marl	Python	6	Cooperative Multi Agent Reinforcement Learning with Human in the Loop	Apr 24, 2023
WhiTTsper-The-Lora	Jupyter Notebook	7	Demo combining Whisper for speech recognition and Google TTS for speech synthesis to interact with …	Apr 01, 2023
chatgpt_telegram_bot	Python	7	ChatGPT re-created with GPT-3.5 LLM as Telegram Bot. Light-weight fork.	Apr 14, 2023
T2I-HumanFeedback	None	12	Implementations of Baseline Methods for Aligning Text2Img Diffusion Models with Human FeedBack	Apr 24, 2023
DialogRPT	None	3	EMNLP 2020: "Dialogue Response Ranking Training with Large-Scale Human Feedback Data"	May 27, 2022
chat-llama-discord-bot	Python	36	A Discord Bot for chatting with LLaMA, Vicuna, Alpaca, or any other LLM supported by …	Apr 09, 2023
llamallamallama	TypeScript	2	Llamallamallama is a chat solution that allows users to chat with "Llama" fine tuned with …	Apr 15, 2024
Prioritized-Human-in-the-loop-End-to-end-Autonomous-Driving	Python	22	(TNNLS) Prioritized Experience-Based Reinforcement Learning with Human Guidance for Autonomous Driving	Apr 24, 2023
Prioritized-Human-in-the-loop-End-to-end-Autonomous-Driving	Python	2	(TNNLS) Prioritized Experience-Based Reinforcement Learning with Human Guidance for Autonomous Driving	Aug 05, 2023
ChatGPTTradingBot	JavaScript	4	From the video Build A ChatGPT Trading Bot With Real Time News (Alpaca Markets API …	Apr 21, 2023
Colab_for_Alpaca_Lora	Jupyter Notebook	12	Here is a Google Colab Notebook for fine-tuning Alpaca Lora (within 3 hours with a …	Apr 16, 2023
llm-primer	None	8	A primer on large language models (LLM) as of Jan 2023, with bonus ChatGPT topic	Mar 22, 2023
CURVS	Python	2	Connects nodes (tanks, cars, anything basically) together to be controlled by a centralised environment with …	Dec 21, 2021
awesome-oss-llm-ift-rlhf	None	14	Collection of open source implementations of LLMs with IFT and RLHF that are striving to …	Apr 09, 2023