Stars
53
Forks
6
Language
Python
Last Updated
Apr 23, 2024
Similar Repos
Language | Stars | Description | Updated At |
---|---|---|---|
Python | 15 | A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation … | Apr 24, 2023 | |
Python | 9 | A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation … | Apr 24, 2023 | |
Python | 5 | Finetuning alpaca with RLHF (Reinforcement Learning with Human Feedback) | Apr 25, 2023 | |
Python | 74 | Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically … | Dec 12, 2022 | |
Python | 2 | Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically … | May 21, 2023 | |
None | 2 | Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically … | Nov 30, 2023 | |
Jupyter Notebook | 87 | Implementation of Reinforcement Learning from Human Feedback (RLHF) | Mar 29, 2023 | |
Jupyter Notebook | 27 | Try original alpaca. The multi-turn version is at [multi-turn-alpaca](https://github.com/l294265421/multi-turn-alpaca) and the version further trained with … | Apr 25, 2023 | |
Python | 6 | Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically … | Apr 21, 2023 | |
Python | 2 | Alpaca-lora Chatbot with Infinite Memory by Finetune | Apr 23, 2023 | |
Python | 4 | Train reward models for reinforcement learning from human feedback (RLHF). | Aug 28, 2023 | |
Python | 51 | baichuan LLM supervised finetune by lora | Jan 15, 2024 | |
Python | 49 | Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback | May 16, 2023 | |
Python | 2 | Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback | Oct 22, 2023 | |
Python | 3061 | A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) | Apr 25, 2023 | |
Jupyter Notebook | 14 | finetune stable diffusion with Dreambooth, LoRA, ControlNet | Jun 30, 2023 | |
Jupyter Notebook | 5 | Finetuning InstructLLaMA on consumer hardware (copy from https://github.com/tloen/alpaca-lora) | Apr 17, 2023 | |
Python | 6 | An automatic validator of the Alpaca dataset for finetuning alpaca-lora or any other LLM accepting … | Mar 31, 2023 | |
None | 27 | A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval. | May 09, 2023 | |
Python | 18 | Finetune Bloom big language model with Lora method | Apr 19, 2023 | |
Python | 5 | Training LLaMA or MOSS with RLHF, optionally with LoRA | Jun 12, 2023 | |
Python | 13 | Example of Alpaca-LoRA with llama index. | Apr 08, 2023 | |
None | 972 | A curated list of reinforcement learning with human feedback resources (continually updated) | Apr 24, 2023 | |
None | 774 | Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human … | Apr 25, 2023 | |
Python | 23 | LLM chatbot server with ChatGPT plugins | Apr 11, 2023 | |
Python | 285 | LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA) | Jan 19, 2024 | |
None | 26 | Curated list of resources for Reinforcement Learning from Human Feedback and Language Models | Apr 24, 2023 | |
Python | 112 | A minimum example of aligning language models with RLHF similar to ChatGPT | Apr 09, 2023 | |
Python | 52 | Enhance your ML workflows with human feedback | May 24, 2023 | |
Python | 32 | Train llama with lora on one 4090 and merge weight of lora to work as … | Mar 30, 2023 | |
C | 104 | A Swift library that runs Alpaca-LoRA prediction locally to implement ChatGPT like app on Apple … | Apr 21, 2023 | |
Python | 5 | langchain-streamlit demo with streaming llm, memory, and langsmith feedback | Oct 16, 2023 | |
Jupyter Notebook | 53 | LoRA weights for Cerebras-GPT-2.7b finetuned on Alpaca dataset with shorter prompt | Apr 20, 2023 | |
None | 2 | Stanford Alpaca LLM Training Data, modified with prompts and training data from educational sources | May 25, 2023 | |
Jupyter Notebook | 30 | A Practical Guide to Developing a Reliable FAQ Chatbot with Reinforcement Learning and Human Feedback … | Mar 13, 2023 | |
Python | 2 | Reinforcement learning for human walking motion with prosthetic leg | Sep 26, 2019 | |
Python | 6 | Cooperative Multi Agent Reinforcement Learning with Human in the Loop | Apr 24, 2023 | |
Jupyter Notebook | 7 | Demo combining Whisper for speech recognition and Google TTS for speech synthesis to interact with … | Apr 01, 2023 | |
Python | 7 | ChatGPT re-created with GPT-3.5 LLM as Telegram Bot. Light-weight fork. | Apr 14, 2023 | |
None | 12 | Implementations of Baseline Methods for Aligning Text2Img Diffusion Models with Human FeedBack | Apr 24, 2023 | |
None | 3 | EMNLP 2020: "Dialogue Response Ranking Training with Large-Scale Human Feedback Data" | May 27, 2022 | |
Python | 36 | A Discord Bot for chatting with LLaMA, Vicuna, Alpaca, or any other LLM supported by … | Apr 09, 2023 | |
TypeScript | 2 | Llamallamallama is a chat solution that allows users to chat with "Llama" fine tuned with … | Apr 15, 2024 | |
Python | 22 | (TNNLS) Prioritized Experience-Based Reinforcement Learning with Human Guidance for Autonomous Driving | Apr 24, 2023 | |
Python | 2 | (TNNLS) Prioritized Experience-Based Reinforcement Learning with Human Guidance for Autonomous Driving | Aug 05, 2023 | |
JavaScript | 4 | From the video Build A ChatGPT Trading Bot With Real Time News (Alpaca Markets API … | Apr 21, 2023 | |
Jupyter Notebook | 12 | Here is a Google Colab Notebook for fine-tuning Alpaca Lora (within 3 hours with a … | Apr 16, 2023 | |
None | 8 | A primer on large language models (LLM) as of Jan 2023, with bonus ChatGPT topic | Mar 22, 2023 | |
Python | 2 | Connects nodes (tanks, cars, anything basically) together to be controlled by a centralised environment with … | Dec 21, 2021 | |
None | 14 | Collection of open source implementations of LLMs with IFT and RLHF that are striving to … | Apr 09, 2023 |