Stars: 158
Forks: 20
Language: Jupyter Notebook
Last Updated: Dec 22, 2023
Similar Repos
Language | Stars | Description | Updated At |
---|---|---|---|
Python | 4 | Train reward models for reinforcement learning from human feedback (RLHF). | Aug 28, 2023 | |
Python | 5 | Finetuning alpaca with RLHF (Reinforcement Learning with Human Feedback) | Apr 25, 2023 | |
Python | 49 | Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback | May 16, 2023 | |
Python | 2 | Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback | Oct 22, 2023 | |
Python | 3061 | A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) | Apr 25, 2023 | |
Python | 74 | Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically … | Dec 12, 2022 | |
Python | 2 | Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically … | May 21, 2023 | |
None | 2 | Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically … | Nov 30, 2023 | |
Python | 6 | Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically … | Apr 21, 2023 | |
None | 774 | Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human … | Apr 25, 2023 | |
None | 26 | Curated list of resources for Reinforcement Learning from Human Feedback and Language Models | Apr 24, 2023 | |
Python | 7 | A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation … | Apr 23, 2023 | |
Python | 15 | A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation … | Apr 24, 2023 | |
Python | 9 | A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation … | Apr 24, 2023 | |
None | 972 | A curated list of reinforcement learning with human feedback resources (continually updated) | Apr 24, 2023 | |
Python | 398 | Code for "Learning to summarize from human feedback" | Aug 12, 2022 | |
Jupyter Notebook | 27 | Try original alpaca. The multi-turn version is at [multi-turn-alpaca](https://github.com/l294265421/multi-turn-alpaca) and the version further trained with … | Apr 25, 2023 | |
Python | 2341 | Tensorflow implementation of Human-Level Control through Deep Reinforcement Learning | Aug 03, 2022 | |
Python | 3 | Tensorflow implementation of Human-Level Control through Deep Reinforcement Learning | Dec 13, 2018 | |
Python | 3 | Tensorflow implementation of Human-Level Control through Deep Reinforcement Learning | May 23, 2020 | |
Python | 45 | Library of Environments, Human Actor UIs and Agent implementation for Human In the Loop Learning … | May 01, 2023 | |
Python | 2 | Reinforcement Learning Implementation | Sep 07, 2022 | |
Jupyter Notebook | 11 | annotated tutorial of the huggingface TRL repo for reinforcement learning from human feedback connecting equations … | Mar 28, 2023 | |
Python | 6 | Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences" | Jul 29, 2022 | |
Python | 230 | Reproduction of OpenAI and DeepMind's "Deep Reinforcement Learning from Human Preferences" | Apr 06, 2023 | |
Python | 3 | Reinforcement learning algorithm implementation | Mar 16, 2023 | |
Scala | 3 | Reinforcement learning Circuit Breaker implementation | Dec 15, 2021 | |
Python | 5 | Implementation code when learning deep reinforcement learning | Dec 01, 2023 | |
HTML | 13 | Presentation on Human-Level Control Through Deep Reinforcement Learning | Feb 13, 2020 | |
Python | 2 | Reinforcement learning for human walking motion with prosthetic leg | Sep 26, 2019 | |
Jupyter Notebook | 30 | A Practical Guide to Developing a Reliable FAQ Chatbot with Reinforcement Learning and Human Feedback … | Mar 13, 2023 | |
Python | 2 | Reward Modeling from Human Preferences and Advantage Actor-Critic Reinforcement Learning: A Reproducibility Study | Jul 28, 2021 | |
Jupyter Notebook | 225 | Reinforcement Learning examples implementation and explanation | Aug 08, 2022 | |
Python | 2 | Pytorch implementation of reinforcement learning algorithms | Oct 01, 2021 | |
Python | 2 | Implementation for Reinforcement Learning: An Introduction | Apr 05, 2021 | |
Python | 2 | Implementation of upside down Reinforcement Learning | Jan 14, 2020 | |
Python | 6 | reinforcement learning: an introduction python implementation | Feb 10, 2020 | |
Python | 18 | 📖 Paper: Human-level control through deep reinforcement learning 🕹️ | Jul 26, 2022 | |
C++ | 6 | Verifiably Safe Deep Reinforcement Learning for Robotic Manipulation in Human Environments | Apr 04, 2023 | |
Python | 6 | Cooperative Multi Agent Reinforcement Learning with Human in the Loop | Apr 24, 2023 | |
Python | 76 | Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning (CVPR2020) | Apr 19, 2023 | |
Python | 8 | Addon: Gamification, feedback, and reinforcement | Mar 10, 2022 | |
Python | 13 | Implementation of the TAMER algorithm from "Interactively Shaping Agents via Human Reinforcement" (Knox, Stone - … | Apr 20, 2023 | |
Jupyter Notebook | 2 | Open-source Human Feedback Library | Apr 12, 2023 | |
Python | 31 | A Simulation Framework for Methods that Learn from Human Feedback | May 24, 2023 | |
Rust | 50 | Flexible, reusable reinforcement learning (Q learning) implementation in Rust | Aug 11, 2022 | |
Python | 4 | Implementation of Reinforcement Learning in Fall 2018 | May 11, 2020 | |
None | 5 | Python Implementation of Reinforcement Learning: An Introduction | Sep 01, 2021 | |
Python | 1566 | TensorFlow implementation of Deep Reinforcement Learning papers | Sep 19, 2022 | |
Python | 11667 | Python Implementation of Reinforcement Learning: An Introduction | Aug 18, 2022 |