Stars: 4353
Forks: 464
Language: Python
Last Updated: May 26, 2024
Similar Repos
Language | Stars | Description | Updated At |
---|---|---|---|
Python | 4 | Train reward models for reinforcement learning from human feedback (RLHF). | Aug 28, 2023 |
Python | 5 | Finetuning alpaca with RLHF (Reinforcement Learning with Human Feedback) | Apr 25, 2023 |
Jupyter Notebook | 87 | Implementation of Reinforcement Learning from Human Feedback (RLHF) | Mar 29, 2023 |
Python | 49 | Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback | May 16, 2023 |
Python | 2 | Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback | Oct 22, 2023 |
None | 26 | Curated list of resources for Reinforcement Learning from Human Feedback and Language Models | Apr 24, 2023 |
Python | 74 | Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically … | Dec 12, 2022 |
Python | 2 | Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically … | May 21, 2023 |
None | 2 | Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically … | Nov 30, 2023 |
None | 774 | Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human … | Apr 25, 2023 |
Python | 6 | Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically … | Apr 21, 2023 |
Python | 3167 | Train transformer language models with reinforcement learning. | Apr 24, 2023 |
None | 972 | A curated list of reinforcement learning with human feedback resources (continually updated) | Apr 24, 2023 |
Python | 7 | A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation … | Apr 23, 2023 |
Python | 15 | A full pipeline to finetune ChatGLM LLM with LoRA and RLHF on consumer hardware. Implementation … | Apr 24, 2023 |
Python | 9 | A full pipeline to finetune Vicuna LLM with LoRA and RLHF on consumer hardware. Implementation … | Apr 24, 2023 |
Jupyter Notebook | 27 | Try original alpaca. The multi-turn version is at [multi-turn-alpaca](https://github.com/l294265421/multi-turn-alpaca) and the version further trained with … | Apr 25, 2023 |
Python | 2716 | A high-performance distributed training framework for Reinforcement Learning | Sep 01, 2022 |
Python | 51 | implementation of distributed reinforcement learning with distributed tensorflow | Jul 15, 2022 |
Python | 2 | Learning Grounded Language via Split Screen Communication Learning via Deep Multi-Agent Reinforcement Learning | Sep 17, 2021 |
Python | 201 | Multi-agent Social Simulation + Efficient, Effective, and Stable alternative of RLHF. Code for the paper … | Jul 16, 2023 |
Python | 23 | Reinforcement learning environments with musculoskeletal models | Nov 22, 2021 |
Python | 12 | Reinforcement learning environments with musculoskeletal models | May 15, 2021 |
Python | 848 | Reinforcement learning environments with musculoskeletal models | Apr 21, 2023 |
Python | 11 | Learning to schedule distributed resources with deep reinforcement learning. | Mar 24, 2023 |
Python | 112 | A minimum example of aligning language models with RLHF similar to ChatGPT | Apr 09, 2023 |
HTML | 62 | AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback | Jan 24, 2024 |
Python | 18 | An OpenAI-Gym Package for Training and Testing Reinforcement Learning algorithms with OpenSim Models | Mar 18, 2023 |
Jupyter Notebook | 5 | Distributed Deep Reinforcement Learning framework | May 13, 2023 |
Python | 2 | Reinforcement learning for human walking motion with prosthetic leg | Sep 26, 2019 |
Python | 12 | SELFormer: Molecular Representation Learning via SELFIES Language Models | May 01, 2023 |
Python | 4 | SELFormer: Molecular Representation Learning via SELFIES Language Models | Apr 11, 2023 |
None | 3 | EMNLP 2020: "Dialogue Response Ranking Training with Large-Scale Human Feedback Data" | May 27, 2022 |
Scala | 4 | A distributed reinforcement learning (RL) framework built with Akka | Jan 28, 2023 |
None | 525 | Aligning Large Language Models with Human: A Survey | Jan 16, 2024 |
None | 4 | Safe Reinforcement Learning with Natural Language Constraints | Feb 17, 2022 |
Python | 2 | Causal models for reinforcement learning | Feb 24, 2023 |
None | 12 | Implementations of Baseline Methods for Aligning Text2Img Diffusion Models with Human FeedBack | Apr 24, 2023 |
Go | 2 | Trayne is a distributed machine learning platform for training models at scale | Jan 17, 2023 |
C++ | 266 | Market Making via Reinforcement Learning | May 10, 2023 |
Python | 6 | Cooperative Multi Agent Reinforcement Learning with Human in the Loop | Apr 24, 2023 |
Jupyter Notebook | 30 | A Practical Guide to Developing a Reliable FAQ Chatbot with Reinforcement Learning and Human Feedback … | Mar 13, 2023 |
Jupyter Notebook | 133 | Learning human driver models from NGSIM data with imitation learning. | Aug 22, 2022 |
Python | 398 | Code for "Learning to summarize from human feedback" | Aug 12, 2022 |
None | 4 | RRHF: Aligning Language Models with Human Preferences without tears | Apr 11, 2023 |
Python | 67 | Pytorch implementation of distributed deep reinforcement learning | Aug 19, 2022 |
Python | 15 | Distributed Reinforcement Learning accelerated by Lightning Fabric | May 19, 2023 |
Python | 55 | Scalable distributed reinforcement learning agents on kubernetes | Sep 29, 2022 |
Python | 5 | [ICDE 2022] Human-Drone Collaborative Spatial Crowdsourcing by Memory-Augmented Distributed Multi-Agent Deep Reinforcement Learning | May 16, 2022 |
Jupyter Notebook | 34 | Distributed training with SageMaker's script mode using Horovod distributed deep learning framework | Oct 30, 2021 |