Stars
6
Forks
2
Language
None
Last Updated
Aug 26, 2023
Similar Repos
Language | Stars | Description | Updated At |
---|---|---|---|
Jupyter Notebook | 37 | Mechanistic Interpretability Visualizations using React | May 08, 2023 | |
Jupyter Notebook | 2 | Mechanistic Interpretability Visualizations using React | Dec 11, 2023 | |
Python | 5 | A Mechanistic Interpretability Analysis of Grokking | Jun 13, 2023 | |
None | 2 | A repository for awesome resources in mechanistic interpretability | Mar 08, 2023 | |
Jupyter Notebook | 2 | Mechanistic Interpretability Tutorials, Results and research log as I learn from @neelnanda-io's wonderful Easy-Transformer | Mar 23, 2023 | |
Jupyter Notebook | 3 | A mechanistic interpretability study investigating a sequential model trained to play the board game Othello | Oct 25, 2023 | |
Python | 3 | A game of deception and explosions. | Jul 22, 2016 | |
Jupyter Notebook | 2 | Langchain for RLHF | Mar 09, 2023 | |
Julia | 3 | Thyroid hormone mechanistic model | Dec 29, 2022 | |
C | 3 | Deception Technology for Endpoints | Oct 12, 2021 | |
JavaScript | 20 | A game of communication, deception, and media | May 23, 2017 | |
Jupyter Notebook | 2 | Story Teller based on RLHF and GPT | Apr 27, 2023 | |
Python | 2 | RLHF Pipeline for StableLM | Apr 26, 2023 | |
Jupyter Notebook | 4 | Deception and bias-detection code for code LLMs | Aug 23, 2022 | |
Python | 285 | OWASP Honeypot, Automated Deception Framework. | Aug 08, 2022 | |
JavaScript | 337 | DejaVU - Open Source Deception Framework | Jul 17, 2022 | |
None | 2 | Dark Deception - Turkish Patch (Unofficial) | May 02, 2021 | |
Python | 3353 | Model interpretability and understanding for PyTorch | Aug 10, 2022 | |
None | 2 | General Interpretability Package | Feb 27, 2020 | |
Jupyter Notebook | 8 | Attacks Meet Interpretability | May 17, 2022 | |
Jupyter Notebook | 11 | Mechanistic model for the Filecoin Economy | Mar 18, 2023 | |
JavaScript | 3 | A social deception game based on culture and art | Oct 11, 2020 | |
None | 4 | Echelons of Deception and Survival — Competitive Online Multiplayer Game | Mar 19, 2023 | |
None | 2 | Lists of datasets, training, and evals for RLHF and similar | Jul 05, 2023 | |
Jupyter Notebook | 7 | Predict customer churn with text and interpretability. | Oct 24, 2021 | |
Python | 10 | TorchEsegeta: Interpretability and Explainability pipeline for PyTorch | May 31, 2022 | |
Python | 6 | Exploring methods for merging mechanistic and models to forecast epidemics. | May 19, 2022 | |
Python | 3 | Gradient-based interpretability methods | Jul 06, 2021 | |
JavaScript | 2 | Talk on textual interpretability | Jul 20, 2020 | |
Jupyter Notebook | 2 | Interpretability Hackathon 2.0 entry | Apr 16, 2023 | |
Jupyter Notebook | 2 | Machine Learning Interpretability Resources | Aug 15, 2019 | |
Python | 8 | Reward Model framework for LLM RLHF | May 09, 2023 | |
HTML | 3 | Tower of Deception for BG2:ToB, BG2:EE and EET | Oct 18, 2022 | |
Python | 3 | Proof-of-concept cyber deception utility emulating Samba and LibSSH | Feb 16, 2022 | |
Python | 1152 | Interpretability and explainability of data and machine learning models | Sep 03, 2022 | |
C++ | 5 | Repeatable Analysis Programming for Interpretability, Durability, and Organization | Mar 01, 2023 | |
R | 2 | Climate-based mechanistic model of arbovirus transmission in Ecuador and Kenya. | Dec 01, 2023 | |
HTML | 4 | Developing patient-specific phosphoproteomic models using mechanistic autoencoders | Aug 11, 2022 | |
C# | 8 | 2-4 players fast-paced party game of strategy and deception | Apr 18, 2022 | |
Python | 4 | Fine-tuning LLaMA with PEFT (SFT+RLHF) | May 28, 2023 | |
TypeScript | 8 | Redwood Research's transformer interpretability tools | Apr 28, 2022 | |
Jupyter Notebook | 452 | H2O.ai Machine Learning Interpretability Resources | Aug 14, 2022 | |
Jupyter Notebook | 2 | H2O.ai Machine Learning Interpretability Resources | Feb 08, 2022 | |
Python | 7 | Local interpretability for survival models | Apr 26, 2023 | |
Python | 16 | Interpretability dashboard for reinforcement learners | Nov 25, 2021 | |
Jupyter Notebook | 87 | Implementation of Reinforcement Learning from Human Feedback (RLHF) | Mar 29, 2023 | |
Python | 5 | Training LLaMA or MOSS with RLHF [LoRA] | Jun 12, 2023 | |
None | 3 | Feature Interaction Interpretability via Interaction Detection | Oct 25, 2020 | |
Python | 2 | Increase the accessibility and interpretability of law to the layperson | Oct 11, 2021 | |
HTML | 2 | Using Scattertext to examine publicly available datasets about deception | Dec 14, 2018 |