humaneval-langchain

Benchmark results from code generation with LLMs

Stars: 16

Forks: 1

Language: Jupyter Notebook

Last Updated: Apr 15, 2024

Similar Repos

Repo	Language	Stars	Description	Updated At
CSharpPerformanceBoosters	None	2	Highly Performant C# code with benchmark results	Mar 03, 2023
LLM-RGB	TypeScript	63	LLM Reasoning and Generation Benchmark. Evaluate LLMs in complex scenarios systematically.	Jan 15, 2024
nodebench	HTML	4	Node.js Benchmark Results	Jan 22, 2022
rubybench.github.io	Haml	4	Ruby benchmark results	Apr 07, 2023
results	Makefile	2	BenchOpt benchmark results	Jul 13, 2022
roboflow-100-benchmark	Jupyter Notebook	154	Code for replicating Roboflow 100 benchmark results and programmatically downloading benchmark datasets	Apr 24, 2023
superpixel-benchmark-nvd3	JavaScript	2	Interactive plots of results from the superpixel benchmark davidstutz/superpixel-benchmark using NVD3.	Dec 29, 2021
benchmark_results	Matlab	2	visual tracker benchmark results	Nov 13, 2018
Horreum	Java	23	Benchmark results repository service	Apr 20, 2023
benchmark_results	None	2	visual tracker benchmark results	Jun 10, 2020
benchmark_results	None	2	visual tracker benchmark results	Jul 18, 2019
lance-benchmark-results	Python	4	History for benchmark results	Jun 01, 2023
docugen	Python	4	Simple documentation generation using OpenAI LLMs	Mar 27, 2023
code-eval	Python	295	Run evaluation on LLMs using human-eval benchmark	Jan 17, 2024
dataline	TypeScript	3	Small project to use LLMs with context for SQL generation	Mar 23, 2024
DataScienceProblems	Python	33	A repository containing the Jupyter notebook code generation benchmark.	May 30, 2022
Tablet	Python	2	The TABLET benchmark for evaluating instruction learning with LLMs for tabular prediction.	Apr 30, 2023
cto-tproc-results	None	2	CTO Team TPROC benchmark results	Feb 12, 2022
netbenches	Shell	62	Network forwarding performance benchmark results	Nov 13, 2022
async-tasks-benchmark	Elixir	3	Check out benchmark results here:	Aug 25, 2019
jsonrpc-benchmark	Shell	2	NEAR JSON RPC benchmark results	Jul 16, 2023
Unconditional-Audio-Generation-Benchmark	Python	6	Unconditional audio generation benchmark	Apr 14, 2023
backup-bench	Shell	59	Quick and dirty backup tool benchmark with reproducible results	Apr 13, 2023
benchviz	Go	4	A tool used for visualizing results from benchmark tests over time.	Oct 28, 2020
specificityplus	Python	9	👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark"	Jun 10, 2023
sombrero	C	5	A next-generation conjugate gradient benchmark from computational particle physics	Jan 07, 2022
pylith_benchmarks	Python	7	Benchmark data and results for PyLith.	Mar 20, 2022
vue-composition-api-benchmark-results	HTML	2	Benchmark results of @vue/composition-api	Mar 24, 2022
bench-show	Haskell	14	Show, plot and compare benchmark results	Jun 26, 2022
lossless-benchmark	HTML	9	Benchmark results for lossless data compressors	Nov 25, 2022
scene_graph_benchmark	Python	263	image scene graph generation benchmark	Aug 08, 2022
repochat	Python	140	Chatbot assistant enabling GitHub repository interaction using LLMs with Retrieval Augmented Generation	Oct 21, 2023
AgentBench	Python	1548	A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)	Jan 19, 2024
train-procgen	Python	143	Code for the paper "Leveraging Procedural Generation to Benchmark Reinforcement Learning"	Jul 31, 2022
zcbor	C	3	Code generation from CDDL descriptions.	May 29, 2022
zcbor	C	41	Code generation from CDDL descriptions.	Jun 30, 2022
benchmark-automation	None	2	Benchmark Automation & Results for DataFusion and Ballista	Feb 07, 2023
hpc_benchmark_analysis	Python	2	Random scripts to interpret benchmark / profiling results	Sep 03, 2022
go-bench-viewer	HTML	45	Easy and intuitive Go Benchmark Results Viewer.	Dec 14, 2022
codefuse-devops-eval	Python	521	Industrial-first evaluation benchmark for LLMs in the DevOps/AIOps domain.	Jan 19, 2024
OpenMEVA	Python	24	Benchmark for evaluating open-ended generation	Oct 15, 2022
LGEB	Python	14	LGEB: Benchmark of Language Generation Evaluation	Oct 31, 2022
scene_graph_benchmark	Python	6	image scene graph generation benchmark @ torch1.12	Jul 02, 2023
glge	JavaScript	54	Code for ACL2021 paper: "GLGE: A New General Language Generation Evaluation Benchmark"	Aug 02, 2022
cvxpygen	Python	52	Code generation with CVXPY	Aug 04, 2022
afl-cov	Python	31	Produce code coverage results with gcov from afl-fuzz test cases	Jul 17, 2022
afl-cov	Python	415	Produce code coverage results with gcov from afl-fuzz test cases	Apr 28, 2023
semantic_features_gpt_3	Jupyter Notebook	6	Code and data from semantic feature generation with GPT-3	Feb 24, 2023
tf-benchmarks	Python	2	Benchmark code	Dec 24, 2018
example_benchmark_2	Python	2	Additional benchmark repo showing how to benchmark multi-agent mcts variants with the configurable scenario generation	Oct 05, 2022