Large Reasoning Models Leaderboard

Evaluation of Open R1 models across a diverse range of benchmarks from LightEval. All scores are reported as accuracy.

Aggregation

Results for each model can be aggregated in one of three ways: min, max, or mean.
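As a rough illustration of the three aggregation options above, the sketch below collapses several result rows for one model into a single score per benchmark. It assumes that aggregation runs over multiple evaluation runs of the same model, which the page does not state explicitly. The function name `aggregate_runs`, the benchmark keys, and the sample values are all hypothetical and are not taken from the leaderboard.

```python
from statistics import mean

# Hypothetical mapping from the three option names to Python callables.
AGGREGATORS = {"min": min, "max": max, "mean": mean}


def aggregate_runs(runs, how="max"):
    """Collapse several runs of one model into one score per benchmark.

    `runs` is a list of dicts mapping benchmark name -> accuracy.
    A benchmark missing from a run is simply skipped for that run.
    """
    if how not in AGGREGATORS:
        raise ValueError(f"unknown aggregation: {how!r}")
    agg = AGGREGATORS[how]
    # Collect every benchmark name that appears in at least one run.
    benchmarks = sorted({name for run in runs for name in run})
    return {
        name: agg([run[name] for run in runs if name in run])
        for name in benchmarks
    }


# Illustrative values only (not leaderboard data).
example_runs = [
    {"bench_a": 0.25, "bench_b": 0.5},
    {"bench_a": 0.75},
]
best = aggregate_runs(example_runs, how="max")
average = aggregate_runs(example_runs, how="mean")
```

With `how="max"` each benchmark keeps its best observed accuracy, while `how="mean"` averages over the runs that reported that benchmark. Whether the leaderboard treats missing scores this way is an assumption.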


1	open-r1_R1-Distill-Qwen-Math-7B-Merges_v10.00-step-000003244_v11.00-step-000002908_v12.00-step-000006016_ties_densities-0.2-0.2-0.2_lambda-1.0	2025-06-05T15-21-44.031	0.0104	0.0115	0.0104	0.2753	0.2753	0.7175	0.7175	0.0008	0.3169	0.142	0.1954	0.6095	0.0406	0.0406	0.5281