Large Reasoning Models Leaderboard
Evaluation of Open R1 models across a diverse range of LightEval benchmarks. All scores are reported as accuracy.
Rank: 1
Model: open-r1_R1-Distill-Qwen-Math-7B-Merges_v10.00-step-000003244_v11.00-step-000002908_v12.00-step-000006016_ties_densities-0.2-0.2-0.2_lambda-1.0
Evaluated: 2025-06-05T15-21-44.031
Scores (accuracy, one per benchmark): 0.0104, 0.0115, 0.0104, 0.2753, 0.2753, 0.7175, 0.7175, 0.0008, 0.3169, 0.142, 0.1954, 0.6095, 0.0406, 0.0406, 0.5281
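Rows in this leaderboard are pipe-delimited: rank, model name, evaluation timestamp, then one accuracy score per benchmark. A minimal sketch of parsing such a row, assuming this field order (the `parse_row` helper is hypothetical, not part of LightEval):

```python
# Hypothetical helper: split one pipe-delimited leaderboard row into its
# rank, model name, timestamp, and list of accuracy scores.
def parse_row(row: str):
    # Split on "|" and drop empty cells left by a trailing separator.
    cells = [c.strip() for c in row.strip().split("|") if c.strip()]
    rank = int(cells[0])
    model, timestamp = cells[1], cells[2]
    scores = [float(c) for c in cells[3:]]  # remaining cells are accuracies
    return rank, model, timestamp, scores

# Abridged example row (model name shortened for illustration).
row = "1 | open-r1_R1-Distill-Qwen-Math-7B | 2025-06-05T15-21-44.031 | 0.0104 | 0.2753 |"
rank, model, ts, scores = parse_row(row)
print(rank, len(scores))  # → 1 2
```

Because the header row with benchmark names is not present here, the scores stay positional; mapping them to benchmark names would require the original table header.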