Large Reasoning Models Leaderboard
Evaluation of Open R1 models across a diverse range of benchmarks, run with LightEval. All scores are reported as accuracy.
1 | open-r1_R1-Distill-Qwen-Math-7B-Merges_v10.00-step-000003244_v11.00-step-000002908_v12.00-step-000006016_ties_densities-0.2-0.2-0.2_lambda-1.0 | 2025-02-12T14-56-55.504 | 0.0008 | 0.3169 | 0.142 | 0.6414 | 0.1954 | 0.7175 | 0.6095 | 0.0667 | 0.0667 | 0.0406 | 0.0406 | 0.0943 | 0.2778 | 0.7175 | 0.5281 |
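The top entry's name indicates it was produced by TIES-merging three checkpoints (steps 3244, 2908, and 6016) with a per-model density of 0.2 and a lambda of 1.0. As a rough illustration of what those parameters mean, here is a minimal, self-contained sketch of TIES merging on plain Python lists: each task vector (checkpoint minus base model) is trimmed to its top-`density` entries by magnitude, a dominant sign is elected per parameter, agreeing entries are averaged, and the result is scaled by `lambda`. All data and function names below are hypothetical; the actual merge was presumably done with a proper merging toolkit over full model weights.

```python
def trim(vec, density):
    """Keep only the top-`density` fraction of entries by magnitude; zero the rest."""
    k = max(1, int(len(vec) * density))
    threshold = sorted((abs(v) for v in vec), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in vec]

def ties_merge(task_vectors, densities, lam=1.0):
    """Sketch of TIES: trim, elect signs, average agreeing entries, scale by lam."""
    trimmed = [trim(v, d) for v, d in zip(task_vectors, densities)]
    merged = []
    for entries in zip(*trimmed):
        # Elect the dominant sign by summed magnitude, then average
        # only the entries that agree with it.
        pos = sum(e for e in entries if e > 0)
        neg = -sum(e for e in entries if e < 0)
        sign = 1.0 if pos >= neg else -1.0
        agreeing = [e for e in entries if e * sign > 0]
        merged.append(lam * sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged

# Toy example with two 4-dimensional task vectors at density 0.5:
v1 = [1.0, -2.0, 0.1, 0.05]
v2 = [3.0, 2.0, 0.0, 4.0]
print(ties_merge([v1, v2], [0.5, 0.5], lam=1.0))  # → [2.0, -2.0, 0.0, 4.0]
```

The merged vector would then be added back onto the base model's weights. With `lam=1.0`, as in the leaderboard entry, the elected average is applied at full strength; lower values would shrink the merged task vector toward the base model.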