Large Reasoning Models Leaderboard

Evaluation of Open R1 models across a diverse range of benchmarks from LightEval. All scores are reported as accuracy.

1. open-r1_R1-Distill-Qwen-Math-7B-Merges_v10.00-step-000003244_v11.00-step-000002908_v12.00-step-000006016_ties_densities-0.2-0.2-0.2_lambda-1.0
Timestamp: 2025-02-12T14-56-55.504
Accuracy scores: 0.0008, 0.3169, 0.142, 0.6414, 0.1954, 0.7175, 0.6095, 0.0667, 0.0667, 0.0406, 0.0406, 0.0943, 0.2778, 0.7175, 0.5281