Large Reasoning Models Leaderboard

This leaderboard evaluates Open R1 models on a diverse range of LightEval benchmarks. All scores are reported as accuracy.

1 | open-r1_R1-Distill-Qwen-Math-7B-Merges_v10.00-step-000003244_v11.00-step-000002908_v12.00-step-000006016_ties_densities-0.2-0.2-0.2_lambda-1.0 | 2025-06-05T15-21-44.031 | 0.0104 | 0.0115 | 0.0104 | 0.2753 | 0.2753 | 0.7175 | 0.7175 | 0.0008 | 0.3169 | 0.142 | 0.1954 | 0.6095 | 0.0406 | 0.0406 | 0.5281