
hillclimb
Explore curated datasets of mathematical problems and AI model evaluations
Available Datasets
Select a dataset to view problems and evaluation results
Hillclimb Benchmark v1
Curated set of 50 original mathematical problems designed by Hillclimb researchers to evaluate reasoning capabilities. Tested against Claude Sonnet 4.5 with extended thinking (60k max tokens, 200k context window).
Putnam 2024 Grading Rubric
Custom grading rubric and scoring guidelines used for the 2024 William Lowell Putnam Mathematical Competition.
Generic Rubric v1
General-purpose incremental grading rubric for evaluating partial credit on olympiad-style mathematical proofs and solutions.
Generic USAMO Style Rubric
Grading rubric aligned with USAMO (USA Mathematical Olympiad) standards for evaluating proof-based competition problems.
Math Rubrics Collection
Collection of 30 mathematical problems with detailed step-by-step solutions and corresponding grading rubrics for consistent evaluation.
Nomos Corpus Preview
Preview of the off-the-shelf corpus used to train Nomos-1 with Nous Research Lab. Verified olympiad-style problems from competition archives.