hillclimb

Explore curated datasets of mathematical problems and AI model evaluations

Available Datasets

Select a dataset to view problems and evaluation results

Hillclimb Benchmark v1

Curated set of 50 original mathematical problems designed by Hillclimb researchers to evaluate reasoning capabilities. Tested against Claude Sonnet 4.5 with extended thinking (60k max tokens, 200k context window).

hillclimb, claude-sonnet-4-5-20250514

document

Putnam 2024 Grading Rubric

Custom grading rubric and scoring guidelines used for the 2024 William Lowell Putnam Mathematical Competition.

putnam, reference document

document

Generic Rubric v1

General-purpose incremental grading rubric for evaluating partial credit on olympiad-style mathematical proofs and solutions.

evaluation, reference document

document

Generic USAMO Style Rubric

Grading rubric aligned with USAMO (USA Mathematical Olympiad) standards for evaluating proof-based competition problems.

usamo, reference document

30 pairs

Math Rubrics Collection

Collection of 30 mathematical problems with detailed step-by-step solutions and corresponding grading rubrics for consistent evaluation.

evaluation, problem-solution pairs

13 problems

Nomos Corpus Preview

Preview of the off-the-shelf corpus used to train Nomos-1 with Nous Research Lab. Contains verified olympiad-style problems from competition archives.

nous research, nomos-1