Benchmarks
Tasks, baselines, evaluations, and expert review sets for testing theory reasoning.
A shared index of papers, tools, benchmarks, datasets, workshops, and reference links for AI-for-theory work.
Examples of AI systems proposing conjectures, mechanisms, calculations, or checks.
Formalization projects, structured paper corpora, proof libraries, and scientific datasets.
Concept maps, ontologies, surveys, dependency graphs, and reusable theory summaries.
Talks, working sessions, workshops, and community meetings.
Tools, papers, datasets, and research infrastructure worth tracking.