Train LLMs to reason like o1. 5,000 graph coloring tasks, each verified by Z3, so the labels contain no hallucinated answers and no label noise. Pure algorithmic signal for System 2 fine-tuning.
Every row is tagged with one of three difficulty tiers — Easy, Medium, or Hard — derived from graph density, chromatic number, and search depth required.
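
Graph density, one of the three inputs named above, can be computed directly from a row's `nodes` and `edges` counts. The snippet below is a minimal sketch that assumes simple undirected graphs; the helper name `graph_density` is illustrative, and the thresholds that map density to a tier are not published here, so none are shown.

```python
def graph_density(nodes: int, edges: int) -> float:
    """Density of a simple undirected graph: edges / maximum possible edges."""
    if nodes < 2:
        return 0.0  # a graph with fewer than two nodes cannot have edges
    return 2 * edges / (nodes * (nodes - 1))


# Illustrative values: a graph with 42 nodes and 260 edges.
density = graph_density(42, 260)
print(f"{density:.3f}")  # prints 0.302
```

Density alone does not determine the tier; chromatic number and required search depth also contribute.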

Sample row:

{
"task_type": "task_a_coloring",
"graph_type": "bipartite",
"difficulty": "medium",
"strategy": "dsatur",
"nodes": 42,
"edges": 260,
"instruction": "Graph with 42 nodes. Edges: [(12,0),(16,0)...]\nColor using ≤2 colors, no adjacent nodes same color.",
"reasoning": {
"strategy": "dsatur",
"preamble": "42 nodes, 260 edges, bipartite → χ(G)=2",
"steps": [
{ "node": 20, "saturation": 0, "assigned_color": 0, "forbidden": [] },
{ "node": 7, "saturation": 1, "assigned_color": 1, "forbidden": [0] },
{ "action": "[backtrack]", "from_node": 47, "reason": "conflict" }
]
},
"solution": { "20": 0, "7": 1, "16": 0, "...": "..." }
}
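
The `reasoning.steps` entries in the sample record, for each colored node, its saturation (the number of distinct colors already used by its neighbors), the forbidden colors, and the assigned color. The following is a minimal Python sketch of how such a trace and a solution check could be produced. It assumes 0-indexed integer node ids and edges given as `(u, v)` tuples. The names `dsatur` and `is_proper_coloring` are illustrative and not part of the dataset's tooling. It is a plain greedy DSATUR without the backtracking step shown in the sample, and it checks colorings in pure Python rather than with Z3.

```python
from collections import defaultdict


def is_proper_coloring(edges, coloring, max_colors):
    """True if every edge endpoint is colored, no edge joins two nodes of
    the same color, and all colors lie in range(max_colors)."""
    # The sample row stores solution keys as strings, so normalize to int.
    coloring = {int(node): color for node, color in coloring.items()}
    for u, v in edges:
        if u not in coloring or v not in coloring:
            return False
        if coloring[u] == coloring[v]:
            return False
    return all(0 <= color < max_colors for color in coloring.values())


def dsatur(num_nodes, edges):
    """Greedy DSATUR: always color the uncolored node whose neighbors use
    the most distinct colors (ties: higher degree, then lower node id)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    coloring, steps = {}, []
    while len(coloring) < num_nodes:
        def priority(n):
            used = {coloring[m] for m in adj[n] if m in coloring}
            return (len(used), len(adj[n]), -n)

        node = max((n for n in range(num_nodes) if n not in coloring),
                   key=priority)
        forbidden = sorted({coloring[m] for m in adj[node] if m in coloring})
        # A node has at most num_nodes - 1 neighbors, so a free color exists.
        color = next(c for c in range(num_nodes) if c not in forbidden)
        coloring[node] = color
        steps.append({"node": node, "saturation": len(forbidden),
                      "assigned_color": color, "forbidden": forbidden})
    return coloring, steps


# Hypothetical toy graph (a 6-cycle, which is bipartite), not a dataset row.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
toy_coloring, toy_steps = dsatur(6, toy_edges)
print(toy_steps[0])
print(is_proper_coloring(toy_edges, toy_coloring, max_colors=2))
```

To check a dataset row with the same function, parse the edge list out of `instruction` first; the parsing step depends on the exact instruction format and is left out of this sketch.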

Token statistics by dataset size:

| Metric | 5k Baseline (Free) | 20k Dataset | 100k Dataset |
|---|---|---|---|
| Total Rows | 5,000 | ~20,000 | ~100,000 |
| Total Tokens | ~5.45M | ~21.8M | ~109M |
| Avg Tokens / Row | 1,089 | 1,089 | 1,089 |
| Median Tokens / Row | 899 | ~900 | ~900 |
| Max Tokens / Row | 4,161 | ~4,200 | ~4,200 |
| p95 Tokens / Row | 2,658 | ~2,700 | ~2,700 |
| Recommended Model Context | 8k+ | 8k+ | 8k+ |
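
The total-token figures in the table follow from rows multiplied by the average tokens per row. A quick arithmetic sketch, assuming the 1,089 average holds at every size as the table states (the helper name `estimated_total_tokens` is illustrative):

```python
AVG_TOKENS_PER_ROW = 1_089  # "Avg Tokens / Row" from the table


def estimated_total_tokens(rows: int, avg_tokens: int = AVG_TOKENS_PER_ROW) -> int:
    """Estimate total tokens as row count times average tokens per row."""
    return rows * avg_tokens


for rows in (5_000, 20_000, 100_000):
    print(f"{rows:>7,} rows -> {estimated_total_tokens(rows):,} tokens")
```

The three results are 5,445,000, 21,780,000, and 108,900,000 tokens, which round to the ~5.45M, ~21.8M, and ~109M listed.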

Per-task breakdown (row counts sum to roughly 5,000, matching the 5k baseline):

| Task Type | Avg Tokens | Max Tokens | Row Count |
|---|---|---|---|
| task_a_coloring | 1,378 | 4,161 | ~2,088 |
| task_b_validation | 957 | 3,272 | ~1,541 |
| task_c_missing_color | 887 | 3,116 | ~1,099 |
| task_d_chromatic | 432 | 1,267 | ~271 |
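
As a consistency check, the per-task averages weighted by row count should reproduce the overall 1,089 average tokens per row. A short sketch using only the figures from the table (row counts are approximate, so the result is an estimate):

```python
# (avg_tokens, approximate_row_count) per task type, copied from the table.
per_task = {
    "task_a_coloring": (1_378, 2_088),
    "task_b_validation": (957, 1_541),
    "task_c_missing_color": (887, 1_099),
    "task_d_chromatic": (432, 271),
}

total_rows = sum(count for _, count in per_task.values())
total_tokens = sum(avg * count for avg, count in per_task.values())
weighted_avg = total_tokens / total_rows

print(total_rows)           # prints 4999
print(round(weighted_avg))  # prints 1089
```

The weighted mean rounds to 1,089, and the implied total of about 5.44M tokens agrees with the ~5.45M reported for the 5k baseline.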