ADR-0003 amendments from Jan's hardware reality: Halos arrive depot-assembled in frames of 7 boards with embedded 8-port switches (two when dual-NIC, one otherwise) uplinked to ToR, a frame microcontroller on all seven reset lines (temp controller/resetter), 8 frames per dielectric immersion basin. Switches frame and ToR alike are cheap dumb L2 — no service trusted to switch CPUs. New section 7, first boot: MOS served over PXE on the gossip lane (native VLAN on node ports); MAC addresses recensed at HQ into the signed founding papers — no add-node operation at the site, ever; founding payload enumerated (network plan, guest list, site keypair, software shelf); first sited takes the lease from nobody. RAs come from the storage server, one per VLAN, long prefix lifetimes. Node keypair provenance explicitly left to the MOS team (both options recorded, added to the review gate). Microcontroller enrolls like any device; reset/temp commands are signed sited-to-uC RPC. Scene 004 'A site is born': Lisbon commissioned zero-touch — boards' first words are 'may I have an operating system', addresses are arithmetic, the frame janitor presses reset so nobody wades into a basin, dead board fails to clean absence, first postcard turns a new square green in Cairo. Keypair question kept honestly visible per scenes-as-review-tools. |
||
|---|---|---|
| docs | ||
| README.md | ||
granite
Architecture for a distributed, heterogeneous LLM serving platform: hundreds of AMD Strix Halo APU nodes (volume tier) and a handful of big-GPU nodes (heavy tier), grouped into WAN-distributed autonomous sites, running on MOS with a thin Rust federation layer on top.
granite — because the design philosophy is boring, layered, and rebuildable from bedrock: every index is a cache of scannable truth; every component degrades in speed, never in correctness; the only consensus in the system is between people.
Reading order
| Where | What |
|---|---|
docs/prd/0001 |
The platform PRD — requirements, phased plan, ADR ledger with statuses |
docs/adr/ |
Architecture Decision Records — the "what we decided and why" |
docs/scenes/ |
Narrative companions — what the system does, told in plain words (start here if you're new) |
Status
Foundation phase (2026-06): ADR-0001 accepted (system architecture, MOS-native, no K8s), ADR-0002 committed (reconciliation & rollout), ADR-0004 in review (registry & protocols). No code yet — by design: plan before build, every phase ships standalone.