No description
Find a file
Jan De Landtsheer dd43b67af9 ADR-0003: PXE first-boot, frame hardware, storage server as the site's brain; Scene 004
ADR-0003 amendments from Jan's hardware reality: Halos arrive depot-assembled
in frames of 7 boards with embedded 8-port switches (two when dual-NIC, one
otherwise) uplinked to ToR, a frame microcontroller on all seven reset lines
(temp controller/resetter), 8 frames per dielectric immersion basin. Switches
frame and ToR alike are cheap dumb L2 — no service trusted to switch CPUs.

New section 7, first boot: MOS served over PXE on the gossip lane (native
VLAN on node ports); MAC addresses recensed at HQ into the signed founding
papers — no add-node operation at the site, ever; founding payload enumerated
(network plan, guest list, site keypair, software shelf); first sited takes
the lease from nobody. RAs come from the storage server, one per VLAN, long
prefix lifetimes. Node keypair provenance explicitly left to the MOS team
(both options recorded, added to the review gate). Microcontroller enrolls
like any device; reset/temp commands are signed sited-to-uC RPC.

Scene 004 'A site is born': Lisbon commissioned zero-touch — boards' first
words are 'may I have an operating system', addresses are arithmetic, the
frame janitor presses reset so nobody wades into a basin, dead board fails
to clean absence, first postcard turns a new square green in Cairo. Keypair
question kept honestly visible per scenes-as-review-tools.
2026-06-11 22:46:05 +02:00
docs ADR-0003: PXE first-boot, frame hardware, storage server as the site's brain; Scene 004 2026-06-11 22:46:05 +02:00
README.md Add README: granite — name, philosophy, reading order 2026-06-11 21:07:52 +02:00

granite

Architecture for a distributed, heterogeneous LLM serving platform: hundreds of AMD Strix Halo APU nodes (volume tier) and a handful of big-GPU nodes (heavy tier), grouped into WAN-distributed autonomous sites, running on MOS with a thin Rust federation layer on top.

granite — because the design philosophy is boring, layered, and rebuildable from bedrock: every index is a cache of scannable truth; every component degrades in speed, never in correctness; the only consensus in the system is between people.

Reading order

Where What
docs/prd/0001 The platform PRD — requirements, phased plan, ADR ledger with statuses
docs/adr/ Architecture Decision Records — the "what we decided and why"
docs/scenes/ Narrative companions — what the system does, told in plain words (start here if you're new)

Status

Foundation phase (2026-06): ADR-0001 accepted (system architecture, MOS-native, no K8s), ADR-0002 committed (reconciliation & rollout), ADR-0004 in review (registry & protocols). No code yet — by design: plan before build, every phase ships standalone.