[deployer/cockpit] Admin VM manages and updates itself through its own Cockpit (admin service bundle, machine/fleet split) #282
Reference
lhumina_code/home#282
Every Hero machine runs Cockpit as its machine surface: a tester VM's Cockpit manages that tester's services with update buttons, release channels, and installed-build receipts. The admin VM is the same kind of machine (it has its own full Cockpit) but today its most important services cannot be updated from any UI: the Cockpit upgrade map covers the shared engines but not hero_tfgrid_deployer or the my_compute_zos chain daemons, so the control machine of the whole sandbox is the only machine still updated by hand over SSH. The deployer stays out of tester bundles on purpose (fleet controls on a tester machine would mislead testers); the change here is only about the admin machine managing itself. Scope:
Done means: on the admin VM, Cockpit Services lists and successfully upgrades the deployer and a chain daemon with a written receipt, a tester VM's catalog is proven unchanged, and the two surfaces link to each other.
Implementation appendix (everything needed to execute)
1. Machine role flag
Add an [[env]] block to hero_cockpit/crates/hero_cockpit_server/service.toml (and to the web crate's service.toml if the navbar needs it): var = "COCKPIT_MACHINE_ROLE", default = "tester", desc explaining that the value admin enables the admin machine bundle. Every [[env]] block must carry a default or the service panics at startup on the manifest schema.
Adding an [[env]] block to an already registered service requires re-registering it (lab service hero_cockpit_server --start); a binary swap plus restart is NOT enough, because the supervisor injects env from the stored service definition.
On the admin VM, run hero_proc secret set --context cockpit COCKPIT_MACHINE_ROLE admin, then re-register and restart both cockpit services. Tester VMs get nothing; the default applies and behavior is unchanged.
Read the role in hero_cockpit_server/src/main.rs at startup into state (enum Tester | Admin; anything that is not exactly admin means Tester, fail-open to Tester).
2. Catalog and upgrade map (hero_cockpit_server)
src/catalog.rs is a static CatalogEntry array with a completeness test (around line 290) pinning each app's binary set. Refactor to a function returning the base entries always, plus admin entries only when the role is Admin. Keep a test pin asserting the Tester catalog is byte-identical to today's, and add a second pin for the Admin catalog.
Admin entry: hero_tfgrid_deployer, repo hero_os_tfgrid_deployer, binaries hero_tfgrid_deployer_server + hero_tfgrid_deployer_admin. Release assets verified present on latest-integration: hero_tfgrid_deployer_server-linux-musl-x86_64, hero_tfgrid_deployer_admin-linux-musl-x86_64.
Admin entry: hero_compute, repo hero_compute, downloadable binary my_compute_zos_server (asset verified: my_compute_zos_server-linux-musl-x86_64). CAUTION: on the admin VM this ONE binary backs THREE registered services: my_compute_zos_server (qa) plus my_compute_zos_main_server and my_compute_zos_testnet_server (wrappers with their own env; the mainnet one sets a separate compute config context and socket path). lab build hero_compute --download --bin my_compute_zos_main_server would fail because there is no such asset. Recommended: extend the entry model with restart_only service names (download the binaries list once, restart binaries plus restart_only). Acceptable v1 fallback: bundle only the qa service and document that the wrapper daemons need a manual restart after an upgrade.
hero_embedder_provider (server + admin) is already in the upgrade map in src/repos.rs; verify hero_voice_provider (single service, runs on the admin VM) is mapped too, and add catalog entries for both so they render as bundles.
src/repos.rs service_repo(): add hero_tfgrid_deployer_server | hero_tfgrid_deployer_admin to hero_os_tfgrid_deployer; my_compute_zos_server (plus wrapper names if restart_only is chosen) to hero_compute.
3. Fleet backlink (hero_cockpit_web)
When the role is Admin, the navbar shows a Fleet link to /hero_tfgrid_deployer/admin/ (the deployer admin path through this machine's router). With the flag off the templates render byte-identical to today.
4. Control page slimming (deployer admin, optional polish)
5. Caveats that will bite
hero_proc service start hero_cockpit_server once, until that issue is fixed.
6. Done means (verification)
cockpit.upgrade_service for the deployer completes, writes a receipt (tag, commit, md5), and the deployer restarts and answers RPC; same for the compute daemon with all three chain daemons running afterwards.
cockpit.list_services output and the catalog are proven unchanged (flag unset) on a live tester before and after the cockpit rollout.
Signed-by: mik-tf mik-tf@noreply.invalid
All scope boxes are done and proven live.
Shipped as hero_cockpit b8998d1 on integration, published to latest-integration: COCKPIT_MACHINE_ROLE on both cockpit services (default tester, only the exact value admin enables the profile, anything else falls back to the tester surface), role-aware catalog with the admin bundle (deployer server plus admin, hero_compute with the new restart_only entry field for the one-binary, three-services chain daemons, embedder and voice provider), the repo mappings for the upgrade gate, and the Fleet navbar link to the deployer admin. Two test pins lock the tester catalog to its pre-change set and the admin bundle to the audited admin VM registrations.
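For readers who have not opened the diff, the role-aware catalog and the restart_only field could look roughly like the sketch below. This is an illustration only, not the shipped code: the type, constant, and function names (MachineRole, BASE_ENTRIES, ADMIN_ENTRIES, catalog, services_to_restart) are assumptions, and the single base entry is a placeholder rather than one of the real tester apps. Only the app, repo, binary, and service names in the admin entries are taken from this issue.

```rust
// Hypothetical sketch of a role-aware catalog; names are illustrative,
// not copied from hero_cockpit_server/src/catalog.rs.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MachineRole {
    Tester,
    Admin,
}

#[derive(Clone, Debug, PartialEq, Eq)]
struct CatalogEntry {
    app: &'static str,
    repo: &'static str,
    /// Binaries downloaded from the release, then restarted as services.
    binaries: &'static [&'static str],
    /// Services backed by one of `binaries`: restarted, never downloaded.
    restart_only: &'static [&'static str],
}

// Placeholder standing in for the real tester catalog.
const BASE_ENTRIES: &[CatalogEntry] = &[CatalogEntry {
    app: "placeholder_app",
    repo: "placeholder_repo",
    binaries: &["placeholder_server"],
    restart_only: &[],
}];

// Admin-only entries, using the names given in this issue.
const ADMIN_ENTRIES: &[CatalogEntry] = &[
    CatalogEntry {
        app: "hero_tfgrid_deployer",
        repo: "hero_os_tfgrid_deployer",
        binaries: &["hero_tfgrid_deployer_server", "hero_tfgrid_deployer_admin"],
        restart_only: &[],
    },
    CatalogEntry {
        app: "hero_compute",
        repo: "hero_compute",
        binaries: &["my_compute_zos_server"],
        restart_only: &["my_compute_zos_main_server", "my_compute_zos_testnet_server"],
    },
];

/// Base entries always; admin entries only when the role is Admin.
fn catalog(role: MachineRole) -> Vec<CatalogEntry> {
    let mut entries = BASE_ENTRIES.to_vec();
    if role == MachineRole::Admin {
        entries.extend_from_slice(ADMIN_ENTRIES);
    }
    entries
}

/// After a download, restart the downloaded binaries plus the restart_only services.
fn services_to_restart(entry: &CatalogEntry) -> Vec<&'static str> {
    entry
        .binaries
        .iter()
        .chain(entry.restart_only.iter())
        .copied()
        .collect()
}

fn main() {
    for role in [MachineRole::Tester, MachineRole::Admin] {
        println!("{:?}", role);
        for entry in catalog(role) {
            println!("  {} -> restart {:?}", entry.app, services_to_restart(&entry));
        }
    }
}
```

Keeping restart_only separate from binaries means the download step never asks for an asset named after a wrapper service, which is the failure mode the appendix warns about for my_compute_zos_main_server.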
Live proof on the admin VM:
Tester proof on the sandboxfull VM: upgraded its cockpit bundle to the same build; the role key is absent from the process env, the catalog lists exactly the 17 tester apps, no admin bundles, no Fleet link. The only catalog change is hero_memory now listing hero_memory_ui, which is the earlier memory bundle fix arriving with the newer build, unrelated to the profile.
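The tester result above (role key absent, tester surface shown) is what the exact-match rule implies. A minimal sketch of that rule follows, assuming the value is read from the process environment; the enum and method names are hypothetical and not taken from hero_cockpit_server/src/main.rs.

```rust
use std::env;

// Hypothetical names; only the variable name COCKPIT_MACHINE_ROLE and the
// "exactly admin, otherwise tester" rule come from this issue.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum MachineRole {
    Tester,
    Admin,
}

impl MachineRole {
    /// Only the exact string "admin" selects Admin; a missing key, a different
    /// case, or surrounding whitespace all fall back to Tester.
    fn from_value(value: Option<&str>) -> Self {
        match value {
            Some("admin") => MachineRole::Admin,
            _ => MachineRole::Tester,
        }
    }

    /// Reads COCKPIT_MACHINE_ROLE once; an unset or non-UTF-8 value means Tester.
    fn from_env() -> Self {
        Self::from_value(env::var("COCKPIT_MACHINE_ROLE").ok().as_deref())
    }
}

fn main() {
    println!("machine role: {:?}", MachineRole::from_env());
}
```

Deliberately not trimming or lowercasing the value keeps a typo on a tester machine from ever enabling the admin bundle.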
Operational notes:
Signed-by: mik-tf mik-tf@noreply.invalid