proc: verify new RPC + multi-domain contract (server only) #152

Open
opened 2026-06-14 16:29:58 +00:00 by mahmoud · 2 comments
Owner

proc: verify new RPC + multi-domain contract (server only)

Scope: server RPC + multi-domain correctness only. Admin crates (hero_proc_admin, hero_proc_admin_dx, hero_proc_admin_dx_app) are out of scope and untouched. hero_router is not modified (only consumed/listed).

Ground truth derived from a live build + infocheck + contract probe + full test run on the latest development of both hero_proc and hero_skills, against hero_lib development HEAD 925b3df9 (the Cargo.lock was refreshed from a stale a9b14b60).

Phase 0 — ground truth

Static / build

  • cargo check --workspace ✅ green
  • cargo clippy --workspace -- -D warnings ✅ green (0 warnings)
  • lab infocheck ✅ all in-scope crates clean (the only failure is hero_proc_admin_dx_app, out of scope, not a workspace member)
  • lab build --install ✅ server + admin + cli built; admin_dx* skipped (disabled)
  • graceful SIGTERM shutdown ✅ clean (jobs drained, PID file + socket removed)

Live multi-domain contract probe on hero_proc/rpc.sock

| Path | Result |
|---|---|
| GET /api/domains.json | ✅ 200, all 4 domains (jobs, logs, secrets, system) |
| GET /api/{domain}/openrpc.json ×4 | ✅ 200 each |
| POST /api/{domain}/rpc (ping / sources / secret_list / service_list) | ✅ 200, real dispatch |
| GET /health.json, GET /heroservice.json | ✅ 200 (router health + manifest probes pass) |
| GET /api/{domain}/events | ❌ 404: the one real contract gap |
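As a rough illustration of what the probe covers, the sketch below expands the path templates from the table for a given list of domains. The function name `probe_paths` is hypothetical and is not part of hero_proc; only the paths themselves come from the probe results above.

```rust
/// Expand the probed path templates for each domain.
/// `probe_paths` is a hypothetical helper for illustration only.
fn probe_paths(domains: &[&str]) -> Vec<(&'static str, String)> {
    // Domain-independent discovery endpoint.
    let mut out = vec![("GET", "/api/domains.json".to_string())];
    for d in domains {
        // Per-domain spec, RPC dispatch, and SSE endpoints.
        out.push(("GET", format!("/api/{d}/openrpc.json")));
        out.push(("POST", format!("/api/{d}/rpc")));
        out.push(("GET", format!("/api/{d}/events")));
    }
    // Router health and manifest probes.
    out.push(("GET", "/health.json".to_string()));
    out.push(("GET", "/heroservice.json".to_string()));
    out
}
```

Calling `probe_paths(&["jobs", "logs", "secrets", "system"])` yields one discovery path, three paths per domain, and the two router probes.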

Integration suite (--basic --functional --extended): 278 passed / 4 stable failures (+1 transient flake: uc39_batch_insert once hit logger I/O error: No such file or directory, passed on rerun).

Root causes (verified, not guessed)

  1. SSE (3 tests + router gap) → server bug. web.rs::extra_router serves the job-log SSE handler at top-level /events. The hero_lib serve_domains macro, the SDK, and the schema (oschema/logs/logs.oschema:53 stream_job(job_sid) @sse(...)) all use canonical /api/{domain}/events. The generated logs spec already advertises the correct x-sse extension (endpoint:/events, filter:job_sid); only the served path is wrong. hero_router PRs #120 and #121 now forward /api/{domain}/events verbatim and derive channels from x-sse, so this is exactly the contract the router consumes.
  2. basic::cleanup::clean_test_data_is_idempotent → test bug. It asserts that a single clean_by_tag returns 0, relying on a prior subtest having done the first clean (so it fails in isolation or on a persistent DB). Server clean_by_tag is genuinely idempotent (live: first/second/third all return 0). Fix the test to be self-contained; do not weaken assertions.
  3. uc39_batch_insert → transient/env flake, not reproducible.
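The self-contained shape described in item 2 can be sketched as follows. This is a minimal model under stated assumptions, not the actual test: `Store` is a hypothetical in-memory stand-in for the server, and `clean_by_tag` is assumed here to return the number of records it removed.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the server's tagged records (id -> tag).
struct Store {
    records: HashMap<u32, String>,
}

impl Store {
    /// Assumed semantics: remove every record with `tag`, return how many were removed.
    fn clean_by_tag(&mut self, tag: &str) -> usize {
        let before = self.records.len();
        self.records.retain(|_, t| t.as_str() != tag);
        before - self.records.len()
    }
}

/// Self-contained idempotency check: perform the first clean here instead of
/// relying on an earlier subtest, then report what the repeat cleans return.
fn repeat_cleans_after_first(store: &mut Store, tag: &str) -> (usize, usize) {
    // Establish a known state regardless of what previous tests left behind.
    store.clean_by_tag(tag);
    let second = store.clean_by_tag(tag);
    let third = store.clean_by_tag(tag);
    (second, third)
}
```

Because the first clean happens inside the check, the assertion on the repeat calls holds whether the test runs in isolation, after other subtests, or against a persistent DB.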

Downstream (LIST only — hero_router not modified)

hero_router development already aligned to the canonical contract (PR #120 8dcffe5 forward verbatim; PR #121 d1dfe88 derive SSE from x-sse, multiplex canonical /events). Once hero_proc serves /api/logs/events, SSE works end-to-end through the router with no router change. No other raw-RPC consumers found broken by the current wire in scope.

Open decisions (resolved)

  1. Single-socket path-based model canonical: yes.
  2. Secrets wire-break accepted, consumers fixed forward: yes (no broken consumers found in scope).
  3. Delete stale top-level /schema/: yes (unreferenced; macro uses oschema/).
  4. SSE path scope: canonical /api/{domain}/events + keep /events alias; no sse.json stubs (router dropped them).
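Decision 4 amounts to accepting two paths for the same stream. The sketch below shows one way to express that path resolution in isolation; it is not the web.rs implementation. The `Route` enum, the `resolve` function, and the `SSE_DOMAINS` list are hypothetical names, and limiting SSE to the logs domain is an assumption based on the logs spec being the one that advertises x-sse.

```rust
/// Assumption: only the logs domain advertises an x-sse endpoint.
const SSE_DOMAINS: [&str; 1] = ["logs"];

#[derive(Debug, PartialEq)]
enum Route {
    /// Canonical per-domain SSE endpoint: /api/{domain}/events
    DomainEvents(String),
    /// Top-level /events alias kept for existing clients.
    LegacyEvents,
    NotFound,
}

/// Resolve a request path (query string allowed) to an SSE route.
fn resolve(path: &str) -> Route {
    // Drop the query string, e.g. ?job_sid=...
    let path = path.split('?').next().unwrap_or(path);
    if path == "/events" {
        return Route::LegacyEvents;
    }
    let Some(rest) = path.strip_prefix('/') else {
        return Route::NotFound;
    };
    let mut parts = rest.split('/');
    match (parts.next(), parts.next(), parts.next(), parts.next()) {
        // Exactly three segments: api / {domain} / events
        (Some("api"), Some(domain), Some("events"), None)
            if SSE_DOMAINS.contains(&domain) =>
        {
            Route::DomainEvents(domain.to_string())
        }
        _ => Route::NotFound,
    }
}
```

Keeping the alias as its own variant makes it easy to log or later retire legacy traffic without touching the canonical route.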

Task checklist

  • 1. fix(server): serve job-log SSE at canonical /api/logs/events (+ keep /events alias) — web.rs
  • 2. fix(test): make clean_test_data_is_idempotent self-contained
  • 3. chore: disabled = true in crates/hero_proc_test/service.toml
  • 4. chore: delete stale top-level /schema/
  • 5. chore(deps): carry hero_lib development lock bump a9b14b60 → 925b3df9

Verify: cargo check/clippy, lab build, full integration suite green, live re-probe of /api/logs/events, graceful + force shutdown. PR → development.

mahmoud self-assigned this 2026-06-14 16:30:17 +00:00
mahmoud added this to the ACTIVE project 2026-06-14 16:30:21 +00:00
mahmoud added this to the now milestone 2026-06-14 16:30:24 +00:00
Author
Owner

All tasks done and verified: PR #153 → development.

| # | Task | Commit | Status |
|---|------|--------|--------|
| 1 | fix(server): serve job-log SSE at canonical /api/logs/events (+ /events alias) | 7f7a11d | ✅ |
| 2 | fix(test): clean_test_data_is_idempotent self-contained | d9f569a | ✅ |
| 3 | chore: disabled = true in hero_proc_test/service.toml | 4a36c76 | ✅ |
| 4 | chore: delete stale top-level /schema/ | 84465cf | ✅ |
| 5 | chore(deps): hero_lib lock a9b14b60 → 925b3df9 | 7100287 | ✅ |

Extra finding (folded into task 2, commit d9f569a): clean_test_data_removes_everything was also failing — root cause traced live, not guessed. The test deleted scheduled actions via schedule_delete (which removes the action without its logs) before clean_by_tag, orphaning every scheduled action's log subtree so its entries stayed queryable. Proved the server delete is correct by isolating it: a direct logs.delete(src) on a leftover src dropped count 233 → 0 and held. Fixed test-side with schedule_disable + drain-to-terminal + flush settle; no assertions weakened. (Same class as #126 / #141.)
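The orphaning described above can be reproduced in a toy model. This is a sketch under assumptions, not the server code: `Model` and its methods are hypothetical, and `clean_by_tag` is assumed to locate log subtrees through the actions that still exist, which is why deleting the action first leaves its logs behind.

```rust
use std::collections::HashMap;

/// Hypothetical model: actions (id -> tag) and per-source log entries (src -> entries).
struct Model {
    actions: HashMap<String, String>,
    logs: HashMap<String, Vec<String>>,
}

impl Model {
    /// As described in the finding: removes the action without its logs.
    fn schedule_delete(&mut self, id: &str) {
        self.actions.remove(id);
    }

    /// Assumed behaviour: removes each action carrying `tag` together with its logs.
    fn clean_by_tag(&mut self, tag: &str) -> usize {
        let ids: Vec<String> = self
            .actions
            .iter()
            .filter(|(_, t)| t.as_str() == tag)
            .map(|(id, _)| id.clone())
            .collect();
        for id in &ids {
            self.actions.remove(id);
            self.logs.remove(id);
        }
        ids.len()
    }

    /// Direct delete of one log source, mirroring the isolation check.
    fn logs_delete(&mut self, src: &str) {
        self.logs.remove(src);
    }

    fn log_count(&self) -> usize {
        self.logs.values().map(Vec::len).sum()
    }
}

/// One scheduled action tagged "test" with three log entries.
fn sample() -> Model {
    Model {
        actions: HashMap::from([("a1".to_string(), "test".to_string())]),
        logs: HashMap::from([(
            "a1".to_string(),
            vec!["l1".to_string(), "l2".to_string(), "l3".to_string()],
        )]),
    }
}
```

In this model, `schedule_delete("a1")` followed by `clean_by_tag("test")` leaves all three entries queryable, while `clean_by_tag("test")` alone, or a direct `logs_delete("a1")` afterwards, brings the count to zero.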

Verification: cargo check · clippy --all-targets -D warnings (0 warnings) · lab build --install · full suite --basic --functional --extended 282 passed / 0 failed, run twice (cleanup test was previously flaky) · graceful SIGTERM · live re-probe GET /api/logs/events?job_sid=… now served (was 404).

Author
Owner

Update: rebased onto development after 953e752 (remove tracked Cargo.lock + gitignore). The earlier lock-bump commit (task 5) was dropped — with Cargo.lock now untracked, hero_lib resolves fresh from branch = "development" (currently 60867649, floated up from 925b3df9). Rebuilt + re-ran the full suite on the new base: 282 passed / 0 failed, clippy clean, SSE path live (GET /api/logs/events → handler 404 on bogus sid, i.e. route mounted).

Rebased commit SHAs on PR #153:

  • e6d1e11 fix(server): SSE at /api/logs/events
  • da124ac fix(test): cleanup tests self-contained + race-free
  • fd2ede4 chore: disable hero_proc_test service
  • a6a55df chore: remove stale top-level /schema/

PR #153 is mergeable into development.

Reference
lhumina_code/hero_proc#152