fix(admin-ui): model dropdown stays at 'Loading models...' on transient RPC failure #153

Open
opened 2026-06-15 08:24:48 +00:00 by rawdaGastan · 1 comment
Member

Problem

When the admin UI's initial loadChatModels() call fails (e.g. transient JSON-RPC blip while the admin process is starting up, or a brief disconnect from the server socket), the chat model autocomplete input stays pinned at the placeholder Loading models... forever. The catch block only logs to console.error — it does not surface the failure to the user, does not retry, and does not update the placeholder.

Other parts of the page that depend on loadChatModels succeeding (sidebar CATALOG > Models count, TTS/STT dropdowns, cost estimator) all silently stay at empty/0 in this state.

Repro:

  1. Open the admin UI during a moment the admin RPC isn't responding (mid-restart, mid-supervisor cycle, etc.).
  2. The page renders, but the Model input stays at "Loading models...", CATALOG > Models says 0, the audio dropdowns stay at "Disabled".
  3. Refreshing the page after the admin RPC is back to normal restores everything — the model dropdown does not self-recover otherwise.

Root cause

In crates/hero_aibroker_admin/templates/fragments/chat_pane.html the loadChatModels() function ends with:

} catch (e) { console.error('Failed to load models:', e); }

No placeholder update, no retry, no user-visible signal. The browser console shows the error, but a normal operator looking at the UI sees a stuck spinner-like state with no indication that anything went wrong.

Proposal

Two-part change to the catch block:

  1. Retry once after a short delay (~2 s). Most transient failures resolve within the admin process's startup window.
  2. Update the placeholder so the user has visible feedback: Failed to load models — retrying… during the retry attempt, then Failed to load models — refresh page if the retry also fails.

Code shape:

async function loadChatModels(_retried) {
    try {
        // ...existing logic unchanged...
    } catch (e) {
        console.error('Failed to load models:', e);
        if (!_retried) {
            chatModelInput.placeholder = 'Failed to load models — retrying…';
            setTimeout(() => loadChatModels(true), 2000);
        } else {
            chatModelInput.placeholder = 'Failed to load models — refresh page';
        }
    }
}

The connection-status badge in the header already polls every 30 s and self-heals, but the model dropdown only fetches once on page load — so it currently never recovers. This change brings the dropdown's resilience in line with the badge's.
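The retry-once shape above can also be exercised outside the browser. The following is a minimal sketch under stated assumptions, not the template's actual code: `makeModelLoader`, `fetchModels`, `input`, `schedule`, and `delayMs` are hypothetical names introduced here so the fetch, the input element, and the timer can be swapped out in a test. The real function works on `chatModelInput` and calls `setTimeout` directly.

```javascript
// Hypothetical factory (not in chat_pane.html): builds a loadChatModels-like
// function whose fetch, input element, and timer are injected.
function makeModelLoader({ fetchModels, input, schedule = setTimeout, delayMs = 2000 }) {
    return async function loadChatModels(retried = false) {
        try {
            // Stand-in for the existing, unchanged success logic.
            const models = await fetchModels();
            input.placeholder = 'Search model...';
            return models;
        } catch (e) {
            console.error('Failed to load models:', e);
            if (!retried) {
                // First failure: tell the user and schedule exactly one retry.
                input.placeholder = 'Failed to load models — retrying…';
                schedule(() => loadChatModels(true), delayMs);
            } else {
                // Second failure: terminal state, no further timers.
                input.placeholder = 'Failed to load models — refresh page';
            }
            return null;
        }
    };
}
```

Passing a fake `schedule` that records its callback lets a test trigger the retry immediately instead of waiting 2 s.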

Acceptance criteria

  • When loadChatModels()'s first attempt throws, the placeholder updates to Failed to load models — retrying… and a second attempt fires after 2 s.
  • If the second attempt succeeds, the model dropdown populates normally and the placeholder turns into Search model....
  • If the second attempt also fails, the placeholder becomes Failed to load models — refresh page and there are no further automatic retries.
  • No behavioural change on the success path.
  • No other files in the admin crate are modified.
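The first three criteria amount to a small decision table. As a sketch only, with `nextUiState` being a hypothetical helper that does not exist in the admin crate, the mapping from attempt outcome to placeholder could be written as a pure function:

```javascript
// Hypothetical helper: given whether the attempt succeeded and whether it
// was already the retry, return the placeholder to show and whether another
// attempt should be scheduled.
function nextUiState(succeeded, retried) {
    if (succeeded) {
        return { placeholder: 'Search model...', retry: false };
    }
    if (!retried) {
        return { placeholder: 'Failed to load models — retrying…', retry: true };
    }
    return { placeholder: 'Failed to load models — refresh page', retry: false };
}
```

Only the first failure returns `retry: true`, which is what bounds the behaviour to a single automatic retry.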
rawdaGastan added this to the ACTIVE project 2026-06-15 08:25:02 +00:00
Author
Member

Test Results

Verified the fix in a headless browser against a live hero_aibroker_admin process serving the patched template, on branch fix/admin-ui-model-load-retry.

Build

  • cargo build -p hero_aibroker_admin — clean (Finished dev profile in 5.47s).
  • Admin restarted with the new binary; HTML served via TCP→UDS bridge at http://127.0.0.1:8765/. HTTP 200.

1. Happy path (regression check)

Loaded the page normally (RPC working). Inspected the chat model input:

{
  "value": "claude-haiku",
  "placeholder": "Search model...",
  "chatModelsListLen": 34
}

  • Existing success path unaffected.
  • chatModelInput.placeholder ends at Search model... exactly as before the fix.
  • 34 chat models loaded (matches --fake registry capability filter).

2. Forced both-fail recovery (the bug scenario)

Overrode window.rpc to throw on every call. Re-fired loadChatModels() from a fresh Loading models... state, then snapshotted the placeholder + cumulative rpc-call counter at fixed offsets:

| t (ms) | placeholder | rpc calls |
|---|---|---|
| 0 | Failed to load models — retrying… | 2 |
| 100 | Failed to load models — retrying… | 2 |
| 1000 | Failed to load models — retrying… | 2 |
| 2200 | Failed to load models — refresh page | 4 |
| 3000 | Failed to load models — refresh page | 4 |

Interpretation:

  • First attempt failed → placeholder transitioned to Failed to load models — retrying… immediately (acceptance criterion 1).
  • Promise.all([rpc('models.config'), rpc('models.list')]) accounts for 2 rpc calls per attempt; the counter going 2 → 4 between t=1000 ms and t=2200 ms confirms the scheduled retry fired at ~2 s.
  • Second failure switched the placeholder to the terminal Failed to load models — refresh page (acceptance criterion 3).
  • Counter stayed at 4 through t=3000 ms — no third attempt, as required.
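A call counter like the one read off above could be built with a small wrapper. This is a sketch, not the harness that was actually used: `countingRpc`, `failingRpc`, and `attempt` are hypothetical names, and it assumes `rpc` takes a method name and returns a promise, as the `Promise.all` call suggests.

```javascript
// Hypothetical test helper: wraps an rpc-like function and counts every call.
function countingRpc(impl) {
    const wrapped = (...args) => {
        wrapped.calls += 1;
        return impl(...args);
    };
    wrapped.calls = 0;
    return wrapped;
}

// Stand-in for the forced-failure override: every call rejects.
const failingRpc = countingRpc(() => Promise.reject(new Error('rpc unavailable')));

// One load attempt issues both calls up front, mirroring the Promise.all
// quoted in the interpretation above.
function attempt(rpc) {
    return Promise.all([rpc('models.config'), rpc('models.list')]);
}
```

In the browser, the equivalent step would be assigning such a wrapper to `window.rpc` before re-firing `loadChatModels()`. Each attempt then adds 2 to `failingRpc.calls`, which matches the 2 → 4 progression in the table.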

3. Restored-rpc check (retry path that succeeds)

After the both-fail test, restored window.rpc to the working implementation and called loadChatModels() again from a Loading models... state. The chat model input ended at:

{
  "value": "claude-haiku",
  "placeholder": "Search model...",
  "chatModelsListLen": 34
}

The retry re-enters the same loadChatModels() function, so this confirms that when an attempt succeeds (either the first or the retry), the existing success path runs and the placeholder lands at Search model..., which satisfies acceptance criterion 2.

Acceptance-criteria status

  • First attempt failure: placeholder Failed to load models — retrying…, retry scheduled at 2 s.
  • Retry succeeds: dropdown populated, placeholder Search model....
  • Both fail: placeholder Failed to load models — refresh page, no further retries.
  • No regression on the success path.
  • No other files modified — single-file diff in crates/hero_aibroker_admin/templates/fragments/chat_pane.html (+12 / -3).

PR

#155 fix(admin-ui): retry model load once on transient RPC failure (#153), targeting main (https://forge.ourworld.tf/lhumina_code/hero_aibroker/pulls/155).
