[P1] Single-process streaming is fragile — global stream target, remount flicker, flush races #37

Open
opened 2026-05-23 21:52:23 +00:00 by thabeta · 3 comments
Owner

Problem
The live-streaming layer relies on a single global currentStreamingId, which is nulled the instant the dispatch RPC returns (so background-job tokens get dropped), plus messages is a createSignal<Msg[]> so any setMessages remounts the LiveJobPanel (the "card keeps refreshing" flicker), plus a 60ms flush timer that can clobber the final reply. We've band-aided each of these repeatedly this session.

Evidence

  • crates/hero_shrimp_web/ui/src/store.tscurrentStreamingId, jobStreamJobId, flushStreamBuffer, trackJobUntilDone (non-terminal must NOT setMessages, by comment).

Proposed fix
Re-architect to a per-session typed event channel (cf. kimi-cli's "wire" SPMC protocol / Qwen-Code's typed event generator): events scoped by session, UI subscribes per conversation, message body updated via fine-grained store paths (not array replacement). Depends on the session_id fix.

Related: this is the root behind #29 and #31.


Filed from a comparative audit of Hero Shrimp vs Qwen-Code / kimi-cli / picoclaw (2026-05-23). Severity in title: P0=correctness/trust, P1=reliability/UX, P2=cleanup.

**Problem** The live-streaming layer relies on a single global `currentStreamingId`, which is nulled the instant the dispatch RPC returns (so background-job tokens get dropped), plus `messages` is a `createSignal<Msg[]>` so any `setMessages` remounts the `LiveJobPanel` (the "card keeps refreshing" flicker), plus a 60ms flush timer that can clobber the final reply. We've band-aided each of these repeatedly this session. **Evidence** - `crates/hero_shrimp_web/ui/src/store.ts` — `currentStreamingId`, `jobStreamJobId`, `flushStreamBuffer`, `trackJobUntilDone` (non-terminal must NOT setMessages, by comment). **Proposed fix** Re-architect to a per-session typed event channel (cf. kimi-cli's "wire" SPMC protocol / Qwen-Code's typed event generator): events scoped by session, UI subscribes per conversation, message body updated via fine-grained store paths (not array replacement). Depends on the session_id fix. Related: this is the root behind #29 and #31. --- _Filed from a comparative audit of Hero Shrimp vs Qwen-Code / kimi-cli / picoclaw (2026-05-23). Severity in title: P0=correctness/trust, P1=reliability/UX, P2=cleanup._
Member

Implementation Spec for Issue #37

Objective

Migrate the chat thread's messages from a coarse createSignal<Msg[]> (whole-array, by-reference replacement) to a fine-grained SolidJS createStore<Msg[]>, so per-token / per-tool body updates become path mutations (setMessages(i, "text", v)) that no longer replace the message object and therefore no longer remount the LiveJobPanel and other per-message subtrees keyed by reference. This eliminates the root-cause fragility on top of the band-aids already present.

Current state (verified)

  • The three acute symptoms are already band-aided (background-job token routing via jobStreamJobId/jobNarration, non-terminal poll no longer calls setMessages, idempotent flush that SETs from a full buffer, clearStreamBuffer before every finalization, partial session-scoping of SSE events).
  • What remains (the actual debt): messages is still createSignal<Msg[]> (store.ts:253). Every setMessages(m => m.map(...)) allocates new Msg objects, and <For each={messages()}> in ChatThread.tsx keys by object reference, so reactivity stays coarse — any per-turn update re-creates the object for the streaming message even when only text changed. This is the remount-by-reference fragility the issue targets.
  • Note: the live narration path (jobNarration signal) and the throttled flush already AVOID touching the message object for the hot streaming path. The win here is structural correctness/robustness — it removes the entire class of "object replaced -> subtree remounted" failure for the in-thread streaming/tool/finalize paths.

Requirements

  • messages becomes a createStore<Msg[]>; setMessages becomes the store setter (path syntax + produce).
  • All in-thread mutation sites set the narrowest path that changed rather than replacing the whole Msg.
  • All getter call sites stop calling messages() (proxy is non-callable) and read the array directly.
  • <For each={messages}> so each row is identity-stable across body updates and only changed rows update.
  • No behavioral regressions: streaming text, reasoning pane, inline activities, job attach, terminal finalize, error/cancel, workspace divergence, conversation load/replace, restoreActivities, queue/title capture all behave as before.

Files to Modify

  • crates/hero_shrimp_web/ui/src/store.ts — convert the signal to a store; rewrite all 13 setMessages mutation sites and 3 internal messages() reads.
  • crates/hero_shrimp_web/ui/src/components/ChatThread.tsx — change messages() reads (3) and <For each={messages()}>.
  • crates/hero_shrimp_web/ui/src/components/ConversationWork.tsx — change messages().
  • crates/hero_shrimp_web/ui/src/components/TopBar.tsx — change messages().
  • crates/hero_shrimp_web/ui/src/components/LibraryPage.tsx — change messages().

Implementation Plan

Step 1: Convert the declaration

Files: store.ts

  • Add createStore, produce, reconcile imports from solid-js/store.
  • Replace export const [messages, setMessages] = createSignal<Msg[]>([]); with createStore<Msg[]>([]).
  • Keep the same exported names so the diff stays mechanical.
    Dependencies: none

Step 2: Rewrite internal messages() reads (3 sites) in store.ts

Files: store.ts

  • loadConversationMessages guard messages().length -> messages.length.
  • sendMessage title capture messages().filter(...) -> messages.filter(...).
    Dependencies: Step 1

Step 3: Rewrite array-shape setMessages sites (append/replace/clear)

Files: store.ts

  • Conversation load whole-array replace -> setMessages(reconcile(mapped, { key: "id" })) so reopening a conversation keeps row identity (no flash).
  • Append user msg / assistant placeholder -> setMessages(produce(m => { m.push(...) })).
  • newConversation clear -> setMessages([]).
    Dependencies: Step 1

Step 4: Rewrite per-message .map mutation sites to path/produce

Files: store.ts
Convert each setMessages(m => m.map(x => x.id === ID ? {...x, ...} : x)) to a produce that locates the row and sets narrow paths in place. Sites: restoreActivities; trackJobUntilDone terminal finalize; workspace divergence; job-attach; sync finalize; error path; stopStreaming cancel; flushStreamBuffer (hot path — preserve the x.jobId skip + inline-think re-derivation); turn:end SSE finalize (preserve !x.streaming || x.jobId skip and the finalized flag); tool:started/finished/refused SSE (mutate .activities in place).
Dependencies: Step 1

Step 5: Migrate external consumers

Files: ChatThread.tsx, ConversationWork.tsx, TopBar.tsx, LibraryPage.tsx

  • Change messages() -> messages; autoscroll memo must touch messages.length so it still fires on append; <For each={messages()}> -> <For each={messages}>.
    Dependencies: Step 1 (Step 4 for the identity benefit)

Step 6: Build + type-check

Files: crates/hero_shrimp_web/ui

  • Run the UI build/typecheck; grep ui/src for messages( and setMessages( to confirm zero call-style getters remain and every setter uses path/produce/reconcile.
    Dependencies: Steps 1-5

Acceptance Criteria

  • messages is a createStore<Msg[]>; no createSignal<Msg[]> remains.
  • Zero messages() call-style reads remain anywhere in ui/src.
  • All mutating setMessages sites compile against the store setter API; the streaming-body, turn:end, and tool-activity sites mutate fields in place via produce.
  • Conversation reopen uses reconcile(..., { key: "id" }) and does not flash the thread.
  • Streaming reply updates token-by-token without per-message subtrees remounting (flicker gone).
  • Inline tool activity, terminal finalize, error, and cancel all still render correctly.
  • UI build/typecheck passes.

Risk / Scope notes

  • ~20 mechanical edits (13 setter sites + 3 internal reads + 4 external reads). Every setter must be re-expressed — the store setter signature differs from the signal updater.
  • Highest-risk conversions: flushStreamBuffer (hot, must keep the x.jobId skip + inline-think re-derivation), turn:end (must keep the streaming/jobId skip and the finalized -> drainQueue flag), restoreActivities (reads neighbor rows).
  • reconcile({ key: "id" }) is required on the conversation-load site or reopen remounts every row (regression).
  • Frontend-only refactor — no backend/RPC/server changes.
  • Scope boundary: this intentionally does NOT consolidate currentStreamingId + jobStreamJobId into a per-session typed event channel (the issue's broader "wire/SPMC" vision) — that is a larger separate re-architecture. The createStore migration is the smallest high-value change that fixes the remount-by-reference root cause. The issue should NOT be auto-closed on this change alone.

Notes

  • produce lets the existing imperative .find/.push logic be kept almost verbatim inside a mutation draft, minimizing diff risk.
  • The Msg type itself does not need to change.
## Implementation Spec for Issue #37 ### Objective Migrate the chat thread's `messages` from a coarse `createSignal<Msg[]>` (whole-array, by-reference replacement) to a fine-grained SolidJS `createStore<Msg[]>`, so per-token / per-tool body updates become path mutations (`setMessages(i, "text", v)`) that no longer replace the message object and therefore no longer remount the `LiveJobPanel` and other per-message subtrees keyed by reference. This eliminates the root-cause fragility on top of the band-aids already present. ### Current state (verified) - The three acute symptoms are already band-aided (background-job token routing via `jobStreamJobId`/`jobNarration`, non-terminal poll no longer calls `setMessages`, idempotent flush that SETs from a full buffer, `clearStreamBuffer` before every finalization, partial session-scoping of SSE events). - What remains (the actual debt): `messages` is still `createSignal<Msg[]>` (store.ts:253). Every `setMessages(m => m.map(...))` allocates new `Msg` objects, and `<For each={messages()}>` in ChatThread.tsx keys by object reference, so reactivity stays coarse — any per-turn update re-creates the object for the streaming message even when only `text` changed. This is the remount-by-reference fragility the issue targets. - Note: the live narration path (`jobNarration` signal) and the throttled flush already AVOID touching the message object for the hot streaming path. The win here is structural correctness/robustness — it removes the entire class of "object replaced -> subtree remounted" failure for the in-thread streaming/tool/finalize paths. ### Requirements - `messages` becomes a `createStore<Msg[]>`; `setMessages` becomes the store setter (path syntax + `produce`). - All in-thread mutation sites set the narrowest path that changed rather than replacing the whole `Msg`. - All getter call sites stop calling `messages()` (proxy is non-callable) and read the array directly. - `<For each={messages}>` so each row is identity-stable across body updates and only changed rows update. - No behavioral regressions: streaming text, reasoning pane, inline activities, job attach, terminal finalize, error/cancel, workspace divergence, conversation load/replace, restoreActivities, queue/title capture all behave as before. ### Files to Modify - `crates/hero_shrimp_web/ui/src/store.ts` — convert the signal to a store; rewrite all 13 `setMessages` mutation sites and 3 internal `messages()` reads. - `crates/hero_shrimp_web/ui/src/components/ChatThread.tsx` — change `messages()` reads (3) and `<For each={messages()}>`. - `crates/hero_shrimp_web/ui/src/components/ConversationWork.tsx` — change `messages()`. - `crates/hero_shrimp_web/ui/src/components/TopBar.tsx` — change `messages()`. - `crates/hero_shrimp_web/ui/src/components/LibraryPage.tsx` — change `messages()`. ### Implementation Plan #### Step 1: Convert the declaration Files: `store.ts` - Add `createStore`, `produce`, `reconcile` imports from `solid-js/store`. - Replace `export const [messages, setMessages] = createSignal<Msg[]>([]);` with `createStore<Msg[]>([])`. - Keep the same exported names so the diff stays mechanical. Dependencies: none #### Step 2: Rewrite internal `messages()` reads (3 sites) in store.ts Files: `store.ts` - `loadConversationMessages` guard `messages().length` -> `messages.length`. - `sendMessage` title capture `messages().filter(...)` -> `messages.filter(...)`. Dependencies: Step 1 #### Step 3: Rewrite array-shape `setMessages` sites (append/replace/clear) Files: `store.ts` - Conversation load whole-array replace -> `setMessages(reconcile(mapped, { key: "id" }))` so reopening a conversation keeps row identity (no flash). - Append user msg / assistant placeholder -> `setMessages(produce(m => { m.push(...) }))`. - `newConversation` clear -> `setMessages([])`. Dependencies: Step 1 #### Step 4: Rewrite per-message `.map` mutation sites to path/produce Files: `store.ts` Convert each `setMessages(m => m.map(x => x.id === ID ? {...x, ...} : x))` to a `produce` that locates the row and sets narrow paths in place. Sites: restoreActivities; trackJobUntilDone terminal finalize; workspace divergence; job-attach; sync finalize; error path; stopStreaming cancel; flushStreamBuffer (hot path — preserve the `x.jobId` skip + inline-think re-derivation); turn:end SSE finalize (preserve `!x.streaming || x.jobId` skip and the `finalized` flag); tool:started/finished/refused SSE (mutate `.activities` in place). Dependencies: Step 1 #### Step 5: Migrate external consumers Files: `ChatThread.tsx`, `ConversationWork.tsx`, `TopBar.tsx`, `LibraryPage.tsx` - Change `messages()` -> `messages`; autoscroll memo must touch `messages.length` so it still fires on append; `<For each={messages()}>` -> `<For each={messages}>`. Dependencies: Step 1 (Step 4 for the identity benefit) #### Step 6: Build + type-check Files: `crates/hero_shrimp_web/ui` - Run the UI build/typecheck; grep `ui/src` for `messages(` and `setMessages(` to confirm zero call-style getters remain and every setter uses path/produce/reconcile. Dependencies: Steps 1-5 ### Acceptance Criteria - [ ] `messages` is a `createStore<Msg[]>`; no `createSignal<Msg[]>` remains. - [ ] Zero `messages()` call-style reads remain anywhere in `ui/src`. - [ ] All mutating `setMessages` sites compile against the store setter API; the streaming-body, turn:end, and tool-activity sites mutate fields in place via `produce`. - [ ] Conversation reopen uses `reconcile(..., { key: "id" })` and does not flash the thread. - [ ] Streaming reply updates token-by-token without per-message subtrees remounting (flicker gone). - [ ] Inline tool activity, terminal finalize, error, and cancel all still render correctly. - [ ] UI build/typecheck passes. ### Risk / Scope notes - ~20 mechanical edits (13 setter sites + 3 internal reads + 4 external reads). Every setter must be re-expressed — the store setter signature differs from the signal updater. - Highest-risk conversions: `flushStreamBuffer` (hot, must keep the `x.jobId` skip + inline-think re-derivation), `turn:end` (must keep the streaming/jobId skip and the `finalized` -> drainQueue flag), `restoreActivities` (reads neighbor rows). - `reconcile({ key: "id" })` is required on the conversation-load site or reopen remounts every row (regression). - Frontend-only refactor — no backend/RPC/server changes. - Scope boundary: this intentionally does NOT consolidate `currentStreamingId` + `jobStreamJobId` into a per-session typed event channel (the issue's broader "wire/SPMC" vision) — that is a larger separate re-architecture. The createStore migration is the smallest high-value change that fixes the remount-by-reference root cause. The issue should NOT be auto-closed on this change alone. ### Notes - `produce` lets the existing imperative `.find`/`.push` logic be kept almost verbatim inside a mutation draft, minimizing diff risk. - The `Msg` type itself does not need to change.
Member

Test Results

Frontend-only refactor — verified via the project's TypeScript typecheck and the production Vite build.

TypeScript typecheck (tsc --noEmit)

  • Errors before change (baseline on integration): 14
  • Errors after change: 14
  • Errors in changed files (store.ts, ChatThread.tsx, ConversationWork.tsx, TopBar.tsx, LibraryPage.tsx): 0

The 14 errors are pre-existing in unrelated files (ChatActivity, CostBadge, DiffViewer, HeroWelcome) and are unchanged by this work. The migration introduces zero new type errors.

Production build (npm run build — vite build)

  • Result: PASS
  • 101 modules transformed; bundle emitted (app.js 436.24 kB / gzip 134.75 kB).

Grep verification

  • Zero messages() call-style reads remain in ui/src.
  • Zero signal-updater forms (setMessages(m => ...map, spread) remain — every setter now uses produce / reconcile / a plain array.
## Test Results Frontend-only refactor — verified via the project's TypeScript typecheck and the production Vite build. ### TypeScript typecheck (`tsc --noEmit`) - Errors before change (baseline on `integration`): 14 - Errors after change: 14 - Errors in changed files (store.ts, ChatThread.tsx, ConversationWork.tsx, TopBar.tsx, LibraryPage.tsx): **0** The 14 errors are pre-existing in unrelated files (ChatActivity, CostBadge, DiffViewer, HeroWelcome) and are unchanged by this work. The migration introduces zero new type errors. ### Production build (`npm run build` — vite build) - Result: **PASS** - 101 modules transformed; bundle emitted (app.js 436.24 kB / gzip 134.75 kB). ### Grep verification - Zero `messages()` call-style reads remain in `ui/src`. - Zero signal-updater forms (`setMessages(m => ...map`, spread) remain — every setter now uses `produce` / `reconcile` / a plain array.
Member

Pull request opened: #121 (base: integration)

This PR implements the createStore migration discussed in the spec above. Scope note: it fixes the remount-by-reference root cause but does not build the broader per-session typed event channel, so this issue should stay open (downgrade to P2) rather than auto-close.

Pull request opened: https://forge.ourworld.tf/lhumina_code/hero_shrimp/pulls/121 (base: integration) This PR implements the createStore migration discussed in the spec above. Scope note: it fixes the remount-by-reference root cause but does not build the broader per-session typed event channel, so this issue should stay open (downgrade to P2) rather than auto-close.
Sign in to join this conversation.
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lhumina_code/hero_shrimp#37
No description provided.