fix(verify): bound hung verify commands when timeout coreutil is absent #118

Merged
rawan merged 2 commits from fix/verify-command-timeout into integration 2026-06-14 15:04:22 +00:00

#44

What

Bounds hung verify commands even when the timeout coreutil is unavailable, and makes the timeout configurable.

Why

run_shell previously relied on the timeout coreutil; if it was absent it fell back to running sh -c <cmd> with no bound — a hung build could wedge a verification run indefinitely.

Changes

  • In-process watchdog fallback (run_shell_bounded): when timeout is missing, spawn the shell in its own process group, drain stdout/stderr on dedicated threads (so a chatty command can't deadlock on a full pipe), and SIGKILL the whole group on expiry so child build processes die too.
  • Configurable bound via HERO_SHRIMP_VERIFY_TIMEOUT_SECS: a positive integer overrides the 300s default; zero/non-numeric/unset keep the default so a misconfiguration can never remove the bound.
  • Killed commands report a non-zero exit status (⇒ Fail), so a timed-out run is never counted as a pass, and the kill is noted in stderr.
  • Unrelated: a rustfmt-only fix in pidfile.rs tests (separate commit).
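The environment-variable rule above (positive integer wins; zero, non-numeric, or unset keep 300s) can be sketched as a small pure function. This is a minimal sketch, not the PR's code: `resolve_timeout_secs_from` and `resolve_timeout` are hypothetical names, and splitting parsing from the `std::env` read is an assumption made so the fallback logic is testable without mutating process state.

```rust
use std::time::Duration;

/// Default bound stated in the PR description.
const DEFAULT_VERIFY_TIMEOUT_SECS: u64 = 300;

/// Hypothetical pure helper: returns the override only when it parses as a
/// positive integer. Zero, negative, non-numeric, or missing input falls back
/// to the default, so a misconfiguration can never remove the bound.
fn resolve_timeout_secs_from(raw: Option<&str>) -> u64 {
    match raw.and_then(|s| s.parse::<u64>().ok()) {
        Some(n) if n > 0 => n,
        _ => DEFAULT_VERIFY_TIMEOUT_SECS,
    }
}

/// Hypothetical wrapper that reads HERO_SHRIMP_VERIFY_TIMEOUT_SECS from the
/// environment. A non-UTF-8 value is treated like an unset one.
fn resolve_timeout() -> Duration {
    let raw = std::env::var("HERO_SHRIMP_VERIFY_TIMEOUT_SECS").ok();
    Duration::from_secs(resolve_timeout_secs_from(raw.as_deref()))
}
```

Parsing into `u64` means a value like `-5` fails to parse and lands on the default rather than on a separate negative-number branch.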

Tests

  • resolve_timeout_secs_reads_env_and_falls_back_on_bad_values — env parsing + fallbacks.
  • in_process_fallback_bounds_a_hung_command — a 30s sleep is killed at ~1s, reports non-zero, notes the timeout in stderr.

🤖 Generated with Claude Code

Add an in-process watchdog fallback to run_shell: when the `timeout`
coreutil is missing, spawn the shell in its own process group, drain
stdout/stderr on threads, and SIGKILL the whole group on expiry so a hung
build subprocess dies too. Make the bound configurable via
HERO_SHRIMP_VERIFY_TIMEOUT_SECS (positive int wins; zero/garbage keep the
300s default so a misconfig can never remove the bound). Add unit tests
for env resolution and the in-process kill path.
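The watchdog path described in the commit message (own process group, pipe-draining threads, group-wide SIGKILL on expiry) can be sketched as follows. This is a Unix-only illustration under stated assumptions, not the PR's `run_shell_bounded`: the `Outcome` struct, the 50 ms polling interval, the `-1` "killed" marker, and the stderr wording are all hypothetical, and `killpg` is declared directly against the system libc that std already links, to avoid a crate dependency.

```rust
use std::io::Read;
use std::os::unix::process::CommandExt; // Command::process_group (Rust 1.64+)
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

// Assumption: POSIX killpg from the platform libc; SIGKILL is 9 on Linux and macOS.
unsafe extern "C" {
    fn killpg(pgrp: i32, sig: i32) -> i32;
}
const SIGKILL: i32 = 9;

/// Hypothetical result shape for this sketch.
struct Outcome {
    code: i32,
    stdout: String,
    stderr: String,
    timed_out: bool,
}

fn run_shell_bounded(cmd: &str, limit: Duration) -> std::io::Result<Outcome> {
    let mut child = Command::new("sh")
        .arg("-c")
        .arg(cmd)
        .process_group(0) // new group whose id equals the shell's pid
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    // Drain both pipes on dedicated threads so a chatty command can't block on a full pipe.
    let mut out = child.stdout.take().expect("stdout is piped");
    let mut err = child.stderr.take().expect("stderr is piped");
    let out_t = thread::spawn(move || {
        let mut buf = Vec::new();
        let _ = out.read_to_end(&mut buf);
        buf
    });
    let err_t = thread::spawn(move || {
        let mut buf = Vec::new();
        let _ = err.read_to_end(&mut buf);
        buf
    });

    let deadline = Instant::now() + limit;
    let mut timed_out = false;
    let status = loop {
        if let Some(status) = child.try_wait()? {
            break status;
        }
        if Instant::now() >= deadline {
            // Kill the whole group so grandchildren (e.g. build tools) die too.
            // The shell is not yet reaped, so its pid/pgid cannot have been reused.
            unsafe {
                killpg(child.id() as i32, SIGKILL);
            }
            timed_out = true;
            break child.wait()?;
        }
        thread::sleep(Duration::from_millis(50));
    };

    // Once every group member is dead, the pipes hit EOF and the readers finish.
    let stdout = String::from_utf8_lossy(&out_t.join().unwrap_or_default()).into_owned();
    let mut stderr = String::from_utf8_lossy(&err_t.join().unwrap_or_default()).into_owned();

    // A killed run always reports non-zero; -1 is an arbitrary marker for this sketch.
    let code = if timed_out { -1 } else { status.code().unwrap_or(-1) };
    if timed_out {
        stderr.push_str(&format!("\n[killed after {}s timeout]\n", limit.as_secs()));
    }
    Ok(Outcome { code, stdout, stderr, timed_out })
}
```

Forcing a non-zero code when `timed_out` is set also covers the race where the command exits on its own just as the deadline passes, so a run that hit the bound still reports as a failure.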
rawan merged commit 95d4624aff into integration 2026-06-14 15:04:22 +00:00
rawan deleted branch fix/verify-command-timeout 2026-06-14 15:04:23 +00:00
Reference
lhumina_code/hero_shrimp!118