fix(verification): run verify command in deliverables' subdir to stop nested-manifest false-negatives #117
Reference
lhumina_code/hero_shrimp!117
Summary
The zero-hardcoding verdict ran the LLM-authored verify command from the workspace root. When the model nested its project in a subdirectory (e.g. a crate in `stackcrate/`), a bare root-relative command such as `cargo test --tests` exited non-zero at root and was recorded as a genuine failure: a false negative (jobs 233/244).

This adds an optional, contract-carried `working_dir` to `Check::CommandSucceeds`, derived deterministically at freeze time from the common leading subdirectory of the declared deliverables. The command then runs from that subdir. No language, runner, or manifest detection is introduced; the working dir is pure path-prefix inference over paths the author already declared, mirroring the existing `find_file_by_suffix` nesting tolerance for `FileExists`.

Related Issue
Closes #43
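
For readers who want the path-prefix inference from the Summary in concrete form, here is a minimal sketch. It is not the code from this PR: the function name `common_subdir_sketch`, its signature, and the choice to return the longest shared directory prefix are assumptions made for illustration. It only looks at the declared paths and never touches the filesystem.

```rust
use std::path::{Component, Path};

/// Hypothetical sketch: longest directory prefix shared by all declared
/// deliverable paths, or `None` when there is no unambiguous common subdir.
fn common_subdir_sketch(deliverables: &[&str]) -> Option<String> {
    let mut common: Option<Vec<String>> = None;
    for d in deliverables {
        // Collect the normal components of this path.
        let mut dirs = Vec::new();
        for c in Path::new(d).components() {
            match c {
                Component::Normal(n) => dirs.push(n.to_str()?.to_owned()),
                Component::CurDir => {}
                // Absolute paths or `..`: refuse to guess.
                _ => return None,
            }
        }
        dirs.pop(); // drop the file name, keep only its directories
        // Intersect with the prefix shared by the paths seen so far.
        common = Some(match common {
            None => dirs,
            Some(prev) => prev
                .into_iter()
                .zip(dirs)
                .take_while(|(a, b)| a == b)
                .map(|(a, _)| a)
                .collect(),
        });
    }
    let common = common?;
    if common.is_empty() {
        None // a root-level file or split top-level dirs: leave behavior unchanged
    } else {
        Some(common.join("/"))
    }
}
```

Under these assumptions, `["stackcrate/Cargo.toml", "stackcrate/src/lib.rs"]` yields `Some("stackcrate")`, while a root-level file or two different top-level directories yields `None`.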
Changes
- `check.rs`: optional `working_dir: Option<String>` on `Check::CommandSucceeds` with `#[serde(default, skip_serializing_if = "Option::is_none")]` (backward-compatible with existing frozen contracts and JSONL ledgers); `label()` and `validate_contract()` updated, the latter rejecting unsafe `working_dir` paths.
- `runner.rs`: `command_succeeds` resolves the effective cwd through the existing `safe_join` helper. A path escape returns `Unrunnable`; a not-yet-created subdir falls back to the workspace root so a stale guess never produces a false Fail.
- `derive.rs`: new `common_deliverable_subdir` helper populates `working_dir` only when the deliverables share a single unambiguous common subdir; otherwise `None` (root files or split top-level dirs leave behavior unchanged). The verify command itself stays verbatim.
- `run.rs`, `verdict.rs`: test constructors updated for the new field.

The verdict stays a pure, replayable function of the frozen contract plus the workspace. The author's `cd sub && ...` escape hatch is preserved and still yields `working_dir: None`.

Test Results
`cargo test -p hero_shrimp_engine`: 1742 passed, 0 failed, 10 ignored. 11 new tests, including a regression that fails at root and passes once `working_dir` is set, a path-traversal rejection, a missing-subdir root fallback, and a serde backward-compat check for the legacy on-disk format.

4e2f06f976 to 77865aa356
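
The cwd resolution described for `runner.rs` (escape is `Unrunnable`, missing subdir falls back to root) can be illustrated roughly as follows. This is a sketch under stated assumptions, not the PR's code: the `Cwd` enum, `effective_cwd`, and `safe_join_sketch` are hypothetical names, and `safe_join_sketch` merely stands in for the project's existing `safe_join` helper, whose real signature is not shown here.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical outcome type for this sketch only.
#[derive(Debug, PartialEq)]
enum Cwd {
    Run(PathBuf),
    Unrunnable(String),
}

/// Stand-in for the project's `safe_join` (assumed behavior): join `rel`
/// under `root`, refusing absolute paths and `..` components.
fn safe_join_sketch(root: &Path, rel: &str) -> Option<PathBuf> {
    let mut out = root.to_path_buf();
    for c in Path::new(rel).components() {
        match c {
            Component::Normal(n) => out.push(n),
            Component::CurDir => {}
            _ => return None,
        }
    }
    Some(out)
}

/// Pick the directory the verify command should run from.
fn effective_cwd(root: &Path, working_dir: Option<&str>) -> Cwd {
    match working_dir {
        // Legacy contracts carry no working_dir: run at the workspace root.
        None => Cwd::Run(root.to_path_buf()),
        Some(rel) => match safe_join_sketch(root, rel) {
            // Path escape: the check cannot be run safely.
            None => Cwd::Unrunnable(format!("working_dir escapes workspace: {rel}")),
            // Subdir exists: run the command there.
            Some(dir) if dir.is_dir() => Cwd::Run(dir),
            // Subdir not created yet: fall back to root, never a false Fail.
            Some(_) => Cwd::Run(root.to_path_buf()),
        },
    }
}
```

The resulting directory would then be handed to something like `std::process::Command::current_dir`. Note that this sketch does not resolve symlinks, which a real helper may need to consider.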