Keyspace fragmentation: KEYS / DBSIZE / EXISTS / SCAN ignore hash, list & set keys #39

Open
opened 2026-06-10 12:49:06 +00:00 by sameh-farouk · 1 comment
Member

Summary

KEYS, DBSIZE, EXISTS, and SCAN only see the string keyspace. Keys holding hashes, lists, or sets are invisible to keyspace enumeration and existence checks, even though they exist and are accessible by their own type commands.

Reproduction (passes on redis 7.0.15, fails on hero_db)

SET k:str v ; HSET k:hash f v ; RPUSH k:list a ; SADD k:set m
hero_db:  DBSIZE -> 1     KEYS * -> ["k:str"]     EXISTS k:hash -> 0   SCAN -> ["k:str"]
redis:    DBSIZE -> 4     KEYS * -> all four      EXISTS k:hash -> 1   SCAN -> all four

Root cause

Storage is a single flat redb table (crates/hero_db/src/stor/redb.rs, table "data") with per-type key prefixes H: / L: / S: / X: (see crates/hero_db/src/db/engine/types.rs). The keyspace ops filter those out or only look at the bare key:

  • keys()crates/hero_db/src/db/engine/keyspace.rs:39-48 explicitly continues past H:/L:/S: keys (streams are re-surfaced via :_meta, hash/list/set are not).
  • dbsize()keyspace.rs:196-200 counts only keys without an H:/L:/S:/TTL: prefix.
  • exists()keyspace.rs:82 does a point-get on the bare key only.

key_type() and del() (keyspace.rs:146-172, :85-144) do walk all prefixes, so TYPE and DEL are already correct — the fix is to bring KEYS/DBSIZE/EXISTS/SCAN in line with them.

Impact

Breaks any client, tool, or admin UI that enumerates or probes the keyspace. EXISTS returning 0 for a live key is especially dangerous for existence-guarded logic.

Secondary

DBSIZE also over-counts streams — each X:/XG:/XC:/XP: sub-key is counted as a separate key.

Shares a root cause (flat prefixed keyspace, no per-key type index) with the type-safety issue #40.


Filed from a Redis-compatibility audit (hero_db v0.6.0 @ main aacaad1). Every finding was cross-validated: the same probe passes on stock redis-server 7.0.15 and fails on hero_db, using the Apache Kvrocks gocase suite (Go) and a redis-py 8.0 probe (Python). Root causes verified against the source.

## Summary `KEYS`, `DBSIZE`, `EXISTS`, and `SCAN` only see the **string** keyspace. Keys holding hashes, lists, or sets are invisible to keyspace enumeration and existence checks, even though they exist and are accessible by their own type commands. ## Reproduction (passes on redis 7.0.15, fails on hero_db) ``` SET k:str v ; HSET k:hash f v ; RPUSH k:list a ; SADD k:set m hero_db: DBSIZE -> 1 KEYS * -> ["k:str"] EXISTS k:hash -> 0 SCAN -> ["k:str"] redis: DBSIZE -> 4 KEYS * -> all four EXISTS k:hash -> 1 SCAN -> all four ``` ## Root cause Storage is a single flat redb table (`crates/hero_db/src/stor/redb.rs`, table `"data"`) with per-type key prefixes `H:` / `L:` / `S:` / `X:` (see `crates/hero_db/src/db/engine/types.rs`). The keyspace ops filter those out or only look at the bare key: - `keys()` — `crates/hero_db/src/db/engine/keyspace.rs:39-48` explicitly `continue`s past `H:`/`L:`/`S:` keys (streams are re-surfaced via `:_meta`, hash/list/set are not). - `dbsize()` — `keyspace.rs:196-200` counts only keys without an `H:`/`L:`/`S:`/`TTL:` prefix. - `exists()` — `keyspace.rs:82` does a point-get on the bare key only. `key_type()` and `del()` (`keyspace.rs:146-172`, `:85-144`) **do** walk all prefixes, so `TYPE` and `DEL` are already correct — the fix is to bring KEYS/DBSIZE/EXISTS/SCAN in line with them. ## Impact Breaks any client, tool, or admin UI that enumerates or probes the keyspace. `EXISTS` returning 0 for a live key is especially dangerous for existence-guarded logic. ## Secondary `DBSIZE` also **over-counts streams** — each `X:`/`XG:`/`XC:`/`XP:` sub-key is counted as a separate key. Shares a root cause (flat prefixed keyspace, no per-key type index) with the type-safety issue #40. --- *Filed from a Redis-compatibility audit (hero_db v0.6.0 @ main `aacaad1`). Every finding was cross-validated: the same probe **passes on stock `redis-server 7.0.15`** and **fails on hero_db**, using the Apache Kvrocks `gocase` suite (Go) and a `redis-py 8.0` probe (Python). Root causes verified against the source.*
sameh-farouk changed title from __scope_probe__ to Keyspace fragmentation: KEYS / DBSIZE / EXISTS / SCAN ignore hash, list & set keys 2026-06-10 12:49:37 +00:00
Author
Member

Org-wide consumer blast-radius audit (#39 + #40)

Scanned all 132 non-empty repos across 6 orgs (lhumina_code, geomind_code, geomind_research, ourworld_it, ourworld_org, projectmycelium) — remote shallow-clone of each default branch, then classified hero_db references and drilled into the data consumers.

Who actually consumes hero_db (data path)

Repo How it uses hero_db Calls KEYS/SCAN/EXISTS/DBSIZE? Reuses a key across types? Persists data?
hero_lib_rhai HeroDbClient; exposes generic herodb_keys/herodb_exists/get/set/… to Rhai scripts herodb_keys()+herodb_exists() (clients_rhai/src/herodb.rs:480,491) only string ops exposed via scripts
hero_wallet HeroDbClient; set/get/del (string) + sadd/smembers (set), disjoint key namespaces no no (clean typed design, storage.rs:100-147) wallets/accounts (sensitive)
hero_aibroker RESP/RPC redis.get/set/del/hset/hgetall/sadd/smembers on distinct keys + database.create no no model/provider config
hero_slides redis.hset/hget/hgetall/hdel on one fixed hash key no no slide collections
hero_memory HeroDBServerClient for ontology.*/graph (its own scan/keys are NOT hero_db's) no (hero_db side) typed ontology/memory
hero_bundle integration-test only (ping via both clients) no no test fixture
hero_router / hero_proc / hero_coordinator / ourworld_framework route socket / supervise / 6378 config n/a n/a n/a
hero_osis references the old flat socket hero_db_*.sock (stale, #34-class) n/a n/a n/a

(The ~25 raw "risky" grep hits were false positives — Path::exists() and "keys" in OpenRPC JSON schemas.)

#39 blast radius (KEYS/DBSIZE/EXISTS/SCAN see all types) — narrow

  • 🟠 hero_lib_rhai is the only real exposure: herodb_keys() / herodb_exists() are exposed to Rhai scripts. Today they see string keys only; after #39 they'd also surface hash/list/set keys (relevant if the instance is shared with hero_slides/wallet/aibroker). A script that does enumerate-keys → GET each could get non-string keys back (nil, or WRONGTYPE once #40 lands). Regression-check the Rhai scripts.
  • No other consumer calls KEYS/SCAN/DBSIZE/EXISTS on hero_db (hero_memory's are its own API; hero_db's CLI uses them internally only). EXISTS flipping to true for live hash/list/set keys is a correctness fix nobody currently depends on.

#40 blast radius (type-safety / WRONGTYPE + possible type index) — two risks

  1. 🟢 WRONGTYPE enforcement — low break risk. Every data consumer uses disjoint key namespaces per type (verified: hero_wallet {prefix}{id} vs index_key; hero_aibroker set/hash/string on distinct keys; hero_slides single hash key). None reuses a key as two types. Only wildcard is hero_lib_rhai's generic scripting surface — but it exposes string ops only, so it can't create conflicting types itself.
  2. 🔴 Migration / on-disk format — the real danger. If #40 introduces a per-key type index or changes key layout, every persisted hero_db instance needs migration. Affected data holders: hero_wallet (sensitive), hero_aibroker, hero_slides, hero_memory (+ any prod hero_db). A format change without a read-compat path = data loss. (hero_wallet's sensitive data also rides on hero_db's hardcoded encryption key — #46.)

Recommendations

  1. Ship #39 standalone, storage-unchanged — implement by aggregating prefix scans in KEYS/DBSIZE/EXISTS/SCAN (the way TYPE/DEL already walk all prefixes). No on-disk change, zero migration risk. Only gate: a regression pass on hero_lib_rhai / Rhai scripts that enumerate keys.
  2. For #40, prefer deriving type from existing prefixes (read-back-compatible) over a new on-disk index — or include a migration and a coordinated, same-branch-name release with hero_wallet / hero_aibroker / hero_slides / hero_memory. SDK signatures don't change (return types unchanged; WRONGTYPE just adds an error variant), so no SDK regen is forced on the 5 cargo_dep consumers.

Audit method: org-wide remote scan of 132 repos / 6 orgs (not just locally-cloned repos), per the cross-org blast-radius convention.

## Org-wide consumer blast-radius audit (#39 + #40) Scanned **all 132 non-empty repos across 6 orgs** (`lhumina_code`, `geomind_code`, `geomind_research`, `ourworld_it`, `ourworld_org`, `projectmycelium`) — remote shallow-clone of each default branch, then classified hero_db references and drilled into the data consumers. ### Who actually consumes hero_db (data path) | Repo | How it uses hero_db | Calls KEYS/SCAN/EXISTS/DBSIZE? | Reuses a key across types? | Persists data? | |---|---|---|---|---| | **hero_lib_rhai** | `HeroDbClient`; exposes generic `herodb_keys`/`herodb_exists`/`get`/`set`/… to **Rhai scripts** | ✅ `herodb_keys()`+`herodb_exists()` (`clients_rhai/src/herodb.rs:480,491`) | only string ops exposed | via scripts | | **hero_wallet** | `HeroDbClient`; `set/get/del` (string) + `sadd/smembers` (set), disjoint key namespaces | no | **no** (clean typed design, `storage.rs:100-147`) | ✅ wallets/accounts (sensitive) | | **hero_aibroker** | RESP/RPC `redis.get/set/del/hset/hgetall/sadd/smembers` on distinct keys + `database.create` | no | no | ✅ model/provider config | | **hero_slides** | `redis.hset/hget/hgetall/hdel` on one fixed hash key | no | no | ✅ slide collections | | **hero_memory** | `HeroDBServerClient` for `ontology.*`/graph (its own `scan`/`keys` are NOT hero_db's) | no (hero_db side) | typed | ✅ ontology/memory | | **hero_bundle** | integration-test only (ping via both clients) | no | no | test fixture | | hero_router / hero_proc / hero_coordinator / ourworld_framework | route socket / supervise / `6378` config | n/a | n/a | n/a | | **hero_osis** | references the **old flat socket** `hero_db_*.sock` (stale, #34-class) | n/a | n/a | n/a | (The ~25 raw "risky" grep hits were false positives — `Path::exists()` and `"keys"` in OpenRPC JSON schemas.) ### #39 blast radius (KEYS/DBSIZE/EXISTS/SCAN see all types) — narrow - 🟠 **hero_lib_rhai is the only real exposure**: `herodb_keys()` / `herodb_exists()` are exposed to Rhai scripts. Today they see string keys only; after #39 they'd also surface hash/list/set keys (relevant if the instance is shared with hero_slides/wallet/aibroker). A script that does *enumerate-keys → GET each* could get non-string keys back (nil, or `WRONGTYPE` once #40 lands). **Regression-check the Rhai scripts.** - ✅ No other consumer calls KEYS/SCAN/DBSIZE/EXISTS on hero_db (hero_memory's are its own API; hero_db's CLI uses them internally only). `EXISTS` flipping to `true` for live hash/list/set keys is a correctness fix nobody currently depends on. ### #40 blast radius (type-safety / WRONGTYPE + possible type index) — two risks 1. 🟢 **WRONGTYPE enforcement — low break risk.** Every data consumer uses **disjoint key namespaces per type** (verified: hero_wallet `{prefix}{id}` vs `index_key`; hero_aibroker set/hash/string on distinct keys; hero_slides single hash key). None reuses a key as two types. Only wildcard is hero_lib_rhai's generic scripting surface — but it exposes string ops only, so it can't create conflicting types itself. 2. 🔴 **Migration / on-disk format — the real danger.** If #40 introduces a per-key type index or changes key layout, **every persisted hero_db instance needs migration.** Affected data holders: **hero_wallet (sensitive), hero_aibroker, hero_slides, hero_memory** (+ any prod hero_db). A format change without a read-compat path = data loss. (hero_wallet's sensitive data also rides on hero_db's hardcoded encryption key — #46.) ### Recommendations 1. **Ship #39 standalone, storage-unchanged** — implement by aggregating prefix scans in KEYS/DBSIZE/EXISTS/SCAN (the way `TYPE`/`DEL` already walk all prefixes). No on-disk change, **zero migration risk**. Only gate: a regression pass on **hero_lib_rhai** / Rhai scripts that enumerate keys. 2. **For #40, prefer deriving type from existing prefixes (read-back-compatible) over a new on-disk index** — or include a migration and a **coordinated, same-branch-name release** with hero_wallet / hero_aibroker / hero_slides / hero_memory. SDK signatures don't change (return types unchanged; WRONGTYPE just adds an error variant), so no SDK regen is forced on the 5 `cargo_dep` consumers. *Audit method: org-wide remote scan of 132 repos / 6 orgs (not just locally-cloned repos), per the cross-org blast-radius convention.*
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
lhumina_code/hero_db#39
No description provided.