Hero Books - Document Management System
A Rust-based document collection management system with CLI, library, and web interfaces. It processes markdown-based documentation with support for cross-collection references, link validation, and export to self-contained directories.
# Source your environment variables
source ~/.config/env.sh # or wherever you keep your secrets
# Build and run the web server
make run
# See all available commands
make help
Features
- Collection scanning: Automatically discover collections marked with `.collection` files
- Cross-collection references: Link between pages in different collections using `collection:page` syntax
- Include directives: Embed content from other pages with `!!include collection:page`
- Link validation: Detect broken links to pages, images, and files
- Export: Generate self-contained directories with all dependencies
- Access control: Group-based ACL via `.group` files
- Git integration: Automatically detect repository URLs
Installation
Install from Binaries
Download the pre-built binary from the Forge package registry:
mkdir -p ~/hero/bin
curl -fsSL -o ~/hero/bin/hero_books \
"https://forge.ourworld.tf/api/packages/lhumina_code/generic/hero_books/dev/hero_books-linux-amd64"
chmod +x ~/hero/bin/hero_books
Build from Source
git clone https://forge.ourworld.tf/lhumina_code/hero_books
cd hero_books
make build
Binaries will be at:
- target/release/books_client - CLI client
- target/release/books_server - Web server
Install to ~/hero/bin/:
make install
Run Different Documentation Sets
Run different documentation sets with isolated embeddings:
make run # Local books (7 books, 3 libraries, fast/offline)
make run-git # Git-based books (real content from forge repos)
make stop # Stop all services
Architecture & Concepts
Separation of Concerns
Hero Books separates content from presentation:
DocTree (Content):
- Manages markdown collections and pages
- Validates links and references
- Tracks files and images
- Processes include directives
- Enforces access control
Website (Presentation):
- Defines navigation structure
- Configures sidebars and menus
- Manages theming and styling
- Handles SEO metadata
- Provides plugin architecture
This separation allows flexible website layouts without changing content.
Key Concepts
Collections: Directories of markdown pages marked with .collection file
- Each collection is independently managed
- Collections can reference each other
- Access control per collection via ACL files
Pages: Individual markdown files with:
- Extracted title (from H1 heading)
- Description (from first paragraph)
- Parsed internal links and includes
- Optional front matter metadata
Links: References to pages, images, or files:
- Same collection: [text](page_name)
- Cross-collection: [text](collection:page)
- External: Automatic detection of HTTP(S) URLs
- Images: Identified by extension
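As an illustration of these rules, here is a minimal Rust sketch of link classification. The `LinkKind` enum and `classify_link` function are hypothetical names for illustration, not the crate's actual API:

```rust
// Hypothetical sketch of link classification; not the crate's actual API.
#[derive(Debug, PartialEq)]
enum LinkKind {
    External,
    Image,
    CrossCollection { collection: String, page: String },
    SameCollection(String),
}

const IMAGE_EXTS: &[&str] = &["png", "jpg", "jpeg", "gif", "svg", "webp", "bmp", "tiff", "ico"];

fn classify_link(target: &str) -> LinkKind {
    // External links: plain HTTP(S) URL detection
    if target.starts_with("http://") || target.starts_with("https://") {
        return LinkKind::External;
    }
    // Images: identified by file extension
    if let Some(ext) = target.rsplit('.').next() {
        if IMAGE_EXTS.contains(&ext.to_lowercase().as_str()) {
            return LinkKind::Image;
        }
    }
    // Cross-collection references use `collection:page` syntax
    if let Some((collection, page)) = target.split_once(':') {
        return LinkKind::CrossCollection {
            collection: collection.to_string(),
            page: page.to_string(),
        };
    }
    LinkKind::SameCollection(target.to_string())
}

fn main() {
    assert_eq!(classify_link("https://example.com"), LinkKind::External);
    assert_eq!(classify_link("img/logo.png"), LinkKind::Image);
    assert_eq!(classify_link("my_page"), LinkKind::SameCollection("my_page".into()));
}
```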
Groups: Access control lists defining user membership
- Grant read/write access to collections
- Support wildcards for email patterns
- Support group inclusion (nested groups)
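The wildcard rule can be sketched as follows; `matches_pattern` is an illustrative helper under the assumption that `*@domain` entries match any user at that domain, not the crate's actual API:

```rust
// Illustrative email-pattern matching for group ACLs; not the crate's API.
fn matches_pattern(pattern: &str, email: &str) -> bool {
    if let Some(domain) = pattern.strip_prefix("*@") {
        // Wildcard entry like `*@company.com` matches any user at that domain
        email
            .rsplit_once('@')
            .map_or(false, |(_, d)| d.eq_ignore_ascii_case(domain))
    } else {
        // Literal entry: exact (case-insensitive) email match
        pattern.eq_ignore_ascii_case(email)
    }
}

fn main() {
    assert!(matches_pattern("*@company.com", "alice@company.com"));
    assert!(!matches_pattern("*@company.com", "alice@other.com"));
    assert!(matches_pattern("user@example.com", "User@Example.com"));
}
```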
Export: Self-contained read-only directory:
- Pages and files organized by collection
- JSON metadata for each collection
- Suitable for static hosting or archival
Data Flow
Directory Scan
↓
Find Collections (.collection files)
↓
Parse Pages (extract metadata, parse links)
↓
Validate Links (check references exist)
↓
Process Includes (expand !!include directives)
↓
Enforce ACL (check group membership)
↓
Export (write to structured directory)
↓
Read Client (query exported collections)
Environment Variables
This project follows the env_secrets convention.
Source your env file before running — no .env files, no dotenv crates.
source ~/.config/env.sh # or wherever you keep your secrets
make run
Variables used by Hero Books
| Variable | Required | Purpose |
|---|---|---|
| GROQ_API_KEY | Yes | Groq API — Q&A extraction, AI summary, Whisper transcription |
| OPENROUTER_API_KEY | Yes | OpenRouter API — LLM fallback, embeddings |
| SAMBANOVA_API_KEY | No | SambaNova API — additional LLM provider |
| GIT_TOKEN | No | Personal access token for cloning private repos from Forge |
| HERO_EMBEDDER_URL | No | Override local embedder endpoint (default: http://localhost:3752/rpc) |
All variable names are canonical across the ecosystem — see the env_secrets skill for the full registry.
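A fail-fast startup check for these variables might look like the following sketch; `require_llm_key` is an illustrative helper, not part of Hero Books:

```rust
use std::env;

// Illustrative startup check for the env_secrets convention: fail fast if no
// LLM provider key is present. Variable names match the table above.
fn require_llm_key() -> Result<String, String> {
    for key in ["GROQ_API_KEY", "OPENROUTER_API_KEY", "SAMBANOVA_API_KEY"] {
        if let Ok(val) = env::var(key) {
            if !val.is_empty() {
                return Ok(key.to_string());
            }
        }
    }
    Err("set GROQ_API_KEY or OPENROUTER_API_KEY (see the env_secrets skill)".to_string())
}

fn main() {
    match require_llm_key() {
        Ok(key) => println!("using {key}"),
        Err(e) => eprintln!("missing credentials: {e}"),
    }
}
```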
Git Authentication (for private repos)
For private repositories on forge.ourworld.tf, set GIT_TOKEN in your env file
(personal access token from Forge: Settings > Applications > Generate Token).
Alternatively, use SSH keys — ensure your key is loaded in ssh-agent.
CLI Usage
The books_client CLI talks to a running books_server via OpenRPC.
Start the server first
books_server --port 8883 --books-dir /path/to/docs
Scan for collections
# Scan a local path (server must have access)
books_client scan --path /path/to/docs
# Scan from git repository
books_client scan --git-url https://github.com/user/docs.git
List and inspect collections
# List all collections
books_client list
# Get collection details
books_client get my-collection
# Get all pages in a collection
books_client get-pages my-collection
# Get a specific page
books_client get-page my-collection page-name
Process collections
# Process for Q&A extraction and embeddings
books_client process my-collection
# Force reprocessing
books_client process my-collection --force
Metadata management
# Get collection metadata
books_client get-metadata my-collection
# Set collection metadata
books_client set-metadata my-collection --json '{"key": "value"}'
Server health
# Check server health
books_client health
# View OpenRPC schema
books_client discover
Directory Structure
Source Structure
docs/
├── collection1/
│ ├── .collection # Marks as collection (optional: name:custom_name)
│ ├── read.acl # Optional: group names for read access
│ ├── write.acl # Optional: group names for write access
│ ├── page1.md
│ ├── subdir/
│ │ └── page2.md
│ └── img/
│ └── logo.png
├── collection2/
│ ├── .collection
│ └── intro.md
└── groups/ # Special collection for ACL groups
├── .collection
├── admins.group
└── editors.group
Export Structure
/tmp/books/
├── content/
│ └── collection_name/
│ ├── page1.md # Pages at root of collection dir
│ ├── page2.md
│ ├── img/ # All images in img/ subdirectory
│ │ └── logo.png
│ └── files/ # All other files in files/ subdirectory
│ └── document.pdf
└── meta/
└── collection_name.json # Collection metadata
File Formats
.collection
name:custom_collection_name
If the file is empty or no name is specified, the directory name is used.
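A minimal sketch of parsing this marker file, assuming a hypothetical `collection_name` helper:

```rust
use std::path::Path;

// Illustrative parse of a `.collection` marker file: an optional `name:` line
// overrides the directory name. Not the crate's actual implementation.
fn collection_name(marker_contents: &str, dir: &Path) -> String {
    marker_contents
        .lines()
        .find_map(|l| l.trim().strip_prefix("name:"))
        .map(|n| n.trim().to_string())
        .filter(|n| !n.is_empty())
        .unwrap_or_else(|| {
            // Fall back to the directory name when no `name:` line is present
            dir.file_name().unwrap_or_default().to_string_lossy().to_string()
        })
}

fn main() {
    let dir = Path::new("/docs/my_collection");
    assert_eq!(collection_name("name:custom_name", dir), "custom_name");
    assert_eq!(collection_name("", dir), "my_collection");
}
```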
.group
// Comments start with //
user@example.com
*@company.com
include:other_group
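A parser for this format might look like the following sketch; the `GroupEntry` enum and `parse_group` function are assumptions for illustration, not the crate's actual types:

```rust
// Illustrative parser for the `.group` format: `//` comments, literal emails,
// `*@domain` wildcards, and `include:` references to other groups.
#[derive(Debug, PartialEq)]
enum GroupEntry {
    Email(String),
    DomainWildcard(String),
    Include(String),
}

fn parse_group(contents: &str) -> Vec<GroupEntry> {
    contents
        .lines()
        .map(str::trim)
        // Skip blank lines and `//` comments
        .filter(|l| !l.is_empty() && !l.starts_with("//"))
        .map(|l| {
            if let Some(group) = l.strip_prefix("include:") {
                GroupEntry::Include(group.to_string())
            } else if let Some(domain) = l.strip_prefix("*@") {
                GroupEntry::DomainWildcard(domain.to_string())
            } else {
                GroupEntry::Email(l.to_string())
            }
        })
        .collect()
}

fn main() {
    let entries = parse_group("// team\nuser@example.com\n*@company.com\ninclude:other_group\n");
    assert_eq!(entries.len(), 3);
    assert_eq!(entries[2], GroupEntry::Include("other_group".into()));
}
```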
ACL files (read.acl, write.acl)
admins
editors
One group name per line.
Link Syntax
Page links
[text](page_name) # Same collection
[text](collection:page) # Cross-collection
Image links
 # Same collection
 # Cross-collection
Include directives
!!include page_name
!!include collection:page_name
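Include expansion can be sketched as a single pass that replaces each directive line with the referenced page's content; the non-recursive lookup, the `expand_includes` name, and the broken-include placeholder are simplifying assumptions:

```rust
use std::collections::HashMap;

// Illustrative expansion of `!!include` directives: each directive line is
// replaced by the referenced page's content from a lookup map.
fn expand_includes(page: &str, pages: &HashMap<String, String>) -> String {
    page.lines()
        .map(|line| match line.trim().strip_prefix("!!include ") {
            Some(target) => pages
                .get(target.trim())
                .cloned()
                // Leave a marker if the referenced page cannot be found
                .unwrap_or_else(|| format!("<!-- broken include: {target} -->")),
            None => line.to_string(),
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let mut pages = HashMap::new();
    pages.insert("docs:intro".to_string(), "# Intro".to_string());
    let out = expand_includes("before\n!!include docs:intro\nafter", &pages);
    assert_eq!(out, "before\n# Intro\nafter");
}
```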
Name Normalization
Page and collection names are normalized:
- Convert to lowercase
- Replace
-with_ - Replace
/with_ - Remove
.mdextension - Strip numeric prefix (e.g.,
03_page→page) - Remove special characters
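These rules can be expressed as a small function; this is an illustrative sketch, not the crate's actual implementation:

```rust
// Illustrative implementation of the normalization rules above.
fn normalize_name(raw: &str) -> String {
    let mut name = raw.to_lowercase();
    // Remove the `.md` extension
    if let Some(stripped) = name.strip_suffix(".md") {
        name = stripped.to_string();
    }
    // Replace `-` and `/` with `_`
    name = name.replace('-', "_").replace('/', "_");
    // Strip a numeric prefix (e.g. `03_page` -> `page`)
    if let Some((prefix, rest)) = name.split_once('_') {
        if !prefix.is_empty() && prefix.chars().all(|c| c.is_ascii_digit()) {
            name = rest.to_string();
        }
    }
    // Remove remaining special characters
    name.chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '_')
        .collect()
}

fn main() {
    assert_eq!(normalize_name("03_My-Page.md"), "my_page");
    assert_eq!(normalize_name("Sub/Dir-Page"), "sub_dir_page");
}
```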
Supported Image Extensions
.png, .jpg, .jpeg, .gif, .svg, .webp, .bmp, .tiff, .ico
Service Management with Zinit
Hero Books can run as a Zinit-managed service with automatic restart, health checks, and port management.
Starting as a Zinit Service
# Start web server as Zinit-managed service
books_server --port 8883 --start
# Start with custom books directory
books_server --port 8883 --books-dir ./books --start
# Multi-instance support
books_server --port 8883 --start --instance prod
books_server --port 9568 --start --instance dev
Service Management
Once started with --start, services are managed by Zinit:
# View service status
zinit status books_server
# View service logs
zinit logs books_server
# Stop service
zinit stop books_server
# Restart service
zinit restart books_server
# Multi-instance commands
zinit status books_server_prod
zinit logs books_server_dev
Service Features
- Automatic Restart: Service restarts on failure with 5s delay
- Health Checks: TCP port health checks every 10s
- Max Restarts: Up to 5 restart attempts before stopping
- Logging: Full log history available via `zinit logs`
- Verification: Defensive self-test to verify successful startup
Error Handling
If service startup fails, Zinit will:
- Attempt TCP connection to verify port binding
- Check service state and PID
- Display recent logs on failure
- Clean up failed service registration
Detailed error messages provide diagnostic information:
- Port already in use
- Binary path incorrect
- Zinit server not running
- Permission denied
Development
Code Quality
This project maintains high code quality standards:
- Dead Code Cleanup: Unused code is either removed or marked with `#[allow(dead_code)]` with a clear justification:
  - `flatten_chapter_pages()` - Utility function kept for testing
  - `classify_topic()` - Public API method reserved for future use
  - `embeddings_from_cache` - Field used for statistics reporting
- Compiler Warnings: All compiler warnings in the `hero_books` crate are resolved; any remaining warnings come from external dependencies
Building
# Build release binaries
make build
# Build with debug info (dev mode)
cargo build
# Run tests
make test
# Run all tests including integration tests
make test-all
# Generate documentation
cargo doc --no-deps --open
# Check for compiler warnings
cargo check
Testing
# Run all tests
cargo test
# Run specific module tests
cargo test doctree
cargo test website
# Run with output
cargo test -- --nocapture
Code Organization
src/
├── lib.rs # Library exports
├── main.rs # Web server entry point (books_server binary)
├── bin/
│ ├── books.rs # CLI entry point (books_client binary)
│ └── cli.rs # Legacy CLI entry point
├── cli/ # CLI commands and handlers
├── doctree/ # Document management
├── ebook/ # Ebook parsing
├── ontology/ # AI-powered semantic extraction
├── vectorsdk/ # Vector search and embeddings
├── publishing/ # Publishing configuration
├── book/ # Book and PDF processing
├── web/ # HTTP API routes and handlers
└── website/ # Website configuration
Adding New Features
- New DocTree functionality: Add to src/doctree/
- New Website config: Add to src/website/
- New CLI commands: Add to src/cli/mod.rs
- New API endpoints: Add to src/web/mod.rs
Library Usage
Ontology Processing
The ontology processor uses AI to classify documents and extract semantic concepts/relationships.
use hero_books::ontology::{OntologyProcessor, ProcessorConfig, ONTOLOGIES};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
// Create processor with default config
let processor = OntologyProcessor::new();
let document = "Our SaaS product integrates with Slack...";
// Classification only (quick)
let matches = processor.classify(document).await?;
for m in &matches {
println!("{}: {} (primary: {})", m.topic, m.score, m.is_primary);
}
// Full processing (classification + extraction)
let result = processor.process(document).await?;
for sem in &result.semantics {
println!("{}: {} concepts, {} relationships",
sem.category, sem.concepts.len(), sem.relationships.len());
}
// Direct extraction for specific topics
let semantics = processor.extract(document, &["product", "technology"]).await?;
Ok(())
}
Available Topics: business, technology, product, commercial, people, news, legal, financial, health, education
Configuration:
let config = ProcessorConfig {
confidence_threshold: Some(8), // Min score to consider (default: 7)
max_topics: Some(3), // Limit topics processed
filter_topics: Some(vec!["product".into(), "technology".into()]),
temperature: Some(0.0), // LLM temperature
max_input_tokens: Some(60_000), // Chunk if larger
..Default::default()
};
let processor = OntologyProcessor::with_config(config);
Requirements: Source your env file with at least one API key set: GROQ_API_KEY (preferred), SAMBANOVA_API_KEY, or OPENROUTER_API_KEY.
See examples/src/ontology_processing.rs for a complete example.
DocTree
use doctree::{DocTree, ExportArgs};
fn main() -> doctree::Result<()> {
// Create and scan
let mut doctree = DocTree::new("mydocs");
doctree.scan(Path::new("/path/to/docs"), &[])?;
doctree.init_post()?; // Validate links
// Access pages
let page = doctree.page_get("collection:page")?;
let content = page.content()?;
// Export
doctree.export(ExportArgs {
destination: PathBuf::from("/tmp/books"),
reset: true,
include: false,
})?;
Ok(())
}
DocTreeClient (for reading exports)
use doctree::DocTreeClient;
fn main() -> doctree::Result<()> {
let client = DocTreeClient::new(Path::new("/tmp/books"))?;
// List collections
let collections = client.list_collections()?;
// Get page content
let content = client.get_page_content("collection", "page")?;
// Check existence
if client.page_exists("collection", "page") {
println!("Page exists!");
}
Ok(())
}
Directory Structure
hero_books/
├── examples/
│ ├── collections/ # Local test collections
│ │ └── hero_books_docs/ # Basic demo collection
│ ├── ebooks/ # Book definitions (TOML)
│ └── rhai/ # Demo scripts
├── src/
│ ├── lib.rs # Library entry point and module declarations
│ ├── main.rs # Web server binary entry point
│ ├── bin/
│ │ ├── books.rs # CLI binary entry point (books_client)
│ │ └── cli.rs # Legacy CLI entry point
│ ├── cli/ # CLI commands and handlers
│ ├── doctree/ # Document tree management
│ ├── ebook/ # Ebook parsing
│ ├── ontology/ # AI-powered semantic extraction
│ ├── vectorsdk/ # Vector search and embeddings
│ ├── publishing/ # Publishing configuration
│ ├── book/ # Book and PDF processing
│ ├── web/ # HTTP API routes and handlers
│ └── website/ # Website configuration
├── crates/
│ └── books-client/ # Rust client library for the API
├── Cargo.toml # Package configuration
├── Makefile # Build automation (make help to see all targets)
├── build.rs # Build script
├── README.md # This file
├── openrpc.json # OpenRPC 1.3.2 API specification
└── target/
├── debug/ # Debug builds
└── release/
├── books_client # CLI binary
└── books_server # Web server binary