HeroIndex

A Tantivy-based full-text search server with an OpenRPC interface over a Unix socket and HTTP.

Repository: https://forge.ourworld.tf/lhumina_research/hero_index_server

Packages

This workspace contains two packages:

heroindex - The search server binary
heroindex_client - Client library for connecting to the server

Features

Multiple Index Management - Create, delete, and manage multiple Tantivy indexes
Dynamic Schemas - Define custom schemas with various field types
Full-Text Search - Match queries, phrase queries, fuzzy search
Exact Queries - Term queries, range queries, regex, prefix matching
Boolean Queries - Combine queries with must/should/must_not clauses
Fast Fields - Columnar storage for sorting and aggregations
Web Admin UI - Browser-based dashboard for managing databases, queries, and monitoring
HTTP JSON-RPC Endpoint - POST /rpc for HTTP-based JSON-RPC 2.0 clients
MCP Server - Model Context Protocol endpoint at POST /mcp for AI assistant integration
OpenRPC Interface - Unix socket + HTTP JSON-RPC interface with discovery
Performance Test Tab - Built-in benchmark tool for load testing and search benchmarks
Demo Database - Auto-created on first startup with sample documents
Concurrent Connections - Multiple clients can connect simultaneously

Binaries

This project builds two binaries:

Binary	Description
`heroindex`	Full-text search server (main binary)
`heroindex_test`	Test utility for the client library

Installation

From Source (Recommended)

Requirements:

Rust 1.92.0+
Unix-like operating system (Linux, macOS)

git clone https://forge.ourworld.tf/lhumina_research/hero_index_server.git
cd hero_index_server

# Build and install to ~/hero/bin/
make install

Both binaries are now in ~/hero/bin/ and ready to use!

Add to PATH

export PATH="$HOME/hero/bin:$PATH"

Then run anywhere:

heroindex  # Start the server

From crates.io

cargo install heroindex

(Note: to get the full project with both binaries, build from source.)

Development & Usage

Quick Start

# Build and install
make install

# Run the server (uses defaults - no args needed!)
make run

# In another terminal, use the client
heroindex_test  # or use the client library in your code

Make Commands

# Installation & Running
make build         # Build release binaries
make install       # Build and install to ~/hero/bin/
make installdev    # Install debug build (faster compile)
make run           # Run server with defaults
make rundev        # Run with debug logging

# Testing & Quality
make check         # Fast code check
make test          # Run all tests
make test-all      # Run all tests including integration
make perftest      # Run performance benchmark

# Development
make fmt           # Format code
make fmt-check     # Check code formatting
make lint          # Run clippy linter

# Maintenance
make clean         # Remove build artifacts
make all           # Full cycle: clean → check → test → build
make help          # Show all commands

Server Arguments

The server uses sensible defaults - no arguments required!

# Just run it with defaults
make run

# Or after install, run directly
heroindex

Default Configuration

Argument	Default	Description
`--dir`	`~/hero/var/index/defaulttest`	Base directory for all indexes
`--socket`	`~/hero/var/socket_heroindex`	Unix socket for RPC interface
`--http-port`	`9753`	HTTP server port (Web UI + API)
`--http-host`	`127.0.0.1`	HTTP server bind address

Server Interfaces

The server exposes three endpoints over two transports (HTTP and a Unix socket) simultaneously:

Interface	Address	Protocol	Purpose
HTTP	`http://127.0.0.1:9753`	HTTP + JSON-RPC 2.0	Web UI, REST API, HTTP RPC
MCP	`http://127.0.0.1:9753/mcp`	MCP over HTTP	AI assistant tool integration
Unix Socket	`~/hero/var/socket_heroindex`	JSON-RPC 2.0	Programmatic client access

Web Admin UI

Open http://127.0.0.1:9753 in your browser to access:

Overview - Database info, schema viewer, document counts
Query - Execute search queries with results display
Documents - Add documents individually or in batches
API Docs - Full JSON-RPC 2.0 method reference with examples
Logs - Real-time operation log viewer
Perf Test - Load 100k documents and benchmark search performance

HTTP JSON-RPC Endpoint

All 18 RPC methods are available via HTTP POST:

# Health check
curl -X POST http://127.0.0.1:9753/rpc \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"server.ping","params":[],"id":1}'

# List databases
curl -X POST http://127.0.0.1:9753/rpc \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"db.list","params":[],"id":1}'

# Download OpenRPC spec
curl http://127.0.0.1:9753/openrpc.json
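
The curl calls above send a JSON-RPC 2.0 envelope in an HTTP POST. As a rough illustration of what goes over the wire, here is the same request built and sent with only the Rust standard library. The function names are made up for this sketch, `method` is assumed to need no JSON escaping, and a real client would normally use an HTTP library and a JSON serializer instead.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Build a JSON-RPC 2.0 request body.
/// `params` must already be valid JSON, for example "[]".
fn jsonrpc_body(method: &str, params: &str, id: u64) -> String {
    format!(r#"{{"jsonrpc":"2.0","method":"{method}","params":{params},"id":{id}}}"#)
}

/// Wrap a JSON body in a minimal HTTP/1.1 POST request.
fn http_post(host: &str, port: u16, path: &str, body: &str) -> String {
    format!(
        "POST {path} HTTP/1.1\r\nHost: {host}:{port}\r\nContent-Type: application/json\r\nContent-Length: {len}\r\nConnection: close\r\n\r\n{body}",
        len = body.len()
    )
}

/// Send the request and return the raw response (status line, headers, body).
/// Reads until the server closes the connection.
fn send(host: &str, port: u16, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect((host, port))?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

With the server running on its defaults, `send("127.0.0.1", 9753, &http_post("127.0.0.1", 9753, "/rpc", &jsonrpc_body("server.ping", "[]", 1)))` corresponds to the health-check curl call.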

MCP (Model Context Protocol) Endpoint

HeroIndex exposes an MCP endpoint at POST /mcp on the same HTTP port (9753), allowing AI assistants (Claude, etc.) to use it as a tool server.

Supported MCP methods: initialize, tools/list, tools/call, ping
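
To make the method list concrete, the sketch below maps the four method names onto an enum and rejects anything else with the standard JSON-RPC 2.0 "Method not found" code (-32601). This is an illustration only: the enum and function names are invented here, and how the real server in `mcp.rs` reports unknown methods is not documented in this README.

```rust
/// The four MCP methods this README lists as supported.
#[derive(Debug, PartialEq)]
enum McpMethod {
    Initialize,
    ToolsList,
    ToolsCall,
    Ping,
}

/// JSON-RPC 2.0 error code for an unknown method.
const METHOD_NOT_FOUND: i32 = -32601;

/// Map a JSON-RPC method name onto a supported MCP method,
/// or return the error code a server could answer with.
fn parse_method(name: &str) -> Result<McpMethod, i32> {
    match name {
        "initialize" => Ok(McpMethod::Initialize),
        "tools/list" => Ok(McpMethod::ToolsList),
        "tools/call" => Ok(McpMethod::ToolsCall),
        "ping" => Ok(McpMethod::Ping),
        _ => Err(METHOD_NOT_FOUND),
    }
}
```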

Available tools (16):

Tool	Description
`server_ping`	Health check
`server_stats`	Server uptime, database count, total docs
`db_list`	List all databases
`db_create`	Create database with schema
`db_delete`	Delete a database
`db_close`	Close database (free memory)
`db_select`	Select database for operations
`db_info`	Info about selected database
`schema_get`	Get schema of selected database
`doc_add`	Add a single document
`doc_add_batch`	Add documents in batch
`doc_delete`	Delete documents by field/value
`index_commit`	Commit pending changes
`index_reload`	Reload index reader
`search_query`	Execute a search query
`search_count`	Count matching documents

Example MCP usage:

# Initialize MCP session
curl -X POST http://127.0.0.1:9753/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"initialize","params":{},"id":1}'

# List available tools
curl -X POST http://127.0.0.1:9753/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"tools/list","params":{},"id":2}'

# Call a tool (list databases)
curl -X POST http://127.0.0.1:9753/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"db_list","arguments":{}},"id":3}'

# Search via MCP
curl -X POST http://127.0.0.1:9753/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"search_query","arguments":{"query":{"type":"match","field":"body","value":"search"},"limit":5}},"id":4}'
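
When the search value comes from user input, it has to be escaped before it is embedded in the JSON body. The sketch below builds the same `tools/call` envelope as the last curl example, with a small hand-written escaper. All three function names are hypothetical. With `serde_json` available, the `json!` macro handles the escaping and is the simpler choice.

```rust
/// Escape a string for embedding in JSON (quotes, backslashes, control characters).
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Build an MCP `tools/call` request. `arguments` must be a JSON object.
fn tools_call(tool: &str, arguments: &str, id: u64) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","method":"tools/call","params":{{"name":{},"arguments":{}}},"id":{}}}"#,
        json_string(tool),
        arguments,
        id
    )
}

/// Arguments for the `search_query` tool using a `match` query.
fn match_search_args(field: &str, value: &str, limit: u32) -> String {
    format!(
        r#"{{"query":{{"type":"match","field":{},"value":{}}},"limit":{}}}"#,
        json_string(field),
        json_string(value),
        limit
    )
}
```

`tools_call("search_query", &match_search_args("body", "search", 5), 4)` produces the body used in the "Search via MCP" example.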

MCP client configuration (e.g. for Claude Desktop claude_desktop_config.json):

{
  "mcpServers": {
    "heroindex": {
      "url": "http://127.0.0.1:9753/mcp"
    }
  }
}

Custom Arguments

Override defaults if needed:

heroindex --dir /custom/data --socket /tmp/search.sock --http-port 8080 --http-host 0.0.0.0
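
The override behaviour can be pictured as defaults plus `--flag value` pairs. The sketch below shows that idea with the four documented flags and their documented defaults. It is not the server's actual argument parser, which this README does not show. `Config` and `parse_args` are names invented for the example.

```rust
/// Server settings, mirroring the four documented flags.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    dir: String,
    socket: String,
    http_port: u16,
    http_host: String,
}

impl Default for Config {
    /// Defaults taken from the "Default Configuration" table.
    fn default() -> Self {
        Config {
            dir: "~/hero/var/index/defaulttest".to_string(),
            socket: "~/hero/var/socket_heroindex".to_string(),
            http_port: 9753,
            http_host: "127.0.0.1".to_string(),
        }
    }
}

/// Apply `--flag value` pairs on top of the defaults.
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Config, String> {
    let mut cfg = Config::default();
    let mut it = args.into_iter();
    while let Some(flag) = it.next() {
        let value = it
            .next()
            .ok_or_else(|| format!("missing value for {flag}"))?;
        match flag.as_str() {
            "--dir" => cfg.dir = value,
            "--socket" => cfg.socket = value,
            "--http-host" => cfg.http_host = value,
            "--http-port" => {
                cfg.http_port = value
                    .parse::<u16>()
                    .map_err(|e| format!("bad port {value}: {e}"))?
            }
            other => return Err(format!("unknown argument: {other}")),
        }
    }
    Ok(cfg)
}
```

A binary would call it as `parse_args(std::env::args().skip(1))`. Flags that are not given keep their default values.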

Using the Client Library

Option 1: Use the Installed Binary

After make install, the heroindex_test utility is available:

heroindex_test

Option 2: As a Rust Library

Add to your Cargo.toml:

[dependencies]
heroindex_client = { git = "https://forge.ourworld.tf/lhumina_research/hero_index_server.git" }
serde_json = "1.0"
tokio = { version = "1.0", features = ["full"] }

Then update dependencies:

cargo update

Or use a specific version from crates.io:

[dependencies]
heroindex_client = "0.1"
serde_json = "1.0"
tokio = { version = "1.0", features = ["full"] }

Quick Start

use heroindex_client::HeroIndexClient;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Connect to the server (uses default socket path)
    let mut client = HeroIndexClient::connect("~/hero/var/socket_heroindex").await?;

    // 2. Create a database with schema
    client.db_create("articles", json!({
        "fields": [
            {"name": "title", "type": "text", "stored": true, "indexed": true},
            {"name": "body", "type": "text", "stored": true, "indexed": true}
        ]
    })).await?;

    // 3. Select the database
    client.db_select("articles").await?;

    // 4. Add a document
    client.doc_add(json!({
        "title": "Hello World",
        "body": "Rust is awesome"
    })).await?;

    // 5. Commit and search
    client.commit().await?;
    client.reload().await?;

    let results = client.search(
        json!({"type": "match", "field": "body", "value": "rust"}),
        10, 0
    ).await?;

    println!("Found {} results", results.total_hits);
    Ok(())
}

Example 1: Simple Full-Text Search

use heroindex_client::HeroIndexClient;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = HeroIndexClient::connect("~/hero/var/socket_heroindex").await?;

    client.db_create("docs", json!({
        "fields": [{"name": "content", "type": "text", "stored": true, "indexed": true}]
    })).await?;

    client.db_select("docs").await?;

    // Add documents
    client.doc_add(json!({"content": "Rust programming language"})).await?;
    client.doc_add(json!({"content": "Python for data science"})).await?;

    client.commit().await?;
    client.reload().await?;

    // Search
    let results = client.search(
        json!({"type": "match", "field": "content", "value": "rust"}),
        10, 0
    ).await?;

    for hit in results.hits {
        println!("{:?}", hit.doc);
    }

    Ok(())
}

Example 2: Batch Insert & Range Query

use heroindex_client::HeroIndexClient;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = HeroIndexClient::connect("~/hero/var/socket_heroindex").await?;

    client.db_create("products", json!({
        "fields": [
            {"name": "name", "type": "text", "stored": true, "indexed": true},
            {"name": "price", "type": "u64", "stored": true, "indexed": true, "fast": true}
        ]
    })).await?;

    client.db_select("products").await?;

    // Batch insert 1000 products
    let docs: Vec<_> = (1..=1000)
        .map(|i| json!({"name": format!("Product {}", i), "price": 10 + i as u64}))
        .collect();

    client.doc_add_batch(docs).await?;
    client.commit().await?;
    client.reload().await?;

    // Find products priced from $50 up to, but not including, $100
    let results = client.search(
        json!({"type": "range", "field": "price", "gte": 50, "lt": 100}),
        100, 0
    ).await?;

    println!("Found {} products in price range", results.total_hits);
    Ok(())
}

Example 3: Fuzzy Search (Typo Tolerance)

use heroindex_client::HeroIndexClient;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = HeroIndexClient::connect("~/hero/var/socket_heroindex").await?;

    client.db_create("words", json!({
        "fields": [{"name": "word", "type": "text", "stored": true, "indexed": true}]
    })).await?;

    client.db_select("words").await?;

    client.doc_add(json!({"word": "programming"})).await?;
    client.doc_add(json!({"word": "elephant"})).await?;

    client.commit().await?;
    client.reload().await?;

    // Find "programing" (typo) - will match "programming"
    let results = client.search(
        json!({"type": "fuzzy", "field": "word", "value": "programing", "distance": 1}),
        10, 0
    ).await?;

    println!("Found matches for typo: {:?}", results.hits);
    Ok(())
}

Example 4: Boolean Queries (AND, OR, NOT)

use heroindex_client::HeroIndexClient;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = HeroIndexClient::connect("~/hero/var/socket_heroindex").await?;

    client.db_create("news", json!({
        "fields": [
            {"name": "title", "type": "text", "stored": true, "indexed": true},
            {"name": "category", "type": "str", "stored": true, "indexed": true}
        ]
    })).await?;

    client.db_select("news").await?;

    client.doc_add(json!({"title": "Rust wins award", "category": "tech"})).await?;
    client.doc_add(json!({"title": "Python ecosystem grows", "category": "tech"})).await?;
    client.doc_add(json!({"title": "Rust racing circuit", "category": "sports"})).await?;

    client.commit().await?;
    client.reload().await?;

    // Find: must have "Rust" AND must NOT be "sports"
    let results = client.search(
        json!({
            "type": "boolean",
            "must": [{"type": "match", "field": "title", "value": "rust"}],
            "must_not": [{"type": "term", "field": "category", "value": "sports"}]
        }),
        10, 0
    ).await?;

    println!("Found {} relevant articles", results.total_hits);
    Ok(())
}

Example 5: Multiple Databases

use heroindex_client::HeroIndexClient;
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut client = HeroIndexClient::connect("~/hero/var/socket_heroindex").await?;

    // Create separate databases for different content types
    let schema = json!({"fields": [
        {"name": "content", "type": "text", "stored": true, "indexed": true}
    ]});

    client.db_create("blog_posts", schema.clone()).await?;
    client.db_create("documentation", schema.clone()).await?;
    client.db_create("comments", schema).await?;

    // Work with blog posts
    client.db_select("blog_posts").await?;
    client.doc_add(json!({"content": "My first blog post"})).await?;
    // Commit before switching (assumes commit applies to the selected database)
    client.commit().await?;

    // Switch to documentation
    client.db_select("documentation").await?;
    client.doc_add(json!({"content": "API Reference"})).await?;

    client.commit().await?;

    // List all databases
    let dbs = client.db_list().await?;
    println!("Databases: {}", dbs.databases.len());

    Ok(())
}

Performance

Run the performance benchmark to measure HeroIndex on your own hardware:

make perftest

Benchmark Results

Operation	Time	Throughput
Single document insert	0.2ms	—
Batch insert (1000 docs)	7ms	137,000 docs/sec
Simple search	0.5ms	—
Paginated search (100 results)	1.5ms	—
Range query	1.9ms	—
Fuzzy search	0.5ms	—
Count query	0.04ms	—
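
The throughput column follows from the time column as documents divided by seconds. As a quick check of the batch-insert row: 1000 documents in exactly 7 ms would be about 142,857 docs/sec, while the listed 137,000 docs/sec corresponds to roughly 7.3 ms, so the time shown is presumably rounded. The two helper functions below are written for this README to reproduce that arithmetic.

```rust
/// Documents per second for `docs` documents processed in `millis` milliseconds.
fn docs_per_sec(docs: u64, millis: f64) -> f64 {
    docs as f64 / (millis / 1000.0)
}

/// Milliseconds needed to process `docs` documents at `rate` documents per second.
fn millis_for(docs: u64, rate: f64) -> f64 {
    docs as f64 / rate * 1000.0
}
```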

API Reference

See docs/specs.md for the complete OpenRPC interface specification.

Query Types

Type	Description	Example
`all`	Match all documents	`{"type": "all"}`
`match`	Full-text match	`{"type": "match", "field": "body", "value": "search terms"}`
`term`	Exact term match	`{"type": "term", "field": "id", "value": "abc123"}`
`fuzzy`	Fuzzy matching	`{"type": "fuzzy", "field": "title", "value": "serch", "distance": 1}`
`phrase`	Exact phrase	`{"type": "phrase", "field": "body", "value": "exact phrase"}`
`prefix`	Prefix matching	`{"type": "prefix", "field": "title", "value": "hel"}`
`range`	Numeric/date range	`{"type": "range", "field": "price", "gte": 10, "lt": 100}`
`regex`	Regex pattern	`{"type": "regex", "field": "title", "value": "test.*"}`
`boolean`	Combine queries	`{"type": "boolean", "must": [...], "should": [...], "must_not": [...]}`
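
Because every query is a JSON object tagged by `type`, the query types map naturally onto a Rust enum. The sketch below covers a subset of the table and serializes to the JSON shapes shown above. It uses only the standard library. `Query`, `quote`, and `array` are names chosen for this example and are not part of `heroindex_client`, where `serde_json::json!` is used instead. Empty boolean clauses are omitted, matching Example 4. Whether the server also accepts empty arrays is not stated here.

```rust
/// Quote a string for JSON, escaping only backslashes and double quotes.
/// Enough for this sketch; control characters are not handled.
fn quote(s: &str) -> String {
    format!("\"{}\"", s.replace('\\', "\\\\").replace('"', "\\\""))
}

/// Render a list of sub-queries as a JSON array.
fn array(queries: &[Query]) -> String {
    let items: Vec<String> = queries.iter().map(Query::to_json).collect();
    format!("[{}]", items.join(","))
}

/// A subset of the query types listed in the table.
enum Query {
    All,
    Match { field: String, value: String },
    Term { field: String, value: String },
    Fuzzy { field: String, value: String, distance: u8 },
    Range { field: String, gte: Option<i64>, lt: Option<i64> },
    Boolean { must: Vec<Query>, should: Vec<Query>, must_not: Vec<Query> },
}

impl Query {
    /// Serialize the query to the JSON object form used by the server.
    fn to_json(&self) -> String {
        let mut parts: Vec<String> = Vec::new();
        let mut put = |key: &str, json: String| parts.push(format!("\"{}\":{}", key, json));
        match self {
            Query::All => put("type", quote("all")),
            Query::Match { field, value } => {
                put("type", quote("match"));
                put("field", quote(field));
                put("value", quote(value));
            }
            Query::Term { field, value } => {
                put("type", quote("term"));
                put("field", quote(field));
                put("value", quote(value));
            }
            Query::Fuzzy { field, value, distance } => {
                put("type", quote("fuzzy"));
                put("field", quote(field));
                put("value", quote(value));
                put("distance", distance.to_string());
            }
            Query::Range { field, gte, lt } => {
                put("type", quote("range"));
                put("field", quote(field));
                if let Some(v) = gte { put("gte", v.to_string()); }
                if let Some(v) = lt { put("lt", v.to_string()); }
            }
            Query::Boolean { must, should, must_not } => {
                put("type", quote("boolean"));
                if !must.is_empty() { put("must", array(must)); }
                if !should.is_empty() { put("should", array(should)); }
                if !must_not.is_empty() { put("must_not", array(must_not)); }
            }
        }
        format!("{{{}}}", parts.join(","))
    }
}
```

The enum makes nesting explicit: a `Boolean` query simply holds other `Query` values, so the compiler rules out a misspelled `type` string.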

Field Types

Type	Description
`text`	Full-text searchable string (tokenized)
`str`	Exact string (keyword, not tokenized)
`u64`	Unsigned 64-bit integer
`i64`	Signed 64-bit integer
`f64`	64-bit floating point
`date`	DateTime (RFC 3339 format)
`bool`	Boolean
`json`	JSON object
`bytes`	Binary data
`ip`	IP address
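
A schema is a list of field definitions, each with a name, one of the types above, and the `stored`, `indexed`, and optional `fast` flags used in the examples. The sketch below generates that JSON from typed values. `FieldType`, `Field`, and `schema_json` are hypothetical names for this README. The defaults (`stored` and `indexed` both true) simply mirror the examples, and field names are assumed to be plain identifiers that need no escaping.

```rust
/// Field types from the table above.
#[derive(Clone, Copy)]
enum FieldType { Text, Str, U64, I64, F64, Date, Bool, Json, Bytes, Ip }

impl FieldType {
    /// The type name as it appears in a schema definition.
    fn as_str(self) -> &'static str {
        match self {
            FieldType::Text => "text",
            FieldType::Str => "str",
            FieldType::U64 => "u64",
            FieldType::I64 => "i64",
            FieldType::F64 => "f64",
            FieldType::Date => "date",
            FieldType::Bool => "bool",
            FieldType::Json => "json",
            FieldType::Bytes => "bytes",
            FieldType::Ip => "ip",
        }
    }
}

/// One field definition in a schema.
struct Field {
    name: String,
    ty: FieldType,
    stored: bool,
    indexed: bool,
    fast: bool,
}

impl Field {
    /// A stored, indexed field without fast-field storage.
    fn new(name: &str, ty: FieldType) -> Self {
        Field { name: name.to_string(), ty, stored: true, indexed: true, fast: false }
    }

    /// Enable columnar fast-field storage (used for sorting and aggregations).
    fn fast(mut self) -> Self {
        self.fast = true;
        self
    }

    fn to_json(&self) -> String {
        let mut s = format!(
            r#"{{"name":"{}","type":"{}","stored":{},"indexed":{}"#,
            self.name, self.ty.as_str(), self.stored, self.indexed
        );
        if self.fast {
            s.push_str(r#","fast":true"#);
        }
        s.push('}');
        s
    }
}

/// Render a full schema object: {"fields":[...]}.
fn schema_json(fields: &[Field]) -> String {
    let items: Vec<String> = fields.iter().map(Field::to_json).collect();
    format!(r#"{{"fields":[{}]}}"#, items.join(","))
}
```

`schema_json(&[Field::new("name", FieldType::Text), Field::new("price", FieldType::U64).fast()])` yields the schema from Example 2.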

Project Structure

hero_index_server/
├── Cargo.toml              # Workspace configuration
├── Makefile                # Build automation
├── buildenv.sh             # Build environment variables
├── scripts/
│   └── build_lib.sh        # Build library & utilities
├── README.md
├── VERSION                 # Version file
├── docs/
│   ├── specs.md            # OpenRPC interface specification
│   └── OPENRPC.md          # Full API method documentation
├── .forgejo/workflows/     # CI/CD pipelines (Linux & macOS)
├── heroindex/              # Server package
│   ├── Cargo.toml
│   ├── src/
│   │   ├── main.rs         # Entry point, HTTP + socket servers, demo DB
│   │   ├── error.rs        # Error types
│   │   ├── logging.rs      # In-memory operation log store
│   │   ├── mcp.rs          # MCP (Model Context Protocol) server
│   │   ├── web/            # HTTP server (Axum)
│   │   │   ├── mod.rs
│   │   │   ├── handlers.rs # HTTP routes, RPC endpoint, OpenRPC spec
│   │   │   └── state.rs    # Shared AppState
│   │   └── modules/
│   │       ├── mod.rs
│   │       ├── index_manager.rs
│   │       ├── schema.rs
│   │       ├── query.rs
│   │       ├── rpc.rs
│   │       └── handlers.rs # RPC method handlers (18 methods)
│   ├── templates/          # Askama HTML templates (Web UI)
│   │   ├── base.html       # Layout: Bootstrap 5.3 dark theme, navbar
│   │   └── index.html      # Dashboard: tabs, forms, perf test
│   └── tests/
│       └── integration.rs  # Integration tests + performance benchmark
└── heroindex_client/       # Client library package
    ├── Cargo.toml
    └── src/
        ├── lib.rs
        ├── client.rs
        ├── error.rs
        └── types.rs

Using with Claude via MCP (Local Machine Integration)

HeroIndex can be integrated with Claude via the Model Context Protocol (MCP).

Important: MCP servers are configured locally on your machine in ~/.claude/mcp.json. They:

✅ Are available to the Claude clients on this machine that read this configuration (for example Claude Code)
✅ Persist across sessions on this machine
❌ Don't sync to other machines or the cloud

Quick Start: Add MCP Servers to Claude

Add HeroIndex to Claude (Global)

claude mcp add --transport http heroindex http://localhost:9753/mcp

Add Other MCP Servers (Global Examples)

# Sentry - Error tracking and monitoring
claude mcp add --transport http sentry https://mcp.sentry.dev/mcp

# Filesystem - Access local files (stdio servers take a command after "--", not a URL)
claude mcp add --transport stdio filesystem -- filesystem /path/to/directory

# GitHub - Repository management
claude mcp add --transport stdio github -- /path/to/github/tool

What This Does (Local to Your Machine)

When you run claude mcp add, Claude:

✅ Registers the server in ~/.claude/mcp.json, on this machine only
✅ Makes it available to the Claude clients on this machine that read this configuration
✅ Keeps it across sessions, so there is no need to reconfigure
❌ Does not sync it to other computers or the cloud

HeroIndex MCP Capabilities

With HeroIndex as an MCP server, Claude can:

✅ Create databases - Define schemas with multiple field types
✅ Search documents - Full-text, fuzzy, boolean, range queries
✅ Manage indexes - List, select, delete databases
✅ Monitor performance - View statistics and benchmark results
✅ Batch operations - Insert multiple documents at once
✅ Analyze data - Process search results with AI

MCP Server Configuration Files (Local to Your Machine)

MCP servers are stored in your local Claude configuration directory (not synced to cloud):

Platform	Location	Scope
macOS/Linux	`~/.claude/mcp.json`	This machine only
Windows	`%APPDATA%\Claude\mcp.json`	This machine only

Note: Configuration is local to this machine. Each machine where you use Claude needs its own MCP server setup.

Example: Full MCP Configuration

{
  "mcpServers": {
    "heroindex": {
      "type": "http",
      "url": "http://localhost:9753/mcp"
    },
    "sentry": {
      "type": "http",
      "url": "https://mcp.sentry.dev/mcp"
    },
    "filesystem": {
      "type": "stdio",
      "command": "filesystem",
      "args": ["/path/to/directory"]
    }
  }
}

Getting Started with HeroIndex MCP (On This Machine)

1. Start the HeroIndex server:

make run
# Listens on http://localhost:9753 (local machine only)

2. Add it to Claude:

claude mcp add --transport http heroindex http://localhost:9753/mcp
# Stored in ~/.claude/mcp.json on this machine

3. Use it in Claude:
- Claude Code: Use in your terminal with /mcp commands
- Other Claude clients: available only where the client reads this local configuration and can reach localhost:9753

4. Manage servers:

# List all configured MCP servers
claude mcp list

# Remove a server
claude mcp remove heroindex

# Change a server's URL: remove it, then add it again
claude mcp remove heroindex
claude mcp add --transport http heroindex http://localhost:9754/mcp

5. On other machines:
- Repeat steps 1-2 on each machine where you want to use HeroIndex
- Configuration doesn't sync automatically

Natural Language Examples

Once configured, you can ask Claude:

"Create a search index for documents with title, body, and date fields"
"Search for articles about machine learning from the last month"
"Show me the statistics for all databases"
"Find fuzzy matches for 'algoritm' in the documents"
"Run a performance benchmark and analyze the results"

License

MIT