@seanGSISG
Problem

Fresh plugin installations fail with "operation was aborted" timeout errors. Hooks wait for full worker initialization, including the MCP connection, which can take 10-30+ seconds on first run (downloading dependencies, creating the database, etc.).

The 15-second hook timeout is insufficient for this initialization time.

Error seen by users:

Plugin hook "bun worker-service.cjs start" failed to start: The operation was aborted.
Plugin hook "node context-hook.js" failed to start: The operation was aborted.

Root Cause Analysis

The Windows stability fix (PR #378) changed hooks to use /api/readiness, which waits for full initialization, including the MCP connection. However, hooks only need the database and SearchManager to function; they don't use MCP at all.

Initialization Timeline

Line 664: dbManager.initialize()        ← Database ready
Line 688: SearchManager routes setup    ← Hooks can work here (~3s)
─────────────────────────────────────────────────────────────────
Line 707: MCP connection complete       ← Full ready (10-30s on fresh install)
Line 710: initializationCompleteFlag    ← What /api/readiness waited for

The coupling meant MCP initialization (the slow part) blocked hooks that don't need it.

Solution: Two-Stage Readiness

Introduce staged initialization to separate what hooks need from full readiness:

| Endpoint | Returns 200 When | Used By |
|---|---|---|
| /api/health | Server listening | waitForHealth() (start command) |
| /api/core-ready | Database + SearchManager ready | isWorkerHealthy() (hooks) |
| /api/readiness | Full init including MCP | Diagnostics, backward compat |

Changes

src/services/worker-service.ts:

  • Add coreReady flag alongside existing mcpReady and initializationCompleteFlag
  • Add /api/core-ready endpoint that returns 200 when database+SearchManager ready
  • Update waitForHealth() to use /api/health (server listening check only)
  • Set coreReady=true after SearchManager initialization, before MCP connection
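
The ordering described in these changes can be sketched roughly as follows. This is a minimal illustration, not the actual worker-service.ts code: the flag names come from the list above, while the `ReadinessState` class, the `InitSteps` interface, the 503 "not ready" status, and the 404 fallback are assumptions made for the example.

```typescript
// Sketch only. Flag names follow the PR text; ReadinessState, InitSteps,
// the 503 "not ready" status and the 404 fallback are assumptions.
interface InitSteps {
  initDatabase(): Promise<void>;
  initSearchManager(): Promise<void>;
  connectMcp(): Promise<void>;
}

class ReadinessState {
  coreReady = false;
  mcpReady = false;
  initializationCompleteFlag = false;

  // Map a request path to the HTTP status that endpoint would return.
  statusFor(path: string): number {
    switch (path) {
      case "/api/health":
        return 200; // answering at all means the server is listening
      case "/api/core-ready":
        return this.coreReady ? 200 : 503;
      case "/api/readiness":
        return this.initializationCompleteFlag ? 200 : 503;
      default:
        return 404;
    }
  }
}

async function initialize(state: ReadinessState, steps: InitSteps): Promise<void> {
  await steps.initDatabase();
  await steps.initSearchManager();
  state.coreReady = true; // hooks can proceed from this point

  // Slow on a fresh install. If this throws, coreReady stays true and
  // /api/readiness keeps reporting "not ready".
  await steps.connectMcp();
  state.mcpReady = true;
  state.initializationCompleteFlag = true;
}
```

Keeping the flags in one small object makes the ordering easy to check: `/api/core-ready` flips as soon as the first two steps finish, regardless of how long the MCP step takes or whether it succeeds.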

src/shared/worker-utils.ts:

  • Update isWorkerHealthy() to use /api/core-ready instead of /api/readiness
  • Hooks now proceed as soon as core services are ready
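
On the hook side, the check against /api/core-ready might look something like the sketch below. The real `isWorkerHealthy()` signature is not shown in this PR, so the injected `Probe` function, the `waitForCoreReady` helper, and the timeout and interval parameters are all hypothetical; injecting the probe simply keeps the example free of network calls.

```typescript
// Sketch only. Probe, waitForCoreReady and all parameters are hypothetical;
// only the /api/core-ready path comes from the PR text.
type Probe = (url: string) => Promise<boolean>;

// True when the core-ready endpoint answers with a success status.
async function isWorkerHealthy(baseUrl: string, probe: Probe): Promise<boolean> {
  try {
    return await probe(`${baseUrl}/api/core-ready`);
  } catch {
    return false; // e.g. connection refused while the worker is still starting
  }
}

// Poll until core services are ready or the time budget runs out.
async function waitForCoreReady(
  baseUrl: string,
  probe: Probe,
  timeoutMs: number,
  intervalMs: number,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await isWorkerHealthy(baseUrl, probe)) return true;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

In a runtime with a global `fetch` (Node 18+ or Bun), a probe could be as small as `async (url) => (await fetch(url)).ok`.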

plugin/hooks/hooks.json:

  • Increase worker-service timeout from 15s to 45s as safety margin

Benefits

  • ✅ Fresh installs work without timeout errors
  • ✅ Hooks proceed as soon as database+SearchManager are ready (~3s)
  • ✅ MCP connection continues in background without blocking hooks
  • ✅ MCP failures don't break hook functionality
  • ✅ Backward compatible - /api/readiness unchanged for diagnostics/tooling

Test Plan

  • Fresh install simulation: rm -rf ~/.claude-mem + new Claude session
  • Verify no "operation was aborted" errors
  • Verify context injection works on first prompt
  • Verify worker health endpoint shows staged readiness
  • Verify no regression for users with running worker
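
For the staged-readiness check in the test plan, a small helper can turn the three endpoint responses into a single stage label. This is an illustrative sketch: the `Stage` names and the `EndpointStatuses` shape are invented for the example, and it assumes any non-200 response (or a failed request, represented as `null`) means "not ready".

```typescript
// Sketch only. Stage names and the EndpointStatuses shape are hypothetical.
type Stage = "down" | "listening" | "core-ready" | "fully-ready";

interface EndpointStatuses {
  health: number | null; // HTTP status, or null if the request failed
  coreReady: number | null;
  readiness: number | null;
}

// Classify how far worker initialization has progressed.
function classifyStage(s: EndpointStatuses): Stage {
  if (s.health !== 200) return "down";
  if (s.coreReady !== 200) return "listening";
  if (s.readiness !== 200) return "core-ready";
  return "fully-ready";
}
```

On a fresh install the expected progression would be "listening", then "core-ready" after a few seconds, then "fully-ready" once the MCP connection completes.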

## Problem

Fresh plugin installations fail with "operation was aborted" timeout errors.
Hooks wait for full worker initialization, including the MCP connection,
which can take 10-30+ seconds on first run (downloading dependencies, etc.).

The 15-second hook timeout is insufficient for this initialization time.

## Root Cause

The Windows stability fix (PR thedotmack#378) changed hooks to use `/api/readiness`,
which waits for full initialization, including MCP. However, hooks only need
the database and SearchManager to function; they don't use MCP at all.

This created a coupling where MCP initialization (the slow part) blocks
hooks that don't need it.

## Solution: Two-Stage Readiness

Introduce staged initialization to separate what hooks need from full readiness:

1. **`coreReady` flag** - Set after database and SearchManager are initialized
2. **`/api/core-ready` endpoint** - Returns 200 when core services are ready
3. **`/api/health`** - Now includes `coreReady` field for visibility
4. **`/api/readiness`** - Unchanged, still waits for full init (backward compat)

### Changes

**worker-service.ts:**
- Add `coreReady` flag alongside existing `mcpReady` and `initializationCompleteFlag`
- Add `/api/core-ready` endpoint that returns 200 when database+SearchManager ready
- Update `waitForHealth()` to use `/api/health` (server listening check only)
- Set `coreReady=true` after SearchManager initialization, before MCP connection

**worker-utils.ts:**
- Update `isWorkerHealthy()` to use `/api/core-ready` instead of `/api/readiness`
- Hooks now proceed as soon as core services are ready

**hooks.json:**
- Increase worker-service timeout from 15s to 45s as safety margin

## Benefits

- Fresh installs work without timeout errors
- Hooks proceed as soon as database+SearchManager are ready (~3s)
- MCP connection continues in background without blocking hooks
- MCP failures don't break hook functionality
- Backward compatible - `/api/readiness` unchanged for diagnostics/tooling

## Testing

1. Simulate fresh install: `rm -rf ~/.claude-mem`
2. Start new Claude session
3. Verify no "operation was aborted" errors
4. Verify context injection works on first prompt