fix: implement two-stage readiness to prevent fresh install timeout #521

seanGSISG · 2026-01-02T02:47:42Z

Problem

Fresh plugin installations fail with "operation was aborted" timeout errors because hooks wait for full worker initialization, including the MCP connection, which can take 10-30+ seconds on first run (downloading dependencies, creating the database, etc.).

The 15-second hook timeout is too short to cover that initialization.

Error seen by users:

Plugin hook "bun worker-service.cjs start" failed to start: The operation was aborted.
Plugin hook "node context-hook.js" failed to start: The operation was aborted.

Root Cause Analysis

The Windows stability fix (PR #378) changed hooks to use /api/readiness, which waits for full initialization including the MCP connection. However, hooks only need the database and SearchManager to function; they don't use MCP at all.

Initialization Timeline

Line 664: dbManager.initialize()        ← Database ready
Line 688: SearchManager routes setup    ← Hooks can work here (~3s)
─────────────────────────────────────────────────────────────────
Line 707: MCP connection complete       ← Full ready (10-30s on fresh install)
Line 710: initializationCompleteFlag    ← What /api/readiness waited for

The coupling meant MCP initialization (the slow part) blocked hooks that don't need it.

Solution: Two-Stage Readiness

Introduce staged initialization to separate what hooks need from full readiness:

| Endpoint | Returns 200 When | Used By |
| --- | --- | --- |
| `/api/health` | Server listening | `waitForHealth()` (start command) |
| `/api/core-ready` | Database + SearchManager ready | `isWorkerHealthy()` (hooks) |
| `/api/readiness` | Full init including MCP | Diagnostics, backward compat |
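
The mapping in the table can be sketched as a pure function from worker state to HTTP status. This is only an illustration of the staging, not the code in the PR: the `ReadinessState` type, the `statusFor` helper, and the 503 status for "not ready yet" are assumptions; the description only says when each endpoint returns 200.

```typescript
// Hypothetical model of the three endpoints. Names and the 503 status
// are assumptions made for this sketch.
interface ReadinessState {
  coreReady: boolean;              // database + SearchManager initialized
  initializationComplete: boolean; // full init, including MCP connection
}

type ReadinessEndpoint = "/api/health" | "/api/core-ready" | "/api/readiness";

function statusFor(endpoint: ReadinessEndpoint, state: ReadinessState): number {
  switch (endpoint) {
    case "/api/health":
      // If this handler runs at all, the server is listening.
      return 200;
    case "/api/core-ready":
      return state.coreReady ? 200 : 503;
    case "/api/readiness":
      return state.initializationComplete ? 200 : 503;
  }
}

// The window this PR targets: core services are up, MCP is still connecting.
const duringMcpConnect: ReadinessState = {
  coreReady: true,
  initializationComplete: false,
};
console.log(statusFor("/api/core-ready", duringMcpConnect)); // 200
console.log(statusFor("/api/readiness", duringMcpConnect));  // 503
```

In that window a hook polling `/api/core-ready` can proceed, while a diagnostic tool polling `/api/readiness` still sees the worker as not fully initialized.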

Changes

src/services/worker-service.ts:

- Add coreReady flag alongside existing mcpReady and initializationCompleteFlag
- Add /api/core-ready endpoint that returns 200 when database+SearchManager ready
- Update waitForHealth() to use /api/health (server listening check only)
- Set coreReady=true after SearchManager initialization, before MCP connection
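
The ordering described in this list could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the actual worker-service.ts code: `InitSteps`, `initializeStaged`, and the `mcpError` field are hypothetical names, and leaving `initializationComplete` false when MCP fails is an assumption about behavior the description does not spell out.

```typescript
// Hypothetical sketch of two-stage initialization.
interface ReadinessFlags {
  coreReady: boolean;
  mcpReady: boolean;
  initializationComplete: boolean;
  mcpError?: unknown; // assumption: recorded so diagnostics can report it
}

// Stand-ins for the real initialization steps (hypothetical interface).
interface InitSteps {
  initDatabase(): Promise<void>;
  initSearchManager(): Promise<void>;
  connectMcp(): Promise<void>;
}

async function initializeStaged(
  flags: ReadinessFlags,
  steps: InitSteps,
): Promise<{ fullReady: Promise<void> }> {
  // Stage 1: everything hooks depend on.
  await steps.initDatabase();
  await steps.initSearchManager();
  flags.coreReady = true;

  // Stage 2: the MCP connection runs in the background. It is deliberately
  // not awaited here, so callers waiting on core readiness are not blocked.
  const fullReady = steps
    .connectMcp()
    .then(() => {
      flags.mcpReady = true;
      flags.initializationComplete = true;
    })
    .catch((err: unknown) => {
      // An MCP failure leaves coreReady untouched, so hooks keep working.
      flags.mcpError = err;
    });

  return { fullReady };
}
```

Returning `fullReady` inside an object rather than directly keeps the outer promise from flattening it, so a caller can await core readiness first and full readiness later, or never.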

src/shared/worker-utils.ts:

- Update isWorkerHealthy() to use /api/core-ready instead of /api/readiness
- Hooks now proceed as soon as core services are ready
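
A health check along these lines might look like the following sketch. The real `isWorkerHealthy()` signature is not shown in this description, so the parameters here (`baseUrl`, `timeoutMs`, and an injectable `fetchFn` for testing) are assumptions. It relies on the global `fetch` and `AbortSignal.timeout` available in recent Node and Bun versions.

```typescript
// Minimal fetch shape needed by the check; lets tests inject a fake.
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<{ ok: boolean }>;

// Hypothetical sketch: returns true only when /api/core-ready answers 2xx.
async function isWorkerHealthy(
  baseUrl: string,
  timeoutMs = 1000,
  fetchFn: FetchLike = fetch,
): Promise<boolean> {
  try {
    const res = await fetchFn(`${baseUrl}/api/core-ready`, {
      // Abort the request if the worker does not answer in time.
      signal: AbortSignal.timeout(timeoutMs),
    });
    return res.ok;
  } catch {
    // Connection refused, timeout, or abort: treat the worker as not ready.
    return false;
  }
}
```

Here `baseUrl` is whatever host and port the worker listens on. Because the check targets `/api/core-ready`, a slow or failed MCP connection has no effect on the result.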

plugin/hooks/hooks.json:

- Increase worker-service timeout from 15s to 45s as safety margin

Benefits

✅ Fresh installs work without timeout errors
✅ Hooks proceed as soon as database+SearchManager are ready (~3s)
✅ MCP connection continues in background without blocking hooks
✅ MCP failures don't break hook functionality
✅ Backward compatible - /api/readiness unchanged for diagnostics/tooling

Test Plan

- Fresh install simulation: `rm -rf ~/.claude-mem`, then start a new Claude session
- Verify no "operation was aborted" errors
- Verify context injection works on first prompt
- Verify worker health endpoint shows staged readiness
- Verify no regression for users with running worker

## Problem

Fresh plugin installations fail with "operation was aborted" timeout errors because hooks wait for full worker initialization including MCP connection, which can take 10-30+ seconds on first run (downloading dependencies, etc.).

The 15-second hook timeout is insufficient for this initialization time.

## Root Cause

The Windows stability fix (PR thedotmack#378) changed hooks to use `/api/readiness` which waits for full initialization including MCP. However, hooks only need database and SearchManager to function - they don't use MCP at all.

This created a coupling where MCP initialization (the slow part) blocks hooks that don't need it.

## Solution: Two-Stage Readiness

Introduce staged initialization to separate what hooks need from full readiness:

1. **`coreReady` flag** - Set after database and SearchManager are initialized
2. **`/api/core-ready` endpoint** - Returns 200 when core services are ready
3. **`/api/health`** - Now includes `coreReady` field for visibility
4. **`/api/readiness`** - Unchanged, still waits for full init (backward compat)

### Changes

**worker-service.ts:**
- Add `coreReady` flag alongside existing `mcpReady` and `initializationCompleteFlag`
- Add `/api/core-ready` endpoint that returns 200 when database+SearchManager ready
- Update `waitForHealth()` to use `/api/health` (server listening check only)
- Set `coreReady=true` after SearchManager initialization, before MCP connection

**worker-utils.ts:**
- Update `isWorkerHealthy()` to use `/api/core-ready` instead of `/api/readiness`
- Hooks now proceed as soon as core services are ready

**hooks.json:**
- Increase worker-service timeout from 15s to 45s as safety margin

## Benefits

- Fresh installs work without timeout errors
- Hooks proceed as soon as database+SearchManager are ready (~3s)
- MCP connection continues in background without blocking hooks
- MCP failures don't break hook functionality
- Backward compatible - `/api/readiness` unchanged for diagnostics/tooling

## Testing

1. Simulate fresh install: `rm -rf ~/.claude-mem`
2. Start new Claude session
3. Verify no "operation was aborted" errors
4. Verify context injection works on first prompt
