Skip to content

chasesaurabh/mcp-page-capture

mcp-page-capture

npm version GitHub stars License: MIT Node.js MCP Server

mcp-page-capture logo placeholder

mcp-page-capture is a Model Context Protocol (MCP) server that orchestrates headless Chromium via Puppeteer to capture pixel-perfect screenshots of arbitrary URLs. It is optimized for Copilot/MCP-enabled environments and can be embedded into automated workflows or run as a standalone developer tool.

Features

  • 📸 High-fidelity screenshots powered by Puppeteer and headless Chromium
  • ⚙️ LLM-optimized schema with minimal parameters exposed and sensible defaults
  • 🔍 Structured DOM extraction with optional CSS selectors for AI-friendly consumption
  • 📱 Device presets for mobile emulation (iPhone, iPad, Android, desktop)
  • 🎯 6 simplified steps for LLM friendliness: viewport, wait, fill, click, scroll, screenshot
  • 🤖 Smart defaults - screenshot auto-captured, field types auto-detected
  • 🆕 Consistent parameters - target for elements, for for waiting, to for scrolling, device for viewport
  • 🔄 Automatic retry with exponential backoff for transient failures
  • 📊 Telemetry hooks for centralized observability and monitoring
  • 💾 Pluggable storage backends (local filesystem, S3, memory)
  • 🛡️ Structured logging plus defensive error handling for operational visibility
  • 🔌 Launch via npm start, npm run dev, or as a long-lived MCP sidecar
  • 🐳 Docker image with multi-platform support (amd64, arm64)

How It Works

  1. The MCP transport boots a Node.js server and registers the captureScreenshot and extractDom tools.
  2. Incoming tool invocations are validated against the relevant type definitions.
  3. Puppeteer starts (or reuses) a Chromium instance, navigates to the requested URL, and either captures a screenshot or serializes the DOM according to the tool parameters.
  4. The server returns structured content (images, text, DOM trees, metadata) to the caller or downstream workflow.

Requirements

  • Node.js ≥ 18.x
  • npm ≥ 9.x
  • Chromium package download permissions (first run)
  • Network access to the target URLs

🌟 LLM-Friendly Features

Simplified Schema (5-Star LLM Rating)

  • ✅ Only 4 top-level parameters: url, steps, headers, validate
  • ✅ Exactly 6 step types (no more, no less)
  • ✅ Consistent parameter naming across all steps
  • ✅ Automatic screenshot if omitted
  • ✅ Smart field type detection
  • ✅ Actionable error messages with recovery suggestions
  • ✅ Single source of truth for all schemas
  • ✅ Deprecation warnings for legacy parameters
  • NEW: Validate mode for pre-flight step checking
  • NEW: Step order enforcement with auto-correction
  • NEW: Embedded LLM reference in MCP server instructions

Quick Start for LLMs

See LLM Quick Reference for the 6 primary step types and common patterns.

For advanced features, see Advanced Steps.

The 6 Step Types (Exactly 6, No More)

Step Purpose Key Parameters Example
viewport Set device device, width, height { "type": "viewport", "device": "mobile" }
wait Wait for element/time for OR duration, timeout { "type": "wait", "for": ".loaded" }
fill Fill form field target, value, submit { "type": "fill", "target": "#email", "value": "a@b.com" }
click Click element target, waitFor { "type": "click", "target": "button", "waitFor": ".result" }
scroll Scroll page to, y { "type": "scroll", "to": "#footer" }
screenshot Capture (auto-added) fullPage, element { "type": "screenshot", "fullPage": true }

Step Order: Auto-fixed! viewport auto-moves to first, screenshot auto-added at end.

Composite Patterns (NEW)

High-level patterns that auto-expand to multiple steps:

// Login pattern
{
  "type": "login",
  "email": { "selector": "#email", "value": "user@example.com" },
  "password": { "selector": "#password", "value": "secret" },
  "submit": "button[type=submit]",
  "successIndicator": ".dashboard"
}

// Search pattern  
{
  "type": "search",
  "input": "#search-box",
  "query": "MCP protocol",
  "resultsIndicator": ".search-results"
}

Installation

From source (recommended during development)

git clone https://github.com/chasesaurabh/mcp-page-capture.git
npm install

From npm

# Run without installing globally
npx mcp-page-capture

# Or add it to your toolchain
npm install -g mcp-page-capture
mcp-page-capture

Running the server

npm install
npm run build
npm start

For hot reload while iterating locally, run npm run dev.

Why Docker?

  • Guarantees a consistent Puppeteer + Chromium environment with all system libraries when teammates or CI run the server. No more "it works on my machine" mismatches.
  • Provides a ready-to-deploy container image for hosting mcp-page-capture as a sidecar/service on Kubernetes, ECS, Fly.io, etc.

If you need those guarantees, build and run via:

docker build -t mcp-page-capture .
docker run --rm -it mcp-page-capture

Otherwise you can keep using the standard npm scripts locally.

IDE MCP configuration

{
  "mcpServers": {
    "page-capture": {
      "command": "node",
      "args": ["dist/cli.js"]
    }
  }
}

Note: You may need to use the full path to dist/cli.js or node depending on your working directory and Node.js module resolution configuration.

Programmatic usage

If you want to embed the server inside another Node.js process, import the helpers exposed by the package:

import { startMcpPageCaptureServer } from "mcp-page-capture";

await startMcpPageCaptureServer();
// Optionally pass a custom Transport implementation if you don't want stdio.

Usage

Tool invocation examples

Basic Screenshot

{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com"
  }
}

Note: A screenshot is automatically captured at the end if no explicit screenshot step is provided.

Search with Fill Step (Recommended)

The fill step auto-detects field types and handles text inputs, selects, checkboxes, and radio buttons.

{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com",
    "steps": [
      { "type": "fill", "target": "#search", "value": "MCP protocol", "submit": true },
      { "type": "wait", "for": ".search-results" },
      { "type": "screenshot" }
    ]
  }
}

Login Form Example

{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com/login",
    "steps": [
      { "type": "wait", "for": "#login-form" },
      { "type": "fill", "target": "#email", "value": "user@example.com" },
      { "type": "fill", "target": "#password", "value": "secretpassword" },
      { "type": "fill", "target": "#remember-me", "value": "true" },
      { "type": "click", "target": "button[type=submit]", "waitFor": ".dashboard" },
      { "type": "screenshot" }
    ]
  }
}

Full Page Screenshot

{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://docs.modelcontextprotocol.io",
    "steps": [
      { "type": "screenshot", "fullPage": true }
    ]
  }
}

With Authentication and Cookies

{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com/dashboard",
    "headers": {
      "authorization": "Bearer dev-token"
    },
    "steps": [
      {
        "type": "cookie",
        "action": "set",
        "name": "session",
        "value": "abc123",
        "path": "/secure"
      },
      { "type": "screenshot" }
    ]
  }
}

Extract DOM Content

{
  "tool": "extractDom",
  "params": {
    "url": "https://docs.modelcontextprotocol.io",
    "selector": "main article"
  }
}

Mobile Device Emulation

{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com",
    "steps": [
      { "type": "viewport", "device": "ipad-pro" },
      { "type": "scroll", "to": "#main-content" },
      { "type": "screenshot" }
    ]
  }
}

Step Types

6 Primary Steps (LLM-Exposed)

These are the only steps exposed to LLMs. They cover 95%+ of use cases:

Step Purpose Parameters
viewport Set device/screen size device, width, height
wait Wait for element/time for OR duration, timeout
fill Fill form field target, value, submit
click Click element target, waitFor
scroll Scroll page to (selector), y (pixels)
screenshot Capture (auto-added) fullPage, element

Validate Mode (NEW)

Use validate: true to check steps before execution:

{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com",
    "steps": [
      { "type": "fill", "target": "#email", "value": "test@example.com" },
      { "type": "click", "target": "button" }
    ],
    "validate": true
  }
}

Returns validation analysis including:

  • Errors: Missing required parameters
  • Warnings: Step order issues (e.g., viewport not first)
  • Suggestions: Recommended improvements (e.g., add waitFor to click)
  • Step analysis: Per-step status and notes

Deprecated Steps (Legacy Support Only)

These work at runtime but are not exposed in the LLM schema. Use the 6 primary steps instead:

Deprecated Use Instead
quickFill fill with submit: true
fillForm Multiple fill steps
waitForSelector wait with for parameter
delay wait with duration parameter
fullPage screenshot with fullPage: true

Internal Steps (Not Exposed)

These are for power users only and are not documented in the tool schema:

type, hover, cookie, storage, evaluate, keypress, focus, blur, clear, upload, submit

Legacy Parameter Support

For backward compatibility, these parameters work at runtime but are not exposed in the LLM schema:

Legacy Parameter Canonical Notes
selector target Use target for element selectors
awaitElement for Use for in wait steps
scrollTo to Use to in scroll steps
preset device Use device in viewport steps
captureElement element Use element in screenshot steps
waitAfter wait Use wait in click steps

Deprecation warnings are logged when legacy parameters are used.

Example response

{
  "content": [
    {
      "type": "text",
      "text": "mcp-page-capture screenshot\nURL: https://example.com\nCaptured: 2025-12-13T08:30:12.713Z\nFull page: false\nViewport: 1280x720\nDocument: 1280x2000\nScroll position: (0, 0)\nSize: 45.2 KB\nSteps executed: 5"
    },
    {
      "type": "image",
      "mimeType": "image/png",
      "data": "iVBORw0KGgoAAAANSUhEUgA..."
    }
  ]
}

## Supported options

### `captureScreenshot`
- `url` (string, required): Fully-qualified URL to capture
- `headers` (object, optional): Key/value map of HTTP headers to send with the initial page navigation
- `cookies` (array, optional): List of cookies to set before navigation. Each cookie supports `name`, `value`, and optional `url`, `domain`, `path`, `secure`, `httpOnly`, `sameSite`, and `expires` (Unix timestamp, seconds)
- `viewport` (object, optional): Viewport configuration
  - `preset` (string, optional): Use a predefined viewport preset (see Viewport Presets section)
  - `width` (number, optional): Custom viewport width
  - `height` (number, optional): Custom viewport height
  - `deviceScaleFactor` (number, optional): Device scale factor (e.g., 2 for Retina)
  - `isMobile` (boolean, optional): Whether to emulate mobile device
  - `hasTouch` (boolean, optional): Whether to enable touch events
  - `userAgent` (string, optional): Custom user agent string
- `retryPolicy` (object, optional): Retry configuration for transient failures
  - `maxRetries` (number, optional, default 3): Maximum number of retry attempts
  - `initialDelayMs` (number, optional, default 1000): Initial delay between retries
  - `maxDelayMs` (number, optional, default 10000): Maximum delay between retries
  - `backoffMultiplier` (number, optional, default 2): Exponential backoff multiplier
- `storageTarget` (string, optional): Storage backend name for saving captures

### `extractDom`
- `url` (string, required): Fully-qualified URL to inspect
- `selector` (string, optional): CSS selector to scope extraction to a specific element. Defaults to the entire document
- `headers` (object, optional): Key/value map of HTTP headers sent before navigation
- `cookies` (array, optional): Same cookie structure as `captureScreenshot`, applied before navigation
- `viewport` (object, optional): Same viewport configuration as `captureScreenshot`
- `retryPolicy` (object, optional): Same retry configuration as `captureScreenshot`
- `storageTarget` (string, optional): Storage backend name for saving DOM data

## Action Steps for captureScreenshot

The `captureScreenshot` tool supports a comprehensive `steps` array that allows you to perform various web interactions before capturing the screenshot. Each step is executed in sequence, allowing for complex automation scenarios.

### Fill Form (`fillForm`) - Recommended for Form Interactions
The `fillForm` step is the easiest and most LLM-friendly way to interact with forms. It auto-detects field types and handles multiple fields in a single step.

```json
{
  "type": "fillForm",
  "fields": [
    { "selector": "#email", "value": "user@example.com" },
    { "selector": "#password", "value": "secretpassword" },
    { "selector": "#country", "value": "us" },
    { "selector": "#newsletter", "value": "true" },
    { "selector": "#plan", "value": "premium", "type": "radio" }
  ],
  "formSelector": "#signup-form",
  "submit": true,
  "submitSelector": "#submit-btn",
  "waitForNavigation": true
}

Field Configuration

Each field in the fields array supports:

  • selector (required): CSS selector for the form field
  • value (required): Value to set. For checkboxes use "true" or "false". For selects/radios use the value attribute.
  • type (optional): Field type hint (text, select, checkbox, radio, textarea, password, email, number, tel, url, date, file). Auto-detected if not specified.
  • matchByText (optional): For select fields, match by visible text instead of value attribute
  • delay (optional): Delay between keystrokes in ms (for text inputs)

Form Options

  • formSelector (optional): CSS selector for the form container (for scoping field selectors)
  • submit (optional): Whether to submit the form after filling (default: false)
  • submitSelector (optional): Selector for submit button. If not specified, uses form.submit() or looks for [type="submit"]
  • waitForNavigation (optional): Whether to wait for navigation after submit (default: true)

Text Input (text)

Type text into input fields:

{
  "type": "text",
  "selector": "#username",
  "value": "john.doe@example.com",
  "clearFirst": true,  // Clear existing text first (default: true)
  "delay": 100,  // Delay between keystrokes in ms (0-1000)
  "pressEnter": false  // Press Enter after typing (default: false)
}

Select Dropdown (select)

Select an option from a dropdown:

{
  "type": "select",
  "selector": "#country",
  "value": "us"  // OR "text": "United States" OR "index": 0
}

Radio Button (radio)

Select a radio button:

{
  "type": "radio",
  "selector": "input[type='radio']",
  "value": "option1",  // Value attribute of the radio button
  "name": "preference"  // Name attribute to identify the radio group
}

Checkbox (checkbox)

Check or uncheck a checkbox:

{
  "type": "checkbox",
  "selector": "#agree-terms",
  "checked": true  // true to check, false to uncheck
}

Click (click)

Click on elements:

{
  "type": "click",
  "target": "button.submit",
  "button": "left",  // "left", "right", or "middle" (default: left)
  "clickCount": 1,  // 1=single, 2=double, 3=triple (default: 1)
  "waitForNavigation": false,  // Wait for page navigation (default: false)
  "waitForSelector": ".modal-content"  // Wait for element to appear after click
}

Hover (hover)

Hover over elements:

{
  "type": "hover",
  "selector": ".dropdown-trigger",
  "duration": 1000  // How long to maintain hover in ms (0-10000)
}

File Upload (upload)

Upload files:

{
  "type": "upload",
  "selector": "input[type='file']",
  "filePaths": ["/path/to/file1.pdf", "/path/to/file2.jpg"]
}

Form Submit (submit)

Submit forms:

{
  "type": "submit",
  "selector": "#contact-form",  // Form element or submit button
  "waitForNavigation": true  // Wait for page navigation (default: true)
}

Scroll (scroll)

Scroll the page:

{
  "type": "scroll",
  "scrollTo": "#section-2",  // Scroll to element (takes precedence)
  "x": 0,  // OR horizontal scroll position in pixels
  "y": 500,  // OR vertical scroll position in pixels
  "behavior": "smooth"  // "auto" or "smooth" (default: auto)
}

Key Press (keypress)

Press keyboard keys:

{
  "type": "keypress",
  "key": "Enter",  // Key to press (e.g., "Enter", "Tab", "Escape", "ArrowDown")
  "modifiers": ["Control", "Shift"],  // Optional modifiers
  "selector": "#search-box"  // Optional element to focus first
}

Wait for Selector (waitForSelector) - DEPRECATED

Use wait step instead:

{ "type": "wait", "for": ".loading-complete", "timeout": 10000 }

Delay (delay) - DEPRECATED

Use wait step with duration instead:

{ "type": "wait", "duration": 2000 }

Focus (focus)

Focus an element:

{
  "type": "focus",
  "selector": "#search-input"
}

Blur (blur)

Blur (unfocus) an element:

{
  "type": "blur",
  "selector": "#search-input"
}

Clear (clear)

Clear input field contents:

{
  "type": "clear",
  "selector": "#search-input"
}

Evaluate (evaluate)

Execute custom JavaScript:

{
  "type": "evaluate",
  "script": "document.title = 'New Title'; return document.title;",
  "selector": "#element"  // Optional element to pass to the script
}

Screenshot (screenshot)

Capture screenshot at any point:

{
  "type": "screenshot",
  "fullPage": true,  // Capture entire page (optional)
  "captureElement": ".specific-element"  // Capture specific element (optional)
}

Cookie Management (cookie)

Set or delete browser cookies:

{
  "type": "cookie",
  "action": "set",
  "name": "session_id",
  "value": "abc123",
  "domain": ".example.com",
  "path": "/",
  "secure": true
}

Supported actions: set (add/update cookie), delete (remove cookie).

Storage Management (storage)

Manage localStorage/sessionStorage:

{
  "type": "storage",
  "storageType": "localStorage",
  "action": "set",
  "key": "user_preferences",
  "value": "{\"theme\":\"dark\"}"
}

Supported actions: set (add/update), delete (remove key), clear (remove all items).

Example: Complex Form Interaction with Screenshot

{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com/signup",
    "steps": [
      { "type": "waitForSelector", "awaitElement": "#signup-form" },
      { "type": "text", "selector": "#email", "value": "user@example.com" },
      { "type": "text", "selector": "#password", "value": "SecurePass123!" },
      { "type": "select", "selector": "#country", "text": "United States" },
      { "type": "radio", "selector": "input[name='plan']", "value": "premium" },
      { "type": "checkbox", "selector": "#newsletter", "checked": true },
      { "type": "checkbox", "selector": "#terms", "checked": true },
      { "type": "hover", "selector": ".tooltip-trigger", "duration": 500 },
      { "type": "scroll", "y": 200 },
      { "type": "click", "target": "button[type='submit']", "waitForNavigation": true },
      { "type": "delay", "duration": 2000 },
      { "type": "screenshot", "fullPage": true }
    ]
  }
}

Troubleshooting

If your capture fails, use these guidelines:

Error Solution
"element not found" Check CSS selector, add waitForSelector before click/fillForm
"navigation timeout" Increase retryPolicy.maxRetries or add delay step
"page not loaded" Add waitForSelector or delay before screenshot
"click failed" Ensure element is visible, add scroll to bring it into view

Viewport Presets

The following viewport presets are available:

Desktop

  • desktop-fhd: 1920x1080 Full HD
  • desktop-hd: 1280x720 HD
  • desktop-4k: 3840x2160 4K
  • macbook-pro-16: MacBook Pro 16-inch Retina

Tablets

  • ipad-pro: iPad Pro 12.9-inch
  • ipad-pro-landscape: iPad Pro 12.9-inch (landscape)
  • ipad: iPad 10.2-inch
  • surface-pro: Microsoft Surface Pro

Mobile

  • iphone-14-pro-max: iPhone 14 Pro Max
  • iphone-14-pro: iPhone 14 Pro
  • iphone-se: iPhone SE (3rd generation)
  • pixel-7-pro: Google Pixel 7 Pro
  • galaxy-s23-ultra: Samsung Galaxy S23 Ultra

Example with viewport preset:

{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com",
    "viewport": {
      "preset": "iphone-14-pro"
    }
  }
}

Retry Policy

The tools automatically retry on transient failures with exponential backoff. Default retryable conditions:

  • HTTP status codes: 500, 502, 503, 504, 408, 429
  • Network errors: ETIMEDOUT, ECONNRESET, ENOTFOUND, ECONNREFUSED
  • DNS resolution failures

Example with custom retry policy:

{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com",
    "retryPolicy": {
      "maxRetries": 5,
      "initialDelayMs": 2000,
      "backoffMultiplier": 1.5
    }
  }
}

Telemetry

The server emits structured telemetry events that can be consumed for monitoring and observability:

Event Types

  • tool.invoked: Tool execution started
  • tool.completed: Tool execution succeeded
  • tool.failed: Tool execution failed
  • navigation.started: Page navigation initiated
  • navigation.completed: Page navigation succeeded
  • navigation.failed: Page navigation failed
  • retry.attempt: Retry attempt started
  • retry.succeeded: Retry succeeded
  • browser.launched: Puppeteer browser started
  • browser.closed: Puppeteer browser closed
  • screenshot.captured: Screenshot taken
  • dom.extracted: DOM content extracted

Configuring Telemetry

You can configure telemetry hooks programmatically:

import { getGlobalTelemetry } from "mcp-page-capture";

const telemetry = getGlobalTelemetry();

// Configure HTTP sink for centralized collection
telemetry.configureHttpSink({
  url: "https://telemetry.example.com/events",
  headers: { "X-API-Key": "your-api-key" },
  batchSize: 100,
  flushIntervalMs: 5000,
});

// Register custom hooks
telemetry.registerHook({
  name: "custom-logger",
  enabled: true,
  handler: async (event) => {
    console.log(`[${event.type}]`, event.data);
  },
});

Storage Backends

Captures can be automatically saved to configurable storage backends:

Local Filesystem

import { registerStorageTarget, LocalStorageTarget } from "mcp-page-capture";

const localStorage = new LocalStorageTarget("/path/to/captures");
registerStorageTarget("local", localStorage);

S3-Compatible Storage

import { registerStorageTarget, S3StorageTarget } from "mcp-page-capture";

const s3Storage = new S3StorageTarget({
  bucket: "my-captures",
  prefix: "screenshots/",
  region: "us-west-2",
});
registerStorageTarget("s3", s3Storage);

Memory Storage

import { registerStorageTarget, MemoryStorageTarget } from "mcp-page-capture";

const memoryStorage = new MemoryStorageTarget();
registerStorageTarget("memory", memoryStorage);

Then use the storage in tool invocations:

{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com",
    "storageTarget": "s3"
  }
}

Known limitations

  • Dynamic pages requiring complex authentication flows or user gestures are not yet automated
  • Extremely long or infinite-scroll pages may exceed default Chromium memory limits
  • S3 storage backend requires AWS SDK integration (placeholder implementation included)

Automated Releases & Distribution

npm Package

  • Release automation is powered by semantic-release and GitHub Actions
  • Commits must follow the Conventional Commits spec (feat:, fix:, chore:) for automatic versioning
  • Publishes to npm registry on successful builds from main branch

Docker Images

  • Multi-platform images (linux/amd64, linux/arm64) are automatically built and published
  • Images are pushed to:
    • Docker Hub: <username>/mcp-page-capture
    • GitHub Container Registry: ghcr.io/<org>/mcp-page-capture
  • Tagged with semantic version, major, major.minor, and latest

Required Repository Secrets

Configure these secrets in your GitHub repository settings:

  • NPM_TOKEN: npm access token with publish permissions
  • DOCKER_USERNAME: Docker Hub username
  • DOCKER_PASSWORD: Docker Hub access token

The GITHUB_TOKEN is provided automatically by GitHub Actions.

How to contribute

Read CONTRIBUTING.md, open an issue describing the change, and submit a PR that includes npm run build output plus updated docs/tests.

Author / Maintainer

Maintained by Saurabh Chase (@chasesaurabh). Reach out via issues or discussions for roadmap coordination.

License

Released under the MIT License.

About

Capture web page screenshots and return MCP-compatible metadata for automated workflows.

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Packages

No packages published

Languages