mcp-page-capture is a Model Context Protocol (MCP) server that orchestrates headless Chromium via Puppeteer to capture pixel-perfect screenshots of arbitrary URLs. It is optimized for Copilot/MCP-enabled environments and can be embedded into automated workflows or run as a standalone developer tool.
- 📸 High-fidelity screenshots powered by Puppeteer and headless Chromium
- ⚙️ LLM-optimized schema with minimal parameters exposed and sensible defaults
- 🔍 Structured DOM extraction with optional CSS selectors for AI-friendly consumption
- 📱 Device presets for mobile emulation (iPhone, iPad, Android, desktop)
- 🎯 6 simplified steps for LLM friendliness: `viewport`, `wait`, `fill`, `click`, `scroll`, `screenshot`
- 🤖 Smart defaults: screenshot auto-captured, field types auto-detected
- 🆕 Consistent parameters: `target` for elements, `for` for waiting, `to` for scrolling, `device` for viewport
- 🔄 Automatic retry with exponential backoff for transient failures
- 📊 Telemetry hooks for centralized observability and monitoring
- 💾 Pluggable storage backends (local filesystem, S3, memory)
- 🛡️ Structured logging plus defensive error handling for operational visibility
- 🔌 Launch via `npm start`, `npm run dev`, or as a long-lived MCP sidecar
- 🐳 Docker image with multi-platform support (amd64, arm64)
- The MCP transport boots a Node.js server and registers the `captureScreenshot` and `extractDom` tools.
- Incoming tool invocations are validated against the relevant type definitions.
- Puppeteer starts (or reuses) a Chromium instance, navigates to the requested URL, and either captures a screenshot or serializes the DOM according to the tool parameters.
- The server returns structured content (images, text, DOM trees, metadata) to the caller or downstream workflow.
- Node.js ≥ 18.x
- npm ≥ 9.x
- Chromium package download permissions (first run)
- Network access to the target URLs
- ✅ Only 4 top-level parameters: `url`, `steps`, `headers`, `validate`
- ✅ Exactly 6 step types (no more, no less)
- ✅ Consistent parameter naming across all steps
- ✅ Automatic screenshot if omitted
- ✅ Smart field type detection
- ✅ Actionable error messages with recovery suggestions
- ✅ Single source of truth for all schemas
- ✅ Deprecation warnings for legacy parameters
- ✅ NEW: Validate mode for pre-flight step checking
- ✅ NEW: Step order enforcement with auto-correction
- ✅ NEW: Embedded LLM reference in MCP server instructions
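
As a rough illustration of the schema described above, the four top-level parameters and six step types could be modeled like this in TypeScript. The type names (`Step`, `CaptureParams`) and the choice of which fields are required are assumptions made for this sketch; only the parameter and step names come from this README.

```typescript
// Hypothetical type names; field and step names are taken from the README.
type Step =
  | { type: "viewport"; device?: string; width?: number; height?: number }
  | { type: "wait"; for?: string; duration?: number; timeout?: number }
  | { type: "fill"; target: string; value: string; submit?: boolean }
  | { type: "click"; target: string; waitFor?: string }
  | { type: "scroll"; to?: string; y?: number }
  | { type: "screenshot"; fullPage?: boolean; element?: string };

interface CaptureParams {
  url: string;
  steps?: Step[];
  headers?: Record<string, string>;
  validate?: boolean;
}

const PRIMARY_STEP_TYPES: ReadonlySet<string> = new Set([
  "viewport", "wait", "fill", "click", "scroll", "screenshot",
]);

// Narrow an untyped value (for example, from parsed JSON) to a primary step type.
function isPrimaryStepType(value: unknown): value is Step["type"] {
  return typeof value === "string" && PRIMARY_STEP_TYPES.has(value);
}

const exampleParams: CaptureParams = {
  url: "https://example.com",
  steps: [
    { type: "fill", target: "#email", value: "a@b.com" },
    { type: "click", target: "button", waitFor: ".result" },
  ],
};

console.log(isPrimaryStepType("click")); // true
console.log(isPrimaryStepType("hover")); // false (advanced step, not in the LLM schema)
console.log(exampleParams.steps?.length); // 2
```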
See LLM Quick Reference for the 6 primary step types and common patterns.
For advanced features, see Advanced Steps.
| Step | Purpose | Key Parameters | Example |
|---|---|---|---|
| `viewport` | Set device | `device`, `width`, `height` | `{ "type": "viewport", "device": "mobile" }` |
| `wait` | Wait for element/time | `for` OR `duration`, `timeout` | `{ "type": "wait", "for": ".loaded" }` |
| `fill` | Fill form field | `target`, `value`, `submit` | `{ "type": "fill", "target": "#email", "value": "a@b.com" }` |
| `click` | Click element | `target`, `waitFor` | `{ "type": "click", "target": "button", "waitFor": ".result" }` |
| `scroll` | Scroll page | `to`, `y` | `{ "type": "scroll", "to": "#footer" }` |
| `screenshot` | Capture (auto-added) | `fullPage`, `element` | `{ "type": "screenshot", "fullPage": true }` |
Step order is auto-corrected: a `viewport` step is moved to the front, and a `screenshot` step is appended at the end if none is present.
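
The reordering rule can be sketched in a few lines of TypeScript. This is a minimal illustration of the behavior described here, not the server's actual implementation; `normalizeStepOrder` and the loose `Step` type are hypothetical names.

```typescript
// Loose step shape for the sketch; the real schema is stricter.
type Step = { type: string; [key: string]: unknown };

// Hypothetical helper: move viewport steps to the front and make sure
// a screenshot step exists, appending one at the end when none was given.
function normalizeStepOrder(steps: Step[]): Step[] {
  const viewports = steps.filter((s) => s.type === "viewport");
  const others = steps.filter((s) => s.type !== "viewport");
  const ordered = [...viewports, ...others];
  if (!ordered.some((s) => s.type === "screenshot")) {
    ordered.push({ type: "screenshot" });
  }
  return ordered;
}

const normalized = normalizeStepOrder([
  { type: "click", target: "button" },
  { type: "viewport", device: "mobile" },
]);
console.log(normalized.map((s) => s.type)); // viewport, click, screenshot (in that order)
```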
High-level patterns that auto-expand to multiple steps:
Login pattern:

```json
{
  "type": "login",
  "email": { "selector": "#email", "value": "user@example.com" },
  "password": { "selector": "#password", "value": "secret" },
  "submit": "button[type=submit]",
  "successIndicator": ".dashboard"
}
```

Search pattern:

```json
{
  "type": "search",
  "input": "#search-box",
  "query": "MCP protocol",
  "resultsIndicator": ".search-results"
}
```

```bash
git clone https://github.com/chasesaurabh/mcp-page-capture.git
cd mcp-page-capture
npm install
```

```bash
# Run without installing globally
npx mcp-page-capture
# Or add it to your toolchain
npm install -g mcp-page-capture
mcp-page-capture
```

```bash
npm install
npm run build
npm start
```

For hot reload while iterating locally, run `npm run dev`.
- Guarantees a consistent Puppeteer + Chromium environment with all system libraries when teammates or CI run the server. No more "it works on my machine" mismatches.
- Provides a ready-to-deploy container image for hosting mcp-page-capture as a sidecar/service on Kubernetes, ECS, Fly.io, etc.
If you need those guarantees, build and run via:
```bash
docker build -t mcp-page-capture .
docker run --rm -it mcp-page-capture
```

Otherwise you can keep using the standard npm scripts locally.
```json
{
  "mcpServers": {
    "page-capture": {
      "command": "node",
      "args": ["dist/cli.js"]
    }
  }
}
```

Note: You may need to use the full path to `dist/cli.js` or `node` depending on your working directory and Node.js module resolution configuration.
If you want to embed the server inside another Node.js process, import the helpers exposed by the package:
```typescript
import { startMcpPageCaptureServer } from "mcp-page-capture";

await startMcpPageCaptureServer();
// Optionally pass a custom Transport implementation if you don't want stdio.
```

```json
{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com"
  }
}
```

Note: A screenshot is automatically captured at the end if no explicit screenshot step is provided.
The fill step auto-detects field types and handles text inputs, selects, checkboxes, and radio buttons.
```json
{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com",
    "steps": [
      { "type": "fill", "target": "#search", "value": "MCP protocol", "submit": true },
      { "type": "wait", "for": ".search-results" },
      { "type": "screenshot" }
    ]
  }
}
```

```json
{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com/login",
    "steps": [
      { "type": "wait", "for": "#login-form" },
      { "type": "fill", "target": "#email", "value": "user@example.com" },
      { "type": "fill", "target": "#password", "value": "secretpassword" },
      { "type": "fill", "target": "#remember-me", "value": "true" },
      { "type": "click", "target": "button[type=submit]", "waitFor": ".dashboard" },
      { "type": "screenshot" }
    ]
  }
}
```

```json
{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://docs.modelcontextprotocol.io",
    "steps": [
      { "type": "screenshot", "fullPage": true }
    ]
  }
}
```

```json
{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com/dashboard",
    "headers": {
      "authorization": "Bearer dev-token"
    },
    "steps": [
      {
        "type": "cookie",
        "action": "set",
        "name": "session",
        "value": "abc123",
        "path": "/secure"
      },
      { "type": "screenshot" }
    ]
  }
}
```

```json
{
  "tool": "extractDom",
  "params": {
    "url": "https://docs.modelcontextprotocol.io",
    "selector": "main article"
  }
}
```

```json
{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com",
    "steps": [
      { "type": "viewport", "device": "ipad-pro" },
      { "type": "scroll", "to": "#main-content" },
      { "type": "screenshot" }
    ]
  }
}
```

These are the only steps exposed to LLMs. They cover 95%+ of use cases:
| Step | Purpose | Parameters |
|---|---|---|
| `viewport` | Set device/screen size | `device`, `width`, `height` |
| `wait` | Wait for element/time | `for` OR `duration`, `timeout` |
| `fill` | Fill form field | `target`, `value`, `submit` |
| `click` | Click element | `target`, `waitFor` |
| `scroll` | Scroll page | `to` (selector), `y` (pixels) |
| `screenshot` | Capture (auto-added) | `fullPage`, `element` |
Use `validate: true` to check steps before execution:

```json
{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com",
    "steps": [
      { "type": "fill", "target": "#email", "value": "test@example.com" },
      { "type": "click", "target": "button" }
    ],
    "validate": true
  }
}
```

Returns validation analysis including:

- Errors: Missing required parameters
- Warnings: Step order issues (e.g., viewport not first)
- Suggestions: Recommended improvements (e.g., add `waitFor` to click)
- Step analysis: Per-step status and notes
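
To make the three report categories concrete, here is a minimal sketch of a pre-flight check in TypeScript. The report shape, the `validateSteps` name, and the table of required parameters are assumptions for illustration; the server's real validation output may be structured differently.

```typescript
type Step = { type: string; [key: string]: unknown };

interface ValidationReport {
  errors: string[];
  warnings: string[];
  suggestions: string[];
}

// Assumed required parameters per step type (not taken from the real schema).
const REQUIRED_PARAMS = new Map<string, string[]>([
  ["fill", ["target", "value"]],
  ["click", ["target"]],
]);

// Hypothetical validator covering one example of each category.
function validateSteps(steps: Step[]): ValidationReport {
  const report: ValidationReport = { errors: [], warnings: [], suggestions: [] };
  steps.forEach((step, index) => {
    for (const param of REQUIRED_PARAMS.get(step.type) ?? []) {
      if (step[param] === undefined) {
        report.errors.push(`step ${index} (${step.type}): missing "${param}"`);
      }
    }
    if (step.type === "viewport" && index !== 0) {
      report.warnings.push(`step ${index}: viewport should be the first step`);
    }
    if (step.type === "click" && step["waitFor"] === undefined) {
      report.suggestions.push(`step ${index}: consider adding "waitFor" to click`);
    }
  });
  return report;
}

const report = validateSteps([
  { type: "fill", target: "#email" },      // missing "value" -> error
  { type: "viewport", device: "mobile" },  // not first -> warning
  { type: "click", target: "button" },     // no waitFor -> suggestion
]);
console.log(report);
```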
These work at runtime but are not exposed in the LLM schema. Use the 6 primary steps instead:
| Deprecated | Use Instead |
|---|---|
| `quickFill` | `fill` with `submit: true` |
| `fillForm` | Multiple `fill` steps |
| `waitForSelector` | `wait` with `for` parameter |
| `delay` | `wait` with `duration` parameter |
| `fullPage` | `screenshot` with `fullPage: true` |
These are for power users only and are not documented in the tool schema:
`type`, `hover`, `cookie`, `storage`, `evaluate`, `keypress`, `focus`, `blur`, `clear`, `upload`, `submit`
For backward compatibility, these parameters work at runtime but are not exposed in the LLM schema:
| Legacy Parameter | Canonical | Notes |
|---|---|---|
| `selector` | `target` | Use `target` for element selectors |
| `awaitElement` | `for` | Use `for` in wait steps |
| `scrollTo` | `to` | Use `to` in scroll steps |
| `preset` | `device` | Use `device` in viewport steps |
| `captureElement` | `element` | Use `element` in screenshot steps |
| `waitAfter` | `wait` | Use `wait` in click steps |
Deprecation warnings are logged when legacy parameters are used.
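
One way to picture this backward-compatibility layer is a small renaming pass that warns as it goes. The sketch below is a simplification: `canonicalize` is a hypothetical helper, it applies the mapping to every step regardless of step type, and letting an explicit canonical value win over a legacy one is an assumption.

```typescript
type Step = { type: string; [key: string]: unknown };

// Legacy -> canonical names, copied from the table above.
const LEGACY_TO_CANONICAL = new Map<string, string>([
  ["selector", "target"],
  ["awaitElement", "for"],
  ["scrollTo", "to"],
  ["preset", "device"],
  ["captureElement", "element"],
  ["waitAfter", "wait"],
]);

// Hypothetical helper: rename legacy parameters and report each one via `warn`.
function canonicalize(step: Step, warn: (message: string) => void = console.warn): Step {
  const result: Step = { type: step.type };
  for (const [key, value] of Object.entries(step)) {
    if (key === "type") continue;
    const canonical = LEGACY_TO_CANONICAL.get(key);
    if (canonical === undefined) {
      result[key] = value;
      continue;
    }
    warn(`"${key}" is deprecated; use "${canonical}" instead`);
    // Assumption: an explicitly provided canonical parameter takes precedence.
    if (!(canonical in step)) result[canonical] = value;
  }
  return result;
}

console.log(canonicalize({ type: "scroll", scrollTo: "#footer" }));
// logs a deprecation warning, then { type: 'scroll', to: '#footer' }
```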
```json
{
  "content": [
    {
      "type": "text",
      "text": "mcp-page-capture screenshot\nURL: https://example.com\nCaptured: 2025-12-13T08:30:12.713Z\nFull page: false\nViewport: 1280x720\nDocument: 1280x2000\nScroll position: (0, 0)\nSize: 45.2 KB\nSteps executed: 5"
    },
    {
      "type": "image",
      "mimeType": "image/png",
      "data": "iVBORw0KGgoAAAANSUhEUgA..."
    }
  ]
}
```
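
A caller that wants the metadata from the text item in the response above could split it on `": "`. This is a sketch that assumes the `Key: value` layout shown in the example holds in general, which the README does not guarantee; `parseMetadata` is a hypothetical helper.

```typescript
// Text item copied from the example response.
const responseText =
  "mcp-page-capture screenshot\nURL: https://example.com\n" +
  "Captured: 2025-12-13T08:30:12.713Z\nFull page: false\nViewport: 1280x720\n" +
  "Document: 1280x2000\nScroll position: (0, 0)\nSize: 45.2 KB\nSteps executed: 5";

// Hypothetical helper: turn "Key: value" lines into a lookup object.
function parseMetadata(block: string): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const line of block.split("\n")) {
    const separator = line.indexOf(": ");
    if (separator === -1) continue; // the title line has no "Key: value" pair
    fields[line.slice(0, separator)] = line.slice(separator + 2);
  }
  return fields;
}

const metadata = parseMetadata(responseText);
console.log(metadata["URL"]);            // https://example.com
console.log(metadata["Viewport"]);       // 1280x720
console.log(metadata["Steps executed"]); // 5
```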
## Supported options
### `captureScreenshot`
- `url` (string, required): Fully-qualified URL to capture
- `headers` (object, optional): Key/value map of HTTP headers to send with the initial page navigation
- `cookies` (array, optional): List of cookies to set before navigation. Each cookie supports `name`, `value`, and optional `url`, `domain`, `path`, `secure`, `httpOnly`, `sameSite`, and `expires` (Unix timestamp, seconds)
- `viewport` (object, optional): Viewport configuration
  - `preset` (string, optional): Use a predefined viewport preset (see Viewport Presets section)
  - `width` (number, optional): Custom viewport width
  - `height` (number, optional): Custom viewport height
  - `deviceScaleFactor` (number, optional): Device scale factor (e.g., 2 for Retina)
  - `isMobile` (boolean, optional): Whether to emulate mobile device
  - `hasTouch` (boolean, optional): Whether to enable touch events
  - `userAgent` (string, optional): Custom user agent string
- `retryPolicy` (object, optional): Retry configuration for transient failures
  - `maxRetries` (number, optional, default 3): Maximum number of retry attempts
  - `initialDelayMs` (number, optional, default 1000): Initial delay between retries
  - `maxDelayMs` (number, optional, default 10000): Maximum delay between retries
  - `backoffMultiplier` (number, optional, default 2): Exponential backoff multiplier
- `storageTarget` (string, optional): Storage backend name for saving captures
### `extractDom`
- `url` (string, required): Fully-qualified URL to inspect
- `selector` (string, optional): CSS selector to scope extraction to a specific element. Defaults to the entire document
- `headers` (object, optional): Key/value map of HTTP headers sent before navigation
- `cookies` (array, optional): Same cookie structure as `captureScreenshot`, applied before navigation
- `viewport` (object, optional): Same viewport configuration as `captureScreenshot`
- `retryPolicy` (object, optional): Same retry configuration as `captureScreenshot`
- `storageTarget` (string, optional): Storage backend name for saving DOM data
## Action Steps for captureScreenshot
The `captureScreenshot` tool supports a comprehensive `steps` array that allows you to perform various web interactions before capturing the screenshot. Each step is executed in sequence, allowing for complex automation scenarios.
### Fill Form (`fillForm`) - Recommended for Form Interactions
The `fillForm` step is the easiest and most LLM-friendly way to interact with forms. It auto-detects field types and handles multiple fields in a single step.
```json
{
  "type": "fillForm",
  "fields": [
    { "selector": "#email", "value": "user@example.com" },
    { "selector": "#password", "value": "secretpassword" },
    { "selector": "#country", "value": "us" },
    { "selector": "#newsletter", "value": "true" },
    { "selector": "#plan", "value": "premium", "type": "radio" }
  ],
  "formSelector": "#signup-form",
  "submit": true,
  "submitSelector": "#submit-btn",
  "waitForNavigation": true
}
```

Each field in the `fields` array supports:

- `selector` (required): CSS selector for the form field
- `value` (required): Value to set. For checkboxes use `"true"` or `"false"`. For selects/radios use the value attribute.
- `type` (optional): Field type hint (`text`, `select`, `checkbox`, `radio`, `textarea`, `password`, `email`, `number`, `tel`, `url`, `date`, `file`). Auto-detected if not specified.
- `matchByText` (optional): For select fields, match by visible text instead of value attribute
- `delay` (optional): Delay between keystrokes in ms (for text inputs)

The step itself also accepts:

- `formSelector` (optional): CSS selector for the form container (for scoping field selectors)
- `submit` (optional): Whether to submit the form after filling (default: false)
- `submitSelector` (optional): Selector for submit button. If not specified, uses `form.submit()` or looks for `[type="submit"]`
- `waitForNavigation` (optional): Whether to wait for navigation after submit (default: true)
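
Field type auto-detection can be thought of as a mapping from an element's tag name and `type` attribute to a handling strategy. The sketch below is an illustrative guess at such a mapping, not the server's code: `detectFieldKind` and `parseCheckboxValue` are hypothetical, and collapsing `password`, `email`, `number`, `tel`, `url`, and `date` into a single typed-text kind is an assumption.

```typescript
type FieldKind = "text" | "select" | "checkbox" | "radio" | "textarea" | "file";

// Hypothetical detection based on tag name and the input's type attribute.
function detectFieldKind(tagName: string, typeAttr?: string): FieldKind {
  const tag = tagName.toLowerCase();
  if (tag === "select") return "select";
  if (tag === "textarea") return "textarea";
  const inputType = (typeAttr ?? "text").toLowerCase();
  if (inputType === "checkbox") return "checkbox";
  if (inputType === "radio") return "radio";
  if (inputType === "file") return "file";
  return "text"; // assumption: all remaining input types are filled by typing
}

// Checkbox values arrive as the strings "true" or "false", as documented above.
function parseCheckboxValue(value: string): boolean {
  if (value === "true") return true;
  if (value === "false") return false;
  throw new Error(`checkbox value must be "true" or "false", got "${value}"`);
}

console.log(detectFieldKind("INPUT", "email"));    // text
console.log(detectFieldKind("SELECT"));            // select
console.log(detectFieldKind("input", "checkbox")); // checkbox
console.log(parseCheckboxValue("true"));           // true
```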
Type text into input fields:
```jsonc
{
  "type": "text",
  "selector": "#username",
  "value": "john.doe@example.com",
  "clearFirst": true, // Clear existing text first (default: true)
  "delay": 100, // Delay between keystrokes in ms (0-1000)
  "pressEnter": false // Press Enter after typing (default: false)
}
```

Select an option from a dropdown:

```jsonc
{
  "type": "select",
  "selector": "#country",
  "value": "us" // OR "text": "United States" OR "index": 0
}
```

Select a radio button:

```jsonc
{
  "type": "radio",
  "selector": "input[type='radio']",
  "value": "option1", // Value attribute of the radio button
  "name": "preference" // Name attribute to identify the radio group
}
```

Check or uncheck a checkbox:

```jsonc
{
  "type": "checkbox",
  "selector": "#agree-terms",
  "checked": true // true to check, false to uncheck
}
```

Click on elements:
```jsonc
{
  "type": "click",
  "target": "button.submit",
  "button": "left", // "left", "right", or "middle" (default: left)
  "clickCount": 1, // 1=single, 2=double, 3=triple (default: 1)
  "waitForNavigation": false, // Wait for page navigation (default: false)
  "waitForSelector": ".modal-content" // Wait for element to appear after click
}
```

Hover over elements:

```jsonc
{
  "type": "hover",
  "selector": ".dropdown-trigger",
  "duration": 1000 // How long to maintain hover in ms (0-10000)
}
```

Upload files:

```json
{
  "type": "upload",
  "selector": "input[type='file']",
  "filePaths": ["/path/to/file1.pdf", "/path/to/file2.jpg"]
}
```

Submit forms:

```jsonc
{
  "type": "submit",
  "selector": "#contact-form", // Form element or submit button
  "waitForNavigation": true // Wait for page navigation (default: true)
}
```

Scroll the page:
```jsonc
{
  "type": "scroll",
  "scrollTo": "#section-2", // Scroll to element (takes precedence)
  "x": 0, // OR horizontal scroll position in pixels
  "y": 500, // OR vertical scroll position in pixels
  "behavior": "smooth" // "auto" or "smooth" (default: auto)
}
```

Press keyboard keys:

```jsonc
{
  "type": "keypress",
  "key": "Enter", // Key to press (e.g., "Enter", "Tab", "Escape", "ArrowDown")
  "modifiers": ["Control", "Shift"], // Optional modifiers
  "selector": "#search-box" // Optional element to focus first
}
```

The deprecated `waitForSelector` step is replaced by the `wait` step with `for`:

```json
{ "type": "wait", "for": ".loading-complete", "timeout": 10000 }
```

The deprecated `delay` step is replaced by the `wait` step with `duration`:

```json
{ "type": "wait", "duration": 2000 }
```

Focus an element:
```json
{
  "type": "focus",
  "selector": "#search-input"
}
```

Blur (unfocus) an element:

```json
{
  "type": "blur",
  "selector": "#search-input"
}
```

Clear input field contents:

```json
{
  "type": "clear",
  "selector": "#search-input"
}
```

Execute custom JavaScript:

```jsonc
{
  "type": "evaluate",
  "script": "document.title = 'New Title'; return document.title;",
  "selector": "#element" // Optional element to pass to the script
}
```

Capture screenshot at any point:
```jsonc
{
  "type": "screenshot",
  "fullPage": true, // Capture entire page (optional)
  "captureElement": ".specific-element" // Capture specific element (optional)
}
```

Set or delete browser cookies:

```json
{
  "type": "cookie",
  "action": "set",
  "name": "session_id",
  "value": "abc123",
  "domain": ".example.com",
  "path": "/",
  "secure": true
}
```

Supported actions: `set` (add/update cookie), `delete` (remove cookie).

Manage localStorage/sessionStorage:

```json
{
  "type": "storage",
  "storageType": "localStorage",
  "action": "set",
  "key": "user_preferences",
  "value": "{\"theme\":\"dark\"}"
}
```

Supported actions: `set` (add/update), `delete` (remove key), `clear` (remove all items).
```json
{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com/signup",
    "steps": [
      { "type": "waitForSelector", "awaitElement": "#signup-form" },
      { "type": "text", "selector": "#email", "value": "user@example.com" },
      { "type": "text", "selector": "#password", "value": "SecurePass123!" },
      { "type": "select", "selector": "#country", "text": "United States" },
      { "type": "radio", "selector": "input[name='plan']", "value": "premium" },
      { "type": "checkbox", "selector": "#newsletter", "checked": true },
      { "type": "checkbox", "selector": "#terms", "checked": true },
      { "type": "hover", "selector": ".tooltip-trigger", "duration": 500 },
      { "type": "scroll", "y": 200 },
      { "type": "click", "target": "button[type='submit']", "waitForNavigation": true },
      { "type": "delay", "duration": 2000 },
      { "type": "screenshot", "fullPage": true }
    ]
  }
}
```

If your capture fails, use these guidelines:
| Error | Solution |
|---|---|
| "element not found" | Check CSS selector, add waitForSelector before click/fillForm |
| "navigation timeout" | Increase retryPolicy.maxRetries or add delay step |
| "page not loaded" | Add waitForSelector or delay before screenshot |
| "click failed" | Ensure element is visible, add scroll to bring it into view |
The following viewport presets are available:
Desktop:

- `desktop-fhd`: 1920x1080 Full HD
- `desktop-hd`: 1280x720 HD
- `desktop-4k`: 3840x2160 4K
- `macbook-pro-16`: MacBook Pro 16-inch Retina

Tablet:

- `ipad-pro`: iPad Pro 12.9-inch
- `ipad-pro-landscape`: iPad Pro 12.9-inch (landscape)
- `ipad`: iPad 10.2-inch
- `surface-pro`: Microsoft Surface Pro

Mobile:

- `iphone-14-pro-max`: iPhone 14 Pro Max
- `iphone-14-pro`: iPhone 14 Pro
- `iphone-se`: iPhone SE (3rd generation)
- `pixel-7-pro`: Google Pixel 7 Pro
- `galaxy-s23-ultra`: Samsung Galaxy S23 Ultra
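
Conceptually, a preset is a named set of dimensions that custom values can refine. The sketch below shows one plausible resolution order in TypeScript. Only the three desktop sizes are taken from the list above; the `resolveViewport` helper, the 1280x720 fallback, and the rule that explicit `width`/`height` override a preset are assumptions.

```typescript
interface ViewportSize {
  width: number;
  height: number;
}

// Only the presets whose pixel sizes are stated in this README.
const PRESET_SIZES = new Map<string, ViewportSize>([
  ["desktop-fhd", { width: 1920, height: 1080 }],
  ["desktop-hd", { width: 1280, height: 720 }],
  ["desktop-4k", { width: 3840, height: 2160 }],
]);

// Assumed fallback when no preset is given.
const FALLBACK_SIZE: ViewportSize = { width: 1280, height: 720 };

// Hypothetical resolver: start from the preset, then apply explicit overrides.
function resolveViewport(
  options: { preset?: string; width?: number; height?: number } = {},
): ViewportSize {
  const base =
    options.preset === undefined ? FALLBACK_SIZE : PRESET_SIZES.get(options.preset);
  if (base === undefined) {
    throw new Error(`unknown viewport preset: ${options.preset}`);
  }
  return {
    width: options.width ?? base.width,
    height: options.height ?? base.height,
  };
}

console.log(resolveViewport({ preset: "desktop-4k" }));              // { width: 3840, height: 2160 }
console.log(resolveViewport({ preset: "desktop-hd", width: 1000 })); // { width: 1000, height: 720 }
```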
```json
{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com",
    "viewport": {
      "preset": "iphone-14-pro"
    }
  }
}
```

The tools automatically retry on transient failures with exponential backoff. Default retryable conditions:
- HTTP status codes: 500, 502, 503, 504, 408, 429
- Network errors: ETIMEDOUT, ECONNRESET, ENOTFOUND, ECONNREFUSED
- DNS resolution failures
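
To see what the retry parameters imply in practice, the following sketch computes the delay schedule for a policy. It assumes the common formula `initialDelayMs * backoffMultiplier^attempt` capped at `maxDelayMs`, with no jitter; the server's exact formula is not stated in this README, and `retryDelays` and `isRetryable` are hypothetical helpers.

```typescript
interface RetryPolicy {
  maxRetries: number;
  initialDelayMs: number;
  maxDelayMs: number;
  backoffMultiplier: number;
}

// Defaults as documented for retryPolicy.
const DEFAULT_POLICY: RetryPolicy = {
  maxRetries: 3,
  initialDelayMs: 1000,
  maxDelayMs: 10000,
  backoffMultiplier: 2,
};

// Retryable conditions copied from the list above.
const RETRYABLE_STATUS = new Set<number>([500, 502, 503, 504, 408, 429]);
const RETRYABLE_CODES = new Set<string>(["ETIMEDOUT", "ECONNRESET", "ENOTFOUND", "ECONNREFUSED"]);

function isRetryable(failure: { status?: number; code?: string }): boolean {
  return (
    (failure.status !== undefined && RETRYABLE_STATUS.has(failure.status)) ||
    (failure.code !== undefined && RETRYABLE_CODES.has(failure.code))
  );
}

// Assumed schedule: exponential growth per attempt, capped at maxDelayMs.
function retryDelays(policy: RetryPolicy = DEFAULT_POLICY): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < policy.maxRetries; attempt++) {
    const raw = policy.initialDelayMs * Math.pow(policy.backoffMultiplier, attempt);
    delays.push(Math.min(raw, policy.maxDelayMs));
  }
  return delays;
}

console.log(retryDelays()); // [ 1000, 2000, 4000 ]
console.log(
  retryDelays({ ...DEFAULT_POLICY, maxRetries: 5, initialDelayMs: 2000, backoffMultiplier: 1.5 }),
); // [ 2000, 3000, 4500, 6750, 10000 ]
console.log(isRetryable({ status: 503 }), isRetryable({ status: 404 })); // true false
```

Under these assumptions, the fifth delay of the custom policy would be 10125 ms uncapped, so `maxDelayMs` limits it to 10000 ms.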
```json
{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com",
    "retryPolicy": {
      "maxRetries": 5,
      "initialDelayMs": 2000,
      "backoffMultiplier": 1.5
    }
  }
}
```

The server emits structured telemetry events that can be consumed for monitoring and observability:
- `tool.invoked`: Tool execution started
- `tool.completed`: Tool execution succeeded
- `tool.failed`: Tool execution failed
- `navigation.started`: Page navigation initiated
- `navigation.completed`: Page navigation succeeded
- `navigation.failed`: Page navigation failed
- `retry.attempt`: Retry attempt started
- `retry.succeeded`: Retry succeeded
- `browser.launched`: Puppeteer browser started
- `browser.closed`: Puppeteer browser closed
- `screenshot.captured`: Screenshot taken
- `dom.extracted`: DOM content extracted
You can configure telemetry hooks programmatically:
```typescript
import { getGlobalTelemetry } from "mcp-page-capture";

const telemetry = getGlobalTelemetry();

// Configure HTTP sink for centralized collection
telemetry.configureHttpSink({
  url: "https://telemetry.example.com/events",
  headers: { "X-API-Key": "your-api-key" },
  batchSize: 100,
  flushIntervalMs: 5000,
});

// Register custom hooks
telemetry.registerHook({
  name: "custom-logger",
  enabled: true,
  handler: async (event) => {
    console.log(`[${event.type}]`, event.data);
  },
});
```

Captures can be automatically saved to configurable storage backends:
```typescript
import { registerStorageTarget, LocalStorageTarget } from "mcp-page-capture";

const localStorage = new LocalStorageTarget("/path/to/captures");
registerStorageTarget("local", localStorage);
```

```typescript
import { registerStorageTarget, S3StorageTarget } from "mcp-page-capture";

const s3Storage = new S3StorageTarget({
  bucket: "my-captures",
  prefix: "screenshots/",
  region: "us-west-2",
});
registerStorageTarget("s3", s3Storage);
```

```typescript
import { registerStorageTarget, MemoryStorageTarget } from "mcp-page-capture";

const memoryStorage = new MemoryStorageTarget();
registerStorageTarget("memory", memoryStorage);
```

Then use the storage in tool invocations:
```json
{
  "tool": "captureScreenshot",
  "params": {
    "url": "https://example.com",
    "storageTarget": "s3"
  }
}
```

- Dynamic pages requiring complex authentication flows or user gestures are not yet automated
- Extremely long or infinite-scroll pages may exceed default Chromium memory limits
- S3 storage backend requires AWS SDK integration (placeholder implementation included)
- Release automation is powered by semantic-release and GitHub Actions
- Commits must follow the Conventional Commits spec (`feat:`, `fix:`, `chore:`) for automatic versioning
- Publishes to the npm registry on successful builds from the `main` branch
- Multi-platform images (linux/amd64, linux/arm64) are automatically built and published
- Images are pushed to:
  - Docker Hub: `<username>/mcp-page-capture`
  - GitHub Container Registry: `ghcr.io/<org>/mcp-page-capture`
- Tagged with semantic version, major, major.minor, and latest
Configure these secrets in your GitHub repository settings:
- `NPM_TOKEN`: npm access token with publish permissions
- `DOCKER_USERNAME`: Docker Hub username
- `DOCKER_PASSWORD`: Docker Hub access token

The `GITHUB_TOKEN` is provided automatically by GitHub Actions.
Read `CONTRIBUTING.md`, open an issue describing the change, and submit a PR that includes `npm run build` output plus updated docs/tests.
Maintained by Saurabh Chase (@chasesaurabh). Reach out via issues or discussions for roadmap coordination.
Released under the MIT License.
