LinkedIn Research Agent

AI agent using Claude Cookbooks patterns + Browser MCP for LinkedIn research

An intelligent agent that researches LinkedIn profiles using Claude Sonnet 4.5, implementing the orchestrator-workers pattern from Claude Cookbooks, with Browser MCP's 33 browser automation tools.

Architecture (v2 - Orchestrator-Workers Pattern)

┌─────────────────────────────────────────────────────────────────┐
│                     LinkedIn Research Agent                     │
│                 (Orchestrator-Workers Pattern)                  │
├─────────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐                                           │
│  │   Orchestrator   │  Analyzes task, breaks into subtasks      │
│  │   (Claude 4.5)   │  Coordinates workers                      │
│  └─────┬────────────┘                                           │
│        │                                                        │
│        ├───────────┬───────────┬───────────┬───────────┐        │
│        ▼           ▼           ▼           ▼           ▼        │
│   ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐   │
│   │Navigator│ │Searcher │ │Extractor│ │Analyzer │ │Reporter │   │
│   │ Worker  │ │ Worker  │ │ Worker  │ │ Worker  │ │ Worker  │   │
│   └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘   │
└────────┼───────────┼───────────┼───────────┼───────────┼────────┘
         │           │           │           │           │
         └───────────┴───────────┼───────────┴───────────┘
                                 │
                    ┌────────────┴────────────┐
                    │                         │
           ┌────────▼────────┐       ┌────────▼────────┐
           │  Browser MCP    │       │  Memory System  │
           │  (33 Tools)     │       │  (File Storage) │
           │  - navigate     │       │  - profiles/    │
           │  - snapshot     │       │  - sessions/    │
           │  - click        │       │  - cache/       │
           │  - evaluate     │       └─────────────────┘
           │  - wait_for     │
           └─────────────────┘

Features

🎯 Orchestrator-Workers Pattern (from Claude Cookbooks)

Dynamic Task Decomposition: Orchestrator analyzes each task and creates optimal subtask plan
Specialized Workers: Navigator, Searcher, Extractor, Analyzer, Reporter
Adaptive Planning: Worker selection based on specific task requirements
Coordinated Execution: Structured communication via XML
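
To illustrate the structured XML communication listed above, here is a minimal sketch that parses an orchestrator plan into subtasks. The tag names (plan, subtask, worker, description), the Subtask type, and the parse_plan helper are assumptions made for illustration; the project's actual schema may differ.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Worker names taken from the feature list above.
KNOWN_WORKERS = {"navigator", "searcher", "extractor", "analyzer", "reporter"}


@dataclass
class Subtask:
    worker: str
    description: str


def parse_plan(xml_text: str) -> List[Subtask]:
    """Parse a hypothetical <plan> document into an ordered list of subtasks."""
    root = ET.fromstring(xml_text.strip())
    subtasks = []
    for node in root.findall("subtask"):
        worker = (node.findtext("worker") or "").strip().lower()
        description = (node.findtext("description") or "").strip()
        if worker not in KNOWN_WORKERS:
            raise ValueError(f"unknown worker: {worker!r}")
        subtasks.append(Subtask(worker, description))
    return subtasks


# Hypothetical orchestrator output for a profile search.
EXAMPLE_PLAN = """
<plan>
  <subtask>
    <worker>searcher</worker>
    <description>Search for Product Managers in Mumbai</description>
  </subtask>
  <subtask>
    <worker>extractor</worker>
    <description>Extract name, title, and company</description>
  </subtask>
</plan>
"""

plan = parse_plan(EXAMPLE_PLAN)
```

Rejecting unknown worker names at parse time keeps a malformed plan from reaching the execution stage.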

🧠 Memory System (from Claude Cookbooks)

File-based Storage: Profiles, sessions, and cache in /memories
Persistent State: Research data survives between sessions
Structured Storage: JSON format for profiles
Security: Path validation prevents directory traversal
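
The path validation mentioned above can be sketched as follows. This is one common way to block directory traversal, not necessarily how the project does it; the resolve_memory_path helper is hypothetical, and only the /memories root comes from the feature list.

```python
from pathlib import Path

MEMORY_ROOT = Path("/memories")  # root directory named in the feature list


def resolve_memory_path(relative: str, root: Path = MEMORY_ROOT) -> Path:
    """Return an absolute path inside root, or raise ValueError on traversal."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / relative).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes memory root: {relative!r}")
    return candidate
```

Resolving before comparing matters: a plain string prefix check would accept inputs such as profiles/../../etc/passwd.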

🔧 Browser MCP Integration

33 Tools Available: All Browser MCP tools accessible to workers
Session Persistence: Login once, stay logged in
CDP Integration: Stable element targeting
Accessibility Tree: Reliable element identification

📊 Training Data Collection

Agent Lightning Ready: Collects orchestrator + worker data for APO (Automatic Prompt Optimization) training
Enhanced Metrics: Worker performance, task decomposition quality
Structured Logging: JSONL format for ML training
Performance Tracking: Duration, success rate, profiles found
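
The performance metrics listed above could be aggregated from logged records roughly as below. The status and duration_seconds fields appear in the training log format shown later in this README; the profiles_found field and the summarize_runs helper are assumptions for illustration.

```python
from typing import Dict, List


def summarize_runs(records: List[Dict]) -> Dict[str, float]:
    """Compute success rate, mean duration, and mean profiles found."""
    total = len(records)
    if total == 0:
        return {
            "success_rate": 0.0,
            "avg_duration_seconds": 0.0,
            "avg_profiles_found": 0.0,
        }
    completed = sum(1 for r in records if r.get("status") == "completed")
    return {
        "success_rate": completed / total,
        "avg_duration_seconds": sum(r.get("duration_seconds", 0) for r in records) / total,
        # "profiles_found" is a hypothetical field name.
        "avg_profiles_found": sum(r.get("profiles_found", 0) for r in records) / total,
    }


# Made-up sample records, for demonstration only.
sample = [
    {"status": "completed", "duration_seconds": 40, "profiles_found": 10},
    {"status": "failed", "duration_seconds": 60, "profiles_found": 6},
]
summary = summarize_runs(sample)
```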

🎯 Research Capabilities

Profile search with filters (title, location, company)
Company employee research
Competitive analysis across companies
Structured data extraction
Professional Excel/PDF reports (via Skills API)

Installation

1. Prerequisites

Python 3.8+
Node.js 18+ (for Browser MCP server)
Browser MCP Server: Must be built locally (this guide uses /Users/rammaree/projects/social-browser-mcp; adjust the path for your machine)

2. Install Dependencies

cd linkedin-researcher
pip install -r requirements.txt

3. Configure Environment

cp .env.example .env

Edit .env and set:

ANTHROPIC_API_KEY: Your Anthropic API key
BROWSER_MCP_PATH: Path to Browser MCP server (default: ../social-browser-mcp/dist/index.js)
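
A minimal sketch of reading these two settings at startup, assuming the variables are already present in the process environment (whether the project loads .env through a helper library is not stated here). The Settings class and load_settings function are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Default taken from the description of BROWSER_MCP_PATH above.
DEFAULT_MCP_PATH = "../social-browser-mcp/dist/index.js"


@dataclass(frozen=True)
class Settings:
    api_key: str
    browser_mcp_path: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, failing fast on a missing key."""
    env = os.environ if env is None else env
    api_key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    return Settings(
        api_key=api_key,
        browser_mcp_path=env.get("BROWSER_MCP_PATH", DEFAULT_MCP_PATH),
    )
```

Passing a mapping instead of reading os.environ directly makes the function easy to exercise in tests.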

4. Verify Browser MCP Server

# Make sure Browser MCP is built
cd /Users/rammaree/projects/social-browser-mcp
npm run build

# Verify it starts (the stdio server waits for input; press Ctrl+C to exit)
node dist/index.js

Usage

Basic Usage

# Research 10 Product Managers in Mumbai
python main.py \
  --query "Product Manager" \
  --count 10 \
  --location "Mumbai"

# Research 5 Software Engineers at Google
python main.py \
  --query "Software Engineer" \
  --count 5 \
  --company "Google"

# Save results to custom file
python main.py \
  --query "Data Scientist" \
  --count 20 \
  --location "Bangalore" \
  --output data_scientists.json

Python API

import asyncio
from src.agent import LinkedInResearchAgent
from src.mcp_client import BrowserMCPClient

async def main():
    # Connect to Browser MCP
    mcp_client = BrowserMCPClient("/path/to/social-browser-mcp/dist/index.js")
    await mcp_client.connect()

    try:
        # Create agent
        agent = LinkedInResearchAgent(
            api_key="your_anthropic_api_key",
            mcp_client=mcp_client
        )

        # Run research
        result = await agent.research_profiles(
            query="Product Manager",
            count=10,
            location="Mumbai"
        )

        print(result)
    finally:
        # Cleanup: disconnect even if agent creation or research fails
        await mcp_client.disconnect()

asyncio.run(main())

How It Works

1. Autonomous Agent Loop

The agent uses Claude Sonnet 4.5 with tool use capabilities:

1. Receive research task
2. Agent analyzes task and decides which tools to use
3. Agent calls Browser MCP tools (navigate, click, extract, etc.)
4. Agent processes results
5. Agent decides next action (continue or finish)
6. Repeat until task complete
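
The six steps above can be sketched as a plain loop. This is a simplified illustration with the model and the browser replaced by stand-in functions; run_agent_loop, scripted_decide, fake_call_tool, and the tool argument names are all hypothetical, and the real agent delegates the decision step to Claude.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A decision is ("tool", tool_name, args) or ("finish", result, None).
Decision = Tuple[str, Any, Optional[Dict[str, Any]]]


def run_agent_loop(
    task: str,
    decide: Callable[[List[Dict[str, Any]]], Decision],
    call_tool: Callable[[str, Dict[str, Any]], Any],
    max_iterations: int = 20,
) -> Dict[str, Any]:
    """Alternate between deciding and acting until the decider finishes."""
    history: List[Dict[str, Any]] = [{"role": "task", "content": task}]
    for iteration in range(1, max_iterations + 1):
        kind, payload, args = decide(history)          # steps 2 and 5
        if kind == "finish":
            return {"result": payload, "iterations": iteration}
        observation = call_tool(payload, args or {})   # step 3
        history.append(                                # step 4
            {"role": "tool", "name": payload, "content": observation}
        )
    raise RuntimeError("max iterations reached without finishing")


def scripted_decide(history: List[Dict[str, Any]]) -> Decision:
    """Stand-in for the model: navigate, snapshot, then finish."""
    tool_calls = sum(1 for h in history if h["role"] == "tool")
    if tool_calls == 0:
        return ("tool", "browser_navigate", {"url": "https://www.linkedin.com"})
    if tool_calls == 1:
        return ("tool", "browser_snapshot", {})
    return ("finish", "done", None)


def fake_call_tool(name: str, args: Dict[str, Any]) -> str:
    """Stand-in for Browser MCP: report success without touching a browser."""
    return f"{name} ok"


outcome = run_agent_loop(
    "Research 5 Product Managers in Mumbai", scripted_decide, fake_call_tool
)
```

The max_iterations cap is a design choice in this sketch so that a decider that never finishes cannot loop indefinitely.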

2. Tool Usage Example

Agent workflow for "Research 5 Product Managers in Mumbai":

Step 1: browser_navigate → Navigate to www.linkedin.com
Step 2: browser_snapshot → Get page structure
Step 3: browser_click → Click search box
Step 4: browser_type → Type "Product Manager Mumbai"
Step 5: browser_press → Press Enter
Step 6: browser_wait_for → Wait for results
Step 7: browser_snapshot → Get search results
Step 8: browser_click → Click first profile
Step 9: browser_snapshot → Extract profile data
Step 10: browser_navigate → Navigate back to search
... repeat for 5 profiles
Step N: Return structured JSON with all profile data

3. Training Data Collection

When ENABLE_TRAINING_MODE=true, the agent logs:

{
  "task_id": "task_1234567890",
  "timestamp": "2025-11-04T10:30:00Z",
  "query": "Product Manager in Mumbai",
  "task_type": "profile_search",
  "parameters": {"count": 10, "location": "Mumbai"},
  "status": "completed",
  "result": {...},
  "duration_seconds": 45,
  "tools_used": ["browser_navigate", "browser_click", ...]
}

This data is used for Agent Lightning APO training.
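
A minimal sketch of how such records might be appended in JSONL form, one JSON object per line. The log_training_record helper is hypothetical; only the ENABLE_TRAINING_MODE flag and the record fields come from this README.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Mapping, Optional


def log_training_record(
    record: Dict[str, Any],
    path: str,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Append one record as a JSON line; return False when logging is disabled."""
    env = os.environ if env is None else env
    if env.get("ENABLE_TRAINING_MODE", "").lower() != "true":
        return False
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier records; each record occupies exactly one line.
    with target.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return True
```

With ENABLE_TRAINING_MODE=true in the environment, calling log_training_record(record, "training/data.jsonl") would append to the file shown in the project structure.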

Agent Lightning Training (Phase 5.8)

Overview

After collecting 50+ training examples, optimize the agent with Agent Lightning APO:

from agent_lightning import APO

# Load training data
training_data = load_training_examples()

# Initialize APO
apo = APO(
    initial_prompt=agent.system_prompt,
    evaluation_dataset=training_data,
    optimization_metric="success_rate"
)

# Optimize (costs ~$5-10)
optimized_prompt = apo.optimize()

# Deploy
agent.system_prompt = optimized_prompt

Expected Improvements

Before Training:

Success rate: ~80%
Avg profiles per task: 8/10
Avg duration: 60 seconds

After Agent Lightning:

Success rate: ~90-95%
Avg profiles per task: 9.5/10
Avg duration: 45 seconds

Improvement: roughly 10-25% (relative) across these metrics
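
For reference, the relative changes implied by the before and after figures quoted above can be computed directly. The relative_change helper is an illustrative name; the inputs are the expected values from this section, not measured results.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from before to after (negative means a decrease)."""
    return (after - before) / before


changes = {
    "success_rate_low": relative_change(80, 90),    # 80% -> 90%
    "success_rate_high": relative_change(80, 95),   # 80% -> 95%
    "profiles_per_task": relative_change(8, 9.5),   # 8/10 -> 9.5/10
    "duration_seconds": relative_change(60, 45),    # 60 s -> 45 s
}
```

On these inputs the success rate rises by 12.5% to 18.75%, profiles per task by 18.75%, and duration falls by 25%.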

Project Structure

linkedin-researcher/
├── src/
│   ├── __init__.py           # Package initialization
│   ├── agent.py              # Claude SDK agent (autonomous loop)
│   ├── mcp_client.py         # Browser MCP connection
│   ├── memory.py             # Memory system (future)
│   └── workflows/
│       └── profile_search.py # Profile search workflow (future)
│
├── config/
│   ├── agent_config.yaml     # Agent configuration
│   └── mcp_config.json       # MCP connection config
│
├── tests/
│   └── test_agent.py         # Unit tests (future)
│
├── training/
│   └── data.jsonl            # Training data for Agent Lightning
│
├── logs/
│   └── agent.log             # Agent logs
│
├── main.py                   # CLI entry point
├── requirements.txt          # Python dependencies
├── .env.example              # Environment template
└── README.md                 # This file

Configuration

Agent Configuration (`config/agent_config.yaml`)

agent:
  model: "claude-sonnet-4-5"
  max_tokens: 4000
  temperature: 0.7
  system_prompt: |
    You are a LinkedIn research specialist...

workflows:
  profile_search:
    max_profiles: 50
    timeout_seconds: 300

MCP Configuration (`config/mcp_config.json`)

{
  "mcp_servers": {
    "browser-mcp": {
      "command": "node",
      "args": ["/path/to/social-browser-mcp/dist/index.js"],
      "transport": "stdio"
    }
  }
}
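
As a sketch of how this file might be consumed, the snippet below turns a server entry into the command line used to launch it. The server_command helper is a hypothetical name, and the config is inlined as a string so the example is self-contained.

```python
import json
from typing import Any, Dict, List

# Same structure as config/mcp_config.json above.
MCP_CONFIG_JSON = """
{
  "mcp_servers": {
    "browser-mcp": {
      "command": "node",
      "args": ["/path/to/social-browser-mcp/dist/index.js"],
      "transport": "stdio"
    }
  }
}
"""


def server_command(config: Dict[str, Any], name: str) -> List[str]:
    """Build the argv list for a stdio MCP server entry."""
    server = config["mcp_servers"][name]
    if server.get("transport", "stdio") != "stdio":
        raise ValueError(f"unsupported transport for {name!r}")
    return [server["command"], *server.get("args", [])]


config = json.loads(MCP_CONFIG_JSON)
argv = server_command(config, "browser-mcp")
```

The resulting list can be passed to a subprocess launcher without shell quoting concerns.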

Development Roadmap

Phase 5.1: Project Structure ✅ (Complete)

Created directory structure
Configuration files
Environment setup

Phase 5.2: MCP Client ✅ (Complete)

stdio transport to Browser MCP
Tool listing and invocation
Error handling

Phase 5.3: Agent Implementation ✅ (Complete)

Autonomous agent loop
Tool use integration
Training data collection

Phase 5.4: Workflows (Next)

Profile search workflow
Company research workflow
Competitive analysis workflow

Phase 5.5: Memory System (Planned)

Redis for session state
PostgreSQL for research results
pgvector for semantic search

Phase 5.6: Testing (Planned)

Unit tests
Integration tests
E2E tests

Phase 5.7: Training Data Collection (Planned)

Run 50+ research tasks
Collect performance metrics
Analyze failure cases

Phase 5.8: Agent Lightning Training (Planned)

Load training data
Run APO optimization
A/B test improvements
Deploy optimized agent

Phase 5.9: Production Deployment (Planned)

Containerization
API server
Monitoring
Scaling

Cost Analysis

Research Costs (per task)

Claude API:

Input tokens: ~5,000 tokens (research query + tool results)
Output tokens: ~2,000 tokens (agent thinking + structured response)
Cost per task: ~$0.15 (with Claude Sonnet 4.5)

For 50 profiles (5 tasks × 10 profiles):

Total cost: ~$0.75

Comparison:

Manual research: 2-3 hours @ $50/hour = $100-150
Agent cost: $0.75
Savings: 99%+
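
The arithmetic behind these figures can be checked with a short sketch. The per-token prices below are assumptions (USD 3 per million input tokens and USD 15 per million output tokens) and should be verified against current Anthropic pricing; token_cost is a hypothetical helper. The per-task cost, task count, and manual cost range are the figures quoted above.

```python
def token_cost(
    input_tokens: int,
    output_tokens: int,
    input_usd_per_mtok: float,
    output_usd_per_mtok: float,
) -> float:
    """Cost in USD for a given token count at per-million-token prices."""
    return (
        input_tokens * input_usd_per_mtok + output_tokens * output_usd_per_mtok
    ) / 1_000_000


# Assumed prices, not taken from this README.
ASSUMED_INPUT_USD_PER_MTOK = 3.0
ASSUMED_OUTPUT_USD_PER_MTOK = 15.0

# One pass over the token counts listed above.
single_pass = token_cost(
    5_000, 2_000, ASSUMED_INPUT_USD_PER_MTOK, ASSUMED_OUTPUT_USD_PER_MTOK
)

# Figures quoted in this section.
COST_PER_TASK = 0.15
agent_total = 5 * COST_PER_TASK           # 5 tasks of 10 profiles each
savings_vs_low = 1 - agent_total / 100    # against $100 of manual research
savings_vs_high = 1 - agent_total / 150   # against $150 of manual research
```

Under these assumed prices a single pass over the listed token counts comes to about $0.045, so the ~$0.15 per-task figure would correspond to more tokens than listed, for example context re-sent across loop iterations. The savings work out to about 99.25% and 99.5% against the two manual estimates.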

Training Costs

Agent Lightning APO:

Training: ~$5-10 for 50 tasks
One-time cost

Expected ROI:

10-20% efficiency improvement
Pays for itself after ~50-100 tasks

Troubleshooting

Browser MCP Connection Issues

# Check if Browser MCP is built
cd /Users/rammaree/projects/social-browser-mcp
npm run build

# Test Browser MCP directly
node dist/index.js

API Key Issues

# Verify API key is set
echo $ANTHROPIC_API_KEY

# Or check the .env file
grep ANTHROPIC_API_KEY .env

Python Dependencies

# Reinstall dependencies
pip install -r requirements.txt --force-reinstall

Examples

Example 1: Research Product Managers in Mumbai

python main.py \
  --query "Product Manager" \
  --count 10 \
  --location "Mumbai" \
  --output pm_mumbai.json

Output (pm_mumbai.json):

{
  "result": "Found 10 Product Manager profiles in Mumbai",
  "profiles": [
    {
      "name": "John Doe",
      "title": "Senior Product Manager",
      "company": "Google",
      "location": "Mumbai, India",
      "experience_years": 8,
      "education": "MBA, IIM Bangalore",
      "profile_url": "https://www.linkedin.com/in/johndoe"
    },
    ...
  ],
  "iterations": 15,
  "duration_seconds": 45
}

Example 2: Research Software Engineers at Google

python main.py \
  --query "Software Engineer" \
  --count 5 \
  --company "Google" \
  --output google_engineers.json

Key Differences from Browser MCP

Browser MCP (Deterministic Tools)

33 browser automation tools
Deterministic behavior (given input → fixed output)
Session persistence
Used by agents, not trainable itself