Skip to content

AI-Powered LinkedIn Research Platform with Multi-Agent System (Claude SDK) and Browser Automation (MCP)

Notifications You must be signed in to change notification settings

maree217/linkedin-researcher

Repository files navigation

LinkedIn Research Agent

AI agent using Claude Cookbooks patterns + Browser MCP for LinkedIn research

An intelligent agent that researches LinkedIn profiles using Claude Sonnet 4.5, implementing the orchestrator-workers pattern from Claude Cookbooks, with Browser MCP's 33 browser automation tools.

Architecture (v2 - Orchestrator-Workers Pattern)

┌─────────────────────────────────────────────────────────────────┐
│                    LinkedIn Research Agent                       │
│                  (Orchestrator-Workers Pattern)                  │
├─────────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐                                           │
│  │   Orchestrator   │  Analyzes task, breaks into subtasks      │
│  │   (Claude 4.5)   │  Coordinates workers                      │
│  └────────┬─────────┘                                           │
│           │                                                      │
│           ├─────────┬──────────┬──────────┬──────────┐         │
│           ▼         ▼          ▼          ▼          ▼         │
│     ┌─────────┐ ┌──────┐  ┌──────┐  ┌──────┐  ┌──────┐       │
│     │Navigator│ │Searcher│ │Extractor│ │Analyzer│ │Reporter│  │
│     │ Worker  │ │ Worker │ │ Worker  │ │ Worker │ │ Worker │  │
│     └────┬────┘ └───┬────┘ └───┬─────┘ └───┬────┘ └───┬────┘  │
└──────────┼──────────┼──────────┼───────────┼──────────┼────────┘
           │          │          │           │          │
           └──────────┴──────────┴───────────┴──────────┘
                                 │
                    ┌────────────┴────────────┐
                    │                         │
           ┌────────▼────────┐      ┌────────▼────────┐
           │  Browser MCP     │      │  Memory System  │
           │  (33 Tools)      │      │  (File Storage) │
           │  - navigate      │      │  - profiles/    │
           │  - snapshot      │      │  - sessions/    │
           │  - click         │      │  - cache/       │
           │  - evaluate      │      └─────────────────┘
           │  - wait_for      │
           └──────────────────┘

Features

🎯 Orchestrator-Workers Pattern (from Claude Cookbooks)

  • Dynamic Task Decomposition: Orchestrator analyzes each task and creates optimal subtask plan
  • Specialized Workers: Navigator, Searcher, Extractor, Analyzer, Reporter
  • Adaptive Planning: Worker selection based on specific task requirements
  • Coordinated Execution: Structured communication via XML

🧠 Memory System (from Claude Cookbooks)

  • File-based Storage: Profiles, sessions, and cache in /memories
  • Persistent State: Research data survives between sessions
  • Structured Storage: JSON format for profiles
  • Security: Path validation prevents directory traversal

🔧 Browser MCP Integration

  • 33 Tools Available: All Browser MCP tools accessible to workers
  • Session Persistence: Login once, stay logged in
  • CDP Integration: Stable element targeting
  • Accessibility Tree: Reliable element identification

📊 Training Data Collection

  • Agent Lightning Ready: Collects orchestrator + worker data for APO training
  • Enhanced Metrics: Worker performance, task decomposition quality
  • Structured Logging: JSONL format for ML training
  • Performance Tracking: Duration, success rate, profiles found

🎯 Research Capabilities

  • Profile search with filters (title, location, company)
  • Company employee research
  • Competitive analysis across companies
  • Structured data extraction
  • Professional Excel/PDF reports (via Skills API)

Installation

1. Prerequisites

  • Python 3.8+
  • Node.js 18+ (for Browser MCP server)
  • Browser MCP Server: Must be built at /Users/rammaree/projects/social-browser-mcp

2. Install Dependencies

cd linkedin-researcher
pip install -r requirements.txt

3. Configure Environment

cp .env.example .env

Edit .env and set:

  • ANTHROPIC_API_KEY: Your Anthropic API key
  • BROWSER_MCP_PATH: Path to Browser MCP server (default: ../social-browser-mcp/dist/index.js)

4. Verify Browser MCP Server

# Make sure Browser MCP is built
cd /Users/rammaree/projects/social-browser-mcp
npm run build

# Verify it works
node dist/index.js

Usage

Basic Usage

# Research 10 Product Managers in Mumbai
python main.py \
  --query "Product Manager" \
  --count 10 \
  --location "Mumbai"

# Research 5 Software Engineers at Google
python main.py \
  --query "Software Engineer" \
  --count 5 \
  --company "Google"

# Save results to custom file
python main.py \
  --query "Data Scientist" \
  --count 20 \
  --location "Bangalore" \
  --output data_scientists.json

Python API

import asyncio
from src.agent import LinkedInResearchAgent
from src.mcp_client import BrowserMCPClient

async def main():
    # Connect to Browser MCP
    mcp_client = BrowserMCPClient("/path/to/social-browser-mcp/dist/index.js")
    await mcp_client.connect()

    # Create agent
    agent = LinkedInResearchAgent(
        api_key="your_anthropic_api_key",
        mcp_client=mcp_client
    )

    # Run research
    result = await agent.research_profiles(
        query="Product Manager",
        count=10,
        location="Mumbai"
    )

    print(result)

    # Cleanup
    await mcp_client.disconnect()

asyncio.run(main())

How It Works

1. Autonomous Agent Loop

The agent uses Claude Sonnet 4.5 with tool use capabilities:

1. Receive research task
2. Agent analyzes task and decides which tools to use
3. Agent calls Browser MCP tools (navigate, click, extract, etc.)
4. Agent processes results
5. Agent decides next action (continue or finish)
6. Repeat until task complete

2. Tool Usage Example

Agent workflow for "Research 5 Product Managers in Mumbai":

Step 1: browser_navigate → Navigate to www.linkedin.com
Step 2: browser_snapshot → Get page structure
Step 3: browser_click → Click search box
Step 4: browser_type → Type "Product Manager Mumbai"
Step 5: browser_press → Press Enter
Step 6: browser_wait_for → Wait for results
Step 7: browser_snapshot → Get search results
Step 8: browser_click → Click first profile
Step 9: browser_snapshot → Extract profile data
Step 10: browser_navigate → Navigate back to search
... repeat for 5 profiles
Step N: Return structured JSON with all profile data

3. Training Data Collection

When ENABLE_TRAINING_MODE=true, the agent logs:

{
  "task_id": "task_1234567890",
  "timestamp": "2025-11-04T10:30:00Z",
  "query": "Product Manager in Mumbai",
  "task_type": "profile_search",
  "parameters": {"count": 10, "location": "Mumbai"},
  "status": "completed",
  "result": {...},
  "duration_seconds": 45,
  "tools_used": ["browser_navigate", "browser_click", ...]
}

This data is used for Agent Lightning APO training.


Agent Lightning Training (Phase 5.8)

Overview

After collecting 50+ training examples, optimize the agent with Agent Lightning APO:

from agent_lightning import APO

# Load training data
training_data = load_training_examples()

# Initialize APO
apo = APO(
    initial_prompt=agent.system_prompt,
    evaluation_dataset=training_data,
    optimization_metric="success_rate"
)

# Optimize (costs ~$5-10)
optimized_prompt = apo.optimize()

# Deploy
agent.system_prompt = optimized_prompt

Expected Improvements

Before Training:

  • Success rate: ~80%
  • Avg profiles per task: 8/10
  • Avg duration: 60 seconds

After Agent Lightning:

  • Success rate: ~90-95%
  • Avg profiles per task: 9.5/10
  • Avg duration: 45 seconds

Improvement: 10-20% across all metrics


Project Structure

linkedin-researcher/
├── src/
│   ├── __init__.py           # Package initialization
│   ├── agent.py              # Claude SDK agent (autonomous loop)
│   ├── mcp_client.py         # Browser MCP connection
│   ├── memory.py             # Memory system (future)
│   └── workflows/
│       └── profile_search.py # Profile search workflow (future)
│
├── config/
│   ├── agent_config.yaml     # Agent configuration
│   └── mcp_config.json       # MCP connection config
│
├── tests/
│   └── test_agent.py         # Unit tests (future)
│
├── training/
│   └── data.jsonl            # Training data for Agent Lightning
│
├── logs/
│   └── agent.log             # Agent logs
│
├── main.py                   # CLI entry point
├── requirements.txt          # Python dependencies
├── .env.example              # Environment template
└── README.md                 # This file

Configuration

Agent Configuration (config/agent_config.yaml)

agent:
  model: "claude-sonnet-4-5"
  max_tokens: 4000
  temperature: 0.7
  system_prompt: |
    You are a LinkedIn research specialist...

workflows:
  profile_search:
    max_profiles: 50
    timeout_seconds: 300

MCP Configuration (config/mcp_config.json)

{
  "mcp_servers": {
    "browser-mcp": {
      "command": "node",
      "args": ["/path/to/social-browser-mcp/dist/index.js"],
      "transport": "stdio"
    }
  }
}

Development Roadmap

Phase 5.1: Project Structure ✅ (Complete)

  • Created directory structure
  • Configuration files
  • Environment setup

Phase 5.2: MCP Client ✅ (Complete)

  • stdio transport to Browser MCP
  • Tool listing and invocation
  • Error handling

Phase 5.3: Agent Implementation ✅ (Complete)

  • Autonomous agent loop
  • Tool use integration
  • Training data collection

Phase 5.4: Workflows (Next)

  • Profile search workflow
  • Company research workflow
  • Competitive analysis workflow

Phase 5.5: Memory System (Planned)

  • Redis for session state
  • PostgreSQL for research results
  • pgvector for semantic search

Phase 5.6: Testing (Planned)

  • Unit tests
  • Integration tests
  • E2E tests

Phase 5.7: Training Data Collection (Planned)

  • Run 50+ research tasks
  • Collect performance metrics
  • Analyze failure cases

Phase 5.8: Agent Lightning Training (Planned)

  • Load training data
  • Run APO optimization
  • A/B test improvements
  • Deploy optimized agent

Phase 5.9: Production Deployment (Planned)

  • Containerization
  • API server
  • Monitoring
  • Scaling

Cost Analysis

Research Costs (per task)

Claude API:

  • Input tokens: ~5,000 tokens (research query + tool results)
  • Output tokens: ~2,000 tokens (agent thinking + structured response)
  • Cost per task: ~$0.15 (with Claude Sonnet 4.5)

For 50 profiles (5 tasks × 10 profiles):

  • Total cost: ~$0.75

Comparison:

  • Manual research: 2-3 hours @ $50/hour = $100-150
  • Agent cost: $0.75
  • Savings: 99%+

Training Costs

Agent Lightning APO:

  • Training: ~$5-10 for 50 tasks
  • One-time cost

Expected ROI:

  • 10-20% efficiency improvement
  • Pays for itself after ~50-100 tasks

Troubleshooting

Browser MCP Connection Issues

# Check if Browser MCP is built
cd /Users/rammaree/projects/social-browser-mcp
npm run build

# Test Browser MCP directly
node dist/index.js

API Key Issues

# Verify API key is set
echo $ANTHROPIC_API_KEY

# Or check .env file
cat .env | grep ANTHROPIC_API_KEY

Python Dependencies

# Reinstall dependencies
pip install -r requirements.txt --force-reinstall

Examples

Example 1: Research Product Managers in Mumbai

python main.py \
  --query "Product Manager" \
  --count 10 \
  --location "Mumbai" \
  --output pm_mumbai.json

Output (pm_mumbai.json):

{
  "result": "Found 10 Product Manager profiles in Mumbai",
  "profiles": [
    {
      "name": "John Doe",
      "title": "Senior Product Manager",
      "company": "Google",
      "location": "Mumbai, India",
      "experience_years": 8,
      "education": "MBA, IIM Bangalore",
      "profile_url": "https://www.linkedin.com/in/johndoe"
    },
    ...
  ],
  "iterations": 15,
  "duration_seconds": 45
}

Example 2: Research Software Engineers at Google

python main.py \
  --query "Software Engineer" \
  --count 5 \
  --company "Google" \
  --output google_engineers.json

Key Differences from Browser MCP

Browser MCP (Deterministic Tools)

  • 33 browser automation tools
  • Deterministic behavior (given input → fixed output)
  • Session persistence
  • Used by agents, not trainable itself

LinkedIn Research Agent (Trainable Agent)

  • Uses Browser MCP tools
  • Autonomous decision-making (which tools to use, when, how)
  • Trainable with Agent Lightning/DSPy
  • Optimizable prompts and workflows

Key Insight: RL/DSPy applies to the AGENT (this project), not the tools (Browser MCP)


License

MIT


Acknowledgments

Built using:

  • Claude Sonnet 4.5: Anthropic's frontier model
  • Browser MCP: 33-tool browser automation server
  • Agent Lightning: Microsoft's APO training framework
  • MCP Protocol: Model Context Protocol for tool integration

Inspired by: Production-ready autonomous agent patterns from Claude Cookbooks and Azure AI Foundry research.

About

AI-Powered LinkedIn Research Platform with Multi-Agent System (Claude SDK) and Browser Automation (MCP)

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published