
HyDRA is a hierarchical deep research agent framework that orchestrates planning, tool usage, web exploration, and synthesis through robust context management, enabling reliable multi-step reasoning under strict output and token constraints.




Click the video above to view the demo.

HyDRA: Hierarchical Deep Researcher Agent


Project Overview

HyDRA (Hierarchical Deep Researcher Agent) is an agentic research framework designed to execute complex, multi-step information-seeking and reasoning tasks with high reliability. It combines hierarchical agent delegation, strict action–observation execution loops, and aggressive context management to prevent failure modes commonly observed in long-horizon LLM workflows.

At its core, HyDRA separates planning, execution, research, and synthesis into well-defined agents and tools. Each component operates under strict output contracts (JSON-only actions), ensuring deterministic downstream processing and eliminating ambiguity during tool invocation.

A central Context Manager continuously validates, summarizes, and minimizes tool and agent outputs. This makes HyDRA resilient to:

  • Context window exhaustion
  • Tool output verbosity
  • JSON decoding failures due to context pollution

HyDRA is designed for:

  • Deep web and archive research
  • Benchmark-driven evaluation (GAIA, SimpleQA, HLE)
  • Structured, verifiable final answers
  • Extensibility via modular toolchains

The system is suitable for research automation, evaluation of reasoning agents, and experimentation with hierarchical agent architectures.


Hierarchical Multi-Agent Architecture

Delegation and Response Flow

Action: [Planning_agent > Managed_agent]

    {
      "name": "<agent_name>",
      "task": "<delegated_task>"
    }

Final_answer: [Planning_agent < Managed_agent]

    {
      "answer": "<observation_from_managed_agent>"
    }

HyDRA follows a hierarchical, contract-driven agent architecture with strict separation of concerns.
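
To make the contract concrete, here is a minimal sketch that parses a delegation action and enforces the JSON-only format. The field names follow the example above; the helper itself is illustrative and is not code from the repository.

    import json

    # Illustrative contract check; the field names follow the delegation example
    # above, but this helper is a sketch rather than HyDRA's actual code.
    REQUIRED_FIELDS = {"name", "task"}

    def parse_action(raw: str) -> dict:
        """Parse a planner action and enforce the JSON-only delegation contract."""
        action = json.loads(raw)  # raises json.JSONDecodeError on malformed output
        missing = REQUIRED_FIELDS - action.keys()
        if missing:
            raise ValueError(f"action missing required fields: {missing}")
        return action

    parse_action('{"name": "browser_use_agent", "task": "Find the publication year"}')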

1. High-Level Flow

  1. Planning Agent

    • Interprets the task
    • Delegates actions to managed agents and reviews every observation while solving the user's task
  2. Execution Agents

    • browser_use_agent: Handles live web interaction
    • deep_researcher_agent: Performs multi-source synthesis and deeper reasoning over web content
    • deep_analyzer_agent: Analyzes provided sources (files/images) for a given task [In Progress]
    • code_agent: Dynamically creates tools as functions and places them in the agent context, where agents can invoke them through the python_interpreter tool [In Progress]
    • Agents operate in an Action → Observation loop (sketched after this list)
  3. Tool Layer

    • Web search, page visiting and navigation, archive search, Python execution
    • Each tool produces validated outputs
  4. Context Manager

    • Intercepts every tool and agent output
    • Summarizes verbose outputs
    • Minimizes context when token limits are reached
  5. Final Answer Tool

    • Produces a single, structured final response
    • Terminates execution deterministically
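
A minimal sketch of this loop is shown below, assuming a stub planner and a stub tool table; plan_next_action and TOOLS are hypothetical stand-ins, and the real control flow lives in src/core/agent.py and the tool registry.

    # Hypothetical stand-ins: a real planner would call the LLM, and real tools
    # live under src/tools/.
    def plan_next_action(task, observations):
        return {"name": "final_answer", "task": "42"}  # stub planner output

    TOOLS = {"web_search": lambda q: f"stub results for {q!r}"}

    def run_loop(task: str, max_steps: int = 20) -> str:
        observations: list[str] = []
        for _ in range(max_steps):
            action = plan_next_action(task, observations)  # planner emits a JSON-only action
            if action["name"] == "final_answer":
                return action["task"]                      # deterministic termination
            observation = TOOLS[action["name"]](action["task"])
            observations.append(observation)               # the Context Manager intercepts here
        raise RuntimeError("step budget exhausted without a final answer")

    print(run_loop("toy task"))  # -> 42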

2. Context Management Strategy (Key Differentiator)

The Context Manager provides three critical functions, sketched in code after this list:

  • Validate

    • Ensures tool/agent outputs conform to expected formats
    • Prevents invalid data from propagating
  • Summarize

    • Compresses long tool outputs into task-relevant facts
    • Preserves decision-critical information only
  • Minimize

    • Shrinks historical context using validation rules
    • Automatically triggered when context budget is exceeded
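
The sketch below illustrates the three functions with assumed budgets and a truncation stand-in for the LLM summarizer; the real logic lives in src/core/context_manager.py.

    # Assumed budgets; HyDRA's actual thresholds and summarizer differ.
    MAX_OBSERVATION_CHARS = 2_000
    CONTEXT_TOKEN_BUDGET = 8_000

    def validate(output: str) -> bool:
        # Reject empty or whitespace-only tool output before it enters the context.
        return bool(output and output.strip())

    def summarize(output: str) -> str:
        # Stand-in for an LLM call that compresses output to task-relevant facts.
        return output[:MAX_OBSERVATION_CHARS]

    def minimize(history: list[str]) -> list[str]:
        # Compress every entry, then drop the oldest until the rough token
        # estimate (~4 characters per token) fits the budget.
        compressed = [summarize(h) for h in history]
        while sum(len(h) // 4 for h in compressed) > CONTEXT_TOKEN_BUDGET and len(compressed) > 1:
            compressed.pop(0)
        return compressed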

3. Failure Handling and Robustness

HyDRA implements explicit and layered failure-handling mechanisms to mitigate common failure modes in long-horizon agentic workflows.

1. Context Pollution Prevention

Tools may occasionally return verbose, poorly structured, or markdown-heavy outputs that can pollute the agent context and degrade downstream reasoning.

  • Mitigation:
    • Every tool output is validated before being appended to the observation history. Invalid or low-signal outputs are filtered, summarized, or rejected, preventing unnecessary noise from entering the context.

2. Invalid Structured Output from Models

Some instruction-following models, particularly freely available Groq-hosted models, may emit malformed or unstructured outputs, including mismatched tool arguments or broken JSON, especially under polluted or compressed context conditions.

  • Mitigation:

    • The system summarizes the most recent context segment; if the issue persists, it removes the last context entry entirely.
    • A maximum number of retries is enforced per model invocation.
    • If repeated attempts fail to recover valid output, the affected agent instance is terminated while the overall system continues execution. The Planning Agent can re-delegate the task to an alternative agent or execution path. (See the sketch below.)
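
This recovery ladder can be sketched as follows; call_model is a hypothetical callable, MAX_RETRIES is an assumed cap, and truncation stands in for summarization.

    import json

    MAX_RETRIES = 3  # assumed cap per model invocation

    def invoke_with_recovery(call_model, context: list[str]) -> dict:
        """Retry a model call, progressively cleaning context between attempts."""
        for attempt in range(MAX_RETRIES):
            try:
                return json.loads(call_model(context))  # expect a JSON-only action
            except json.JSONDecodeError:
                if attempt == 0 and context:
                    context[-1] = context[-1][:500]     # 1) summarize the newest segment (stubbed)
                elif context:
                    context.pop()                       # 2) drop the last context entry entirely
        # 3) give up on this agent instance; the Planning Agent may re-delegate
        raise RuntimeError("agent instance terminated after repeated invalid output")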

3. Context Limit Management

When the context window approaches its limit, HyDRA proactively reduces context size.

  • Mitigation:
    • The Context Manager shrinks historical context to a minimal, information-preserving representation that retains key facts, decisions, and constraints required for task continuation (see the sketch below).
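
An illustrative trigger for this proactive shrinking, where the 4-characters-per-token estimate and the 90% threshold are assumptions:

    def approx_tokens(text: str) -> int:
        return len(text) // 4  # crude heuristic; a real tokenizer would be used

    def maybe_minimize(history: list[str], budget: int = 8_000) -> list[str]:
        # Shrink proactively, before the window is actually exhausted.
        if sum(approx_tokens(h) for h in history) <= int(0.9 * budget):
            return history
        # Keep the most recent entries whole; reduce older ones to headline facts.
        return [h[:200] for h in history[:-3]] + history[-3:]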

These mechanisms substantially improve robustness in long-running research tasks. While failures may still occur due to external constraints such as request-rate exhaustion or rare Planning Agent errors, such failures are isolated and do not compromise the stability of the overall system.


Project Structure

.
├── src/                         # Core agent implementation
│   ├── agent.py                 # Entry point for the hierarchical research agent
│   ├── core/                    # Agent core logic
│   │   ├── agent.py             # Base Agent logic
│   │   ├── context_manager.py   # Context and token budget management: validates tool outputs, minimizes context, summarizes agent/tool outputs
│   │   ├── state.py             # Agent state representation
│   │   ├── utils.py             # Shared utilities
│   │   └── prompts/             # Prompt templates (YAML + Markdown)
│   ├── tools/                   # Tooling layer used by the agent
│   │   ├── registry.py          # Tool registration logic
│   │   ├── registry.yaml        # Tool configuration
│   │   ├── tools_registry.py    # Tool resolution and dispatch
│   │   ├── deep_researcher/     # Deep research toolchain
│   │   ├── archive_searcher/    # Academic / archive search tools
│   │   ├── web_browser/         # Web interaction and parsing tools
│   │   ├── python_interpreter/  # Local Python execution tool
│   │   └── final_answer.py      # Final answer tool
│   └── tool_builder.py          # Tools registry construction
│
├── Evaluation_Suite/             # Evaluation and benchmarking framework
│   ├── evaluation_gaia.py        # GAIA benchmark
│   ├── evaluation_simpleQA.py    # SimpleQA benchmark
│   ├── evaluation_hle.py         # HLE benchmark
│   ├── evaluation_utils.py       # Shared evaluation utilities
│   ├── evaluation_config.yaml   # Evaluation configuration
│   └── Evaluation_results/      # Stored evaluation outputs
│
├── notebooks/                    # Analysis and benchmarking notebooks
├── docs/                         # Project documentation
├── app.py                        # Backend server for the frontend demo
├── Dockerfile                    # Containerized execution environment
├── requirements.txt              # Python dependencies
├── README.md                     # Project documentation
└── LICENSE                       # License
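
To make the tooling layer concrete, here is a minimal decorator-based registry in the spirit of registry.py and tools_registry.py; the repository's actual registration API may differ.

    from typing import Callable

    TOOL_REGISTRY: dict[str, Callable[..., str]] = {}

    def register_tool(name: str):
        # Register a function under a tool name, mirroring registry.py's role.
        def decorator(fn: Callable[..., str]) -> Callable[..., str]:
            TOOL_REGISTRY[name] = fn
            return fn
        return decorator

    @register_tool("web_search")
    def web_search(query: str) -> str:
        return f"stub results for {query!r}"  # a real tool would call a search backend

    def dispatch(name: str, **kwargs) -> str:
        # Resolve and invoke a registered tool, mirroring tools_registry.py's role.
        if name not in TOOL_REGISTRY:
            raise KeyError(f"unknown tool: {name}")
        return TOOL_REGISTRY[name](**kwargs)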

Requirements

API Keys

The following environment variables are required to run or evaluate the agent:

Variable         Description
GROQ_API_KEY     Primary agent inference
GROQ_API_KEY_2   Context manager (token-efficient planning)
GROQ_BASE_URL    Groq API base URL
HF_TOKEN         Hugging Face token (required for evaluation only)

Example:

    export GROQ_API_KEY=your_key_here
    export GROQ_API_KEY_2=your_key_here
    export GROQ_BASE_URL=https://api.groq.com
    export HF_TOKEN=your_hf_token_here
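
A small startup check for these variables might look like this; the snippet is illustrative and not part of the repository.

    import os

    REQUIRED = ["GROQ_API_KEY", "GROQ_API_KEY_2", "GROQ_BASE_URL"]
    missing = [v for v in REQUIRED if not os.environ.get(v)]
    if missing:
        raise SystemExit(f"missing required environment variables: {', '.join(missing)}")
    hf_token = os.environ.get("HF_TOKEN")  # only needed for the Evaluation_Suite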

Setup and Reproducibility

This multi-agent system is fully reproducible and can be run in an isolated virtual environment with no cached dependencies.

Quickstart [Run the Agent]

    # Create and activate a virtual environment
    python -m venv .venv
    source .venv/bin/activate

    # Set required environment variables
    export GROQ_API_KEY="your_key_here"
    export GROQ_API_KEY_2="your_key_here"
    export GROQ_BASE_URL="https://api.groq.com"


    # Install dependencies without caching (low disk usage)
    python -m pip install --no-cache-dir -r requirements.txt

    # Run the agent
    python src/agent.py

Start With UI [Run the Agent]

    # Create and activate a virtual environment
    python -m venv .venv
    source .venv/bin/activate

    # Set required environment variables
    export GROQ_API_KEY="your_key_here"
    export GROQ_API_KEY_2="your_key_here"
    export GROQ_BASE_URL="https://api.groq.com"

    # OPTION 1: Without Docker
    # Install dependencies without caching (low disk usage)
    python -m pip install --no-cache-dir -r requirements.txt

    # Run the server
    uvicorn app:app --host 0.0.0.0 --port 10000  
    
    # OPTION 2: Run Using Docker
    docker build --no-cache --progress=plain -t app:latest .
    docker run -p 10000:10000 app:latest

    # Run Frontend
    git clone https://github.com/Se00n00/Chat.git
    cd Chat
    git checkout Hydra
    npm install
    export NG_APP_BACKEND="http://127.0.0.1:10000/chat"  # Important: point the frontend at the backend
    ng serve

Evaluation

Evaluation scripts benchmark the agent on established datasets.

Run Evaluation

    # Create and activate a virtual environment
    python -m venv .venv
    source .venv/bin/activate
    
    # Set environment variables
    export GROQ_API_KEY="your_key_here"
    export GROQ_API_KEY_2="your_key_here"
    export GROQ_BASE_URL="https://api.groq.com"
    export HF_TOKEN="your_key_here"
    
    # Install dependencies without caching
    python -m pip install --no-cache-dir -r requirements.txt
    
    # Run evaluation
    python Evaluation_Suite/evaluation_<dataset>.py <evaluation_name> --time [sleep_seconds]

Parameters

  • <dataset>: gaia, simpleQA, hle
  • <evaluation_name>: Unique identifier for the run
  • --time [optional]: Sleep interval between examples (rate-limit control)
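
For example, python Evaluation_Suite/evaluation_gaia.py baseline_run --time 5 runs the GAIA benchmark with a five-second pause between examples; the run name and interval here are illustrative.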

Status

This project is fully functional. Core agent capabilities are stable and usable.
Evaluation benchmarks are currently in progress.

Implementation Status

  • Hierarchical multi-agent architecture
  • Tool registry and dynamic tool invocation
  • Web browsing and document analysis tools
  • Local Python execution tool
  • Deterministic agent execution (seeded runs)
  • Reproducible environment setup (no cache, isolated venv)
  • Code agent (code_agent): dynamic tool creation [In Progress]

Evaluation Status

  • GAIA benchmark evaluation [In Progress]
  • SimpleQA benchmark evaluation [In Progress]
  • HLE benchmark evaluation [In Progress]

Documentation

  • Project structure documentation
  • Dataset documentation
  • Example run notebooks
  • Architecture diagram (final)
  • Comprehensive evaluation report
