j0sh162/JavaDeepResearchAgentProject

Deep Research Agent

A sophisticated Java-based research agent that leverages multiple search providers and AI to conduct in-depth research on topics, answer questions, and synthesize information from diverse sources.

Overview

The Deep Research Agent automates topic research by:

  1. Intelligently generating search queries
  2. Gathering information from multiple search providers
  3. Analyzing and synthesizing the collected information
  4. Providing comprehensive, structured research reports

The agent can determine when research is needed, conduct targeted searches, and combine results from multiple search engines for thorough analysis.

Features

  • Multiple Search Providers: Supports both DuckDuckGo scraping and Google Custom Search API
  • Self-Assessment: Determines if a question requires external research or can be answered with existing knowledge
  • Query Generation: Creates optimized search queries tailored to the research topic
  • Iterative Research: Conducts multiple rounds of research with follow-up questions
  • Comprehensive Synthesis: Combines information from all sources into coherent analyses
  • Memory: Maintains research history for context and follow-up
  • Flexible Configuration: Easily switch between or combine search providers

Architecture

The project follows a modular, interface-based design:

  • SearchProvider: Interface for different search engines
    • DuckDuckGoSearchProvider: Web scraping implementation
    • GoogleSearchProvider: API-based implementation
  • ResearchAgent: Core class that orchestrates the research process
  • SearchResult: Model class for search results
  • ResearchResponse: Comprehensive response with metadata
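
As a rough illustration of this layering, the sketch below shows what the interface and model might look like. Only the `search` and `getName` method names come from this README; the `SearchResult` fields and the `FixedSearchProvider` stub are assumptions made for the example, not the project's actual code.

```java
import java.util.List;

// Hypothetical model: the field names are assumptions, not the project's actual class.
final class SearchResult {
    private final String title;
    private final String url;
    private final String snippet;

    SearchResult(String title, String url, String snippet) {
        this.title = title;
        this.url = url;
        this.snippet = snippet;
    }

    String getTitle() { return title; }
    String getUrl() { return url; }
    String getSnippet() { return snippet; }
}

// The two method names match the BingSearchProvider example in this README.
interface SearchProvider {
    List<SearchResult> search(String query);
    String getName();
}

// Hypothetical in-memory provider, useful for exercising an agent without network access.
final class FixedSearchProvider implements SearchProvider {
    private final List<SearchResult> canned;

    FixedSearchProvider(List<SearchResult> canned) {
        this.canned = List.copyOf(canned); // defensive, immutable copy
    }

    @Override
    public List<SearchResult> search(String query) {
        return canned; // ignores the query and always returns the same results
    }

    @Override
    public String getName() {
        return "Fixed";
    }
}
```

Because the agent depends only on the interface, a stub like this can stand in for a real provider in unit tests.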

Setup

Prerequisites

  • Java 11 or higher
  • Maven or Gradle for dependency management
  • OpenAI API key
  • Google API key and Custom Search Engine ID (for Google search)

Dependencies

<dependencies>
    <!-- OpenAI API Client -->
    <dependency>
        <groupId>com.openai</groupId>
        <artifactId>openai-java-client</artifactId>
        <version>0.8.1</version>
    </dependency>
    
    <!-- HTML fetching and parsing (jsoup) -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.15.3</version>
    </dependency>
    
    <!-- JSON Processing -->
    <dependency>
        <groupId>org.json</groupId>
        <artifactId>json</artifactId>
        <version>20231013</version>
    </dependency>
</dependencies>

API Keys

  1. OpenAI API Key:

  2. Google Custom Search:
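
One common way to supply these keys without hardcoding them is through environment variables. The sketch below assumes that approach; the `ApiKeys` helper and the variable names `OPENAI_API_KEY`, `GOOGLE_API_KEY`, and `GOOGLE_CSE_ID` are hypothetical and not part of the project.

```java
import java.util.Map;

// Hypothetical helper: reads required settings from a map such as System.getenv().
final class ApiKeys {
    private ApiKeys() {}

    // Returns the trimmed value for a required variable, or fails fast with a clear message.
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value.trim();
    }
}
```

A call such as `ApiKeys.require(System.getenv(), "OPENAI_API_KEY")` would then yield the value to pass to the `ResearchAgent` constructor.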

Usage

Basic Usage

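// Requires: import java.util.List;
// googleApiKey, searchEngineId, and openAiApiKey are Strings you supply.

// Initialize providers
SearchProvider duckDuckGo = new DuckDuckGoSearchProvider();
SearchProvider google = new GoogleSearchProvider(googleApiKey, searchEngineId);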

// Create a research agent with multiple providers
List<SearchProvider> providers = List.of(google, duckDuckGo);
ResearchAgent agent = new ResearchAgent(openAiApiKey, providers);

// Simple research
String result = agent.research("Quantum computing applications in medicine");
System.out.println(result);

// Deep dive with multiple iterations
String deepResult = agent.deepDive("Climate change mitigation technologies", 3);
System.out.println(deepResult);

// Question answering with self-assessment
ResearchResponse response = agent.answerQuestion("What were the latest developments in quantum computing as of 2025?");
System.out.println(response);

Configuration Options

// Use only DuckDuckGo
ResearchAgent ddgAgent = new ResearchAgent(openAiApiKey, List.of(new DuckDuckGoSearchProvider()));

// Use only Google
ResearchAgent googleAgent = new ResearchAgent(openAiApiKey, List.of(
    new GoogleSearchProvider(googleApiKey, searchEngineId)
));

// Use multiple search providers with equal priority
ResearchAgent multiAgent = new ResearchAgent(openAiApiKey, 
    List.of(new DuckDuckGoSearchProvider(), new GoogleSearchProvider(googleApiKey, searchEngineId))
);
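
This README does not say how results from several providers are combined. One plausible reading of "equal priority" is to interleave each provider's ranked results and drop duplicates. The sketch below shows that idea on plain URL strings; `ResultMerger` and `mergeUrls` are hypothetical names, and the real agent may merge differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper illustrating round-robin merging with de-duplication.
final class ResultMerger {
    private ResultMerger() {}

    // Takes one ranked URL list per provider and returns a single de-duplicated list.
    static List<String> mergeUrls(List<List<String>> perProvider) {
        Set<String> merged = new LinkedHashSet<>(); // keeps first-seen order, rejects duplicates
        int longest = 0;
        for (List<String> urls : perProvider) {
            longest = Math.max(longest, urls.size());
        }
        // Visit rank 0 of every provider, then rank 1, and so on.
        for (int rank = 0; rank < longest; rank++) {
            for (List<String> urls : perProvider) {
                if (rank < urls.size()) {
                    merged.add(urls.get(rank));
                }
            }
        }
        return new ArrayList<>(merged);
    }
}
```

Interleaving by rank keeps any single provider from dominating the top of the merged list.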

Advanced Features

Self-Assessment

The agent can determine if it already has sufficient knowledge to answer a question:

ResearchResponse response = agent.answerQuestion("What is the capital of France?");
if (!response.getNeededResearch()) {
    System.out.println("Answered without research: " + response.getAnswer());
} else {
    System.out.println("Research was required.");
}
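
The control flow behind this check can be sketched as follows. Only `getNeededResearch()` and `getAnswer()` appear in this README; the constructor, the `SelfAssessingAnswerer` class, and the three functions standing in for model and search calls are assumptions for illustration.

```java
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical response type exposing the two getters used in the example above.
final class ResearchResponse {
    private final String answer;
    private final boolean neededResearch;

    ResearchResponse(String answer, boolean neededResearch) {
        this.answer = answer;
        this.neededResearch = neededResearch;
    }

    String getAnswer() { return answer; }
    boolean getNeededResearch() { return neededResearch; }
}

// Hypothetical answerer: the functions are stand-ins for real model and search calls.
final class SelfAssessingAnswerer {
    private final Predicate<String> needsResearch;        // the self-assessment step
    private final Function<String, String> fromKnowledge; // answer without searching
    private final Function<String, String> fromResearch;  // search, then answer

    SelfAssessingAnswerer(Predicate<String> needsResearch,
                          Function<String, String> fromKnowledge,
                          Function<String, String> fromResearch) {
        this.needsResearch = needsResearch;
        this.fromKnowledge = fromKnowledge;
        this.fromResearch = fromResearch;
    }

    ResearchResponse answerQuestion(String question) {
        boolean research = needsResearch.test(question);
        String answer = research ? fromResearch.apply(question) : fromKnowledge.apply(question);
        return new ResearchResponse(answer, research);
    }
}
```

Recording the decision in the response lets callers tell a researched answer from one given from existing knowledge.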

Deep Dive Research

For comprehensive exploration of a topic with follow-up questions:

String deepResult = agent.deepDive("Renewable energy storage solutions", 3);

This will:

  1. Research the initial topic
  2. Generate a follow-up question based on findings
  3. Research that question
  4. Repeat for the specified number of iterations
  5. Synthesize all findings into a comprehensive report
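
The loop described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `DeepDiveSketch` and its three function parameters are hypothetical stand-ins for the search and model calls, and it assumes the iteration count means the number of follow-up rounds after the initial research.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical sketch of the iterative loop; the real agent's internals may differ.
final class DeepDiveSketch {
    private DeepDiveSketch() {}

    static String deepDive(String topic,
                           int iterations,
                           UnaryOperator<String> research,   // question -> findings
                           UnaryOperator<String> followUp,   // findings -> next question
                           Function<List<String>, String> synthesize) {
        List<String> findings = new ArrayList<>();
        findings.add(research.apply(topic));                  // step 1: initial topic
        for (int i = 0; i < iterations; i++) {                // step 4: repeat
            String latest = findings.get(findings.size() - 1);
            String question = followUp.apply(latest);         // step 2: follow-up question
            findings.add(research.apply(question));           // step 3: research it
        }
        return synthesize.apply(findings);                    // step 5: final report
    }
}
```

Under this reading, a call with 3 iterations performs four research passes: one for the topic and one per follow-up question.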

Extension Points

Adding New Search Providers

Implement the SearchProvider interface to add new search engines:

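class BingSearchProvider implements SearchProvider {
    @Override
    public List<SearchResult> search(String query) {
        // TODO: call the search backend and map each hit to a SearchResult.
        return List.of(); // placeholder so the class compiles
    }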
    
    @Override
    public String getName() {
        return "Bing";
    }
}

Content Extraction

Extend the project to extract full content from search results:

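// Requires: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.select.Elements, java.io.IOException
private String extractContentFromUrl(String url) {
    try {
        // Fetch and parse the page; the timeout keeps one slow site from stalling research.
        Document doc = Jsoup.connect(url)
                .userAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36")
                .timeout(10_000)
                .get();

        // Prefer the main content container when the page has one
        Elements content = doc.select("article, .content, main");
        if (!content.isEmpty()) {
            return content.first().text();
        }
        // Otherwise fall back to all visible text in the body
        return doc.body().text();
    } catch (IOException | IllegalArgumentException e) {
        // Network failure, non-HTML response, or malformed URL; callers should treat this as a failure
        return "Failed to extract content: " + e.getMessage();
    }
}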

Limitations

  • Rate Limits: Both search providers and AI APIs have rate limits
  • Search Quality: DuckDuckGo scraping may be blocked or return inconsistent results
  • Cost: Google Custom Search and OpenAI APIs have associated costs
  • Timeliness: Search indexes and the model's training data can lag behind current events, so results may be out of date
  • Complexity: Very nuanced or specialized topics may require human expertise
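
Rate limits in particular can often be softened by retrying with a growing delay. The sketch below is a generic exponential backoff wrapper; `Retry.withBackoff` is a hypothetical helper, not something the project provides, and for brevity it retries on any exception, whereas a production version would likely retry only on rate-limit errors such as HTTP 429.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: retries a task, doubling the wait after each failure.
final class Retry {
    private Retry() {}

    static <T> T withBackoff(Callable<T> task, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // wait before the next attempt
                    delay *= 2;          // exponential growth: d, 2d, 4d, ...
                }
            }
        }
        throw last; // every attempt failed; surface the final error
    }
}
```

A provider call could then be wrapped as `Retry.withBackoff(() -> provider.search(query), 4, 500L)`, where the attempt count and delay are illustrative values.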

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgements

  • OpenAI for the GPT API
  • Google for the Custom Search API
  • The Jsoup library for HTML parsing
