IndexError while formatting references if number of relevant docs is less than TOP_K #3

@jamespalmer2000

Description

In llm_answer.GPTAnswer._format_reference, if relevant_docs_list contains fewer documents than retriever.EmbeddingRetriever.TOP_K, an IndexError is raised while formatting the references.

https://github.com/Wilson-ZheLin/GPT-4-Web-Browsing/blob/038b74ba3ab76f7e3ba7c1d9f33250120f376735/src/llm_answer.py#L24-L33
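The failure mode can be shown in isolation. This is only a minimal sketch, not the code at the link above: the function name format_reference_fixed_count, the TOP_K value of 3, and the use of plain strings as documents are all assumptions made for illustration. It assumes the formatter reads exactly TOP_K entries by index.

```python
# Assumed value for illustration only; the real constant is
# retriever.EmbeddingRetriever.TOP_K in the repository.
TOP_K = 3


def format_reference_fixed_count(relevant_docs_list, top_k=TOP_K):
    """Hypothetical stand-in that always reads exactly top_k entries by index."""
    lines = []
    for i in range(top_k):
        # Raises IndexError as soon as i reaches len(relevant_docs_list).
        lines.append(f"[{i + 1}] {relevant_docs_list[i]}")
    return "\n".join(lines)


# Only one relevant document is available, but TOP_K entries are requested.
docs = ["only one chunk"]
try:
    format_reference_fixed_count(docs)
    raised = False
except IndexError:
    raised = True

print(raised)
```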

To Reproduce

  1. In main.py, provide a Google search query that yields fewer than retriever.EmbeddingRetriever.TOP_K documents in total.
    • E.g., "CS 224" harvard computer science ext:html inurl:index. Google/Serper returns only one result for this query. When scraped, the document text is shorter than 1000 characters (the default chunk size), so it produces a single chunk.
  2. In serper_service.SerperClient.serper, change serper_settings["page"] to 1.
  3. Run main.py

Expected Behavior

If relevant_docs_list contains fewer documents than retriever.EmbeddingRetriever.TOP_K, all of the available relevant documents should be used in the formatted references, and no exception should be raised.
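One possible way to get this behavior is to cap the number of formatted entries at the length of the list. The sketch below is a suggestion under assumptions, not a patch against the actual method: format_reference_safe is a hypothetical name, the default top_k of 3 is an assumed value, and documents are represented as plain strings rather than whatever type the retriever returns.

```python
def format_reference_safe(relevant_docs_list, top_k=3):
    """Format at most top_k references, using fewer if fewer are available.

    Hypothetical helper: top_k=3 is an assumed default, and each document
    is treated as a plain string for illustration.
    """
    # Slicing never raises when the list is shorter than top_k.
    selected = relevant_docs_list[:top_k]
    return "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(selected))


print(format_reference_safe(["only one chunk"]))
```

Slicing keeps the loop bound tied to the data actually returned, so the same code path handles zero, one, or more than top_k documents.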
