Updated last year

I'm working with Qdrant and LlamaIndex.

At a glance

The community member is working with Qdrant and LlamaIndex to parse release note documents in HTML format and persist them to a Qdrant vector DB. The challenge is to retrieve relevant documents based on a date range query, which involves two steps: 1) determining the relevant documents to query, and 2) querying those documents for more context. The community member has tried the "openai" and "context" modes of the chat_engine, but neither seems to address this use case. The community member also notes that the vector store does not store document names, which would be helpful since the document names contain their release dates.

In the comments, another community member suggests using the RecursiveRetriever from LlamaIndex to address this use case. They provide an outline of a custom DateRangeRetriever that queries Qdrant by date range metadata to find the relevant documents, and then uses the RecursiveRetriever to retrieve the context for those documents. Another community member suggests attaching the dates as metadata and using an auto retriever, so that the language model can write metadata filters on the fly.

The original community member thanks the other community members for their suggestions and indicates that they will try them out.

I'm working with Qdrant and LlamaIndex. I have a bunch of release note documents in HTML format that I was able to parse using UnstructuredElementNodeParser and persist to a Qdrant vector DB via VectorStoreIndex.

However, the user can ask for all the changes between two dates, in which case I need to retrieve the relevant documents within that date range. So the process is twofold: 1) determine which documents are relevant to the query, and 2) query those documents for more context.

I'm using chat_engine with the modes "openai" and "context", but neither seems to do this. Also, it seems that the vector store doesn't really store document names, which would be helpful since the document names contain their release dates.

Is there a way to do this type of hierarchical search? Would I be able to use RecursiveRetriever for this purpose, and if so, how would I configure it and make it work with a chat engine?
3 comments
Yeah, you should be able to use RecursiveRetriever for this. Here is an outline of how you can set it up:

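# Import paths assume llama-index >= 0.10; older releases expose these under llama_index.retrievers
from datetime import datetime
from typing import List, Tuple

from llama_index.core import QueryBundle
from llama_index.core.retrievers import BaseRetriever, RecursiveRetriever
from llama_index.core.schema import NodeWithScore

# Custom retriever to filter documents by date range
class DateRangeRetriever(BaseRetriever):
    # Subclasses implement _retrieve(); callers still invoke retrieve(query)
    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        # Extract date range from the query
        start_date, end_date = extract_date_range(query_bundle.query_str)
        # Query Qdrant with the date range metadata to get the matching nodes.
        # For RecursiveRetriever to follow them, these should wrap IndexNode
        # objects whose index_id is a key in retriever_dict below.
        return query_qdrant_by_date_range(start_date, end_date)

# Function to extract date range from the query
def extract_date_range(query: str) -> Tuple[datetime, datetime]:
    # Implement logic to extract date range from the query
    # For example, use regex or natural language processing
    raise NotImplementedError

# Function to query Qdrant by date range
def query_qdrant_by_date_range(start_date: datetime, end_date: datetime) -> List[NodeWithScore]:
    # Implement logic to query Qdrant with the date range
    # Return the matching nodes wrapped in NodeWithScore
    raise NotImplementedError

# Set up the RecursiveRetriever
date_range_retriever = DateRangeRetriever()
context_retriever = ... # Set up your retriever for querying documents for context
# RecursiveRetriever takes the id of the root retriever plus a dict of retrievers keyed by id
recursive_retriever = RecursiveRetriever(
    "date_range",
    retriever_dict={"date_range": date_range_retriever, "context": context_retriever},
)

# Integrate with the chat engine
# One option is a chat engine that accepts a retriever, for example
# ContextChatEngine.from_defaults(retriever=recursive_retriever)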
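As one possible way to fill in the extract_date_range placeholder from the outline, the sketch below pulls dates out of the query with a regular expression. It assumes users write dates as YYYY-MM-DD, which nothing in this thread confirms; free-form dates ("last March") would need a date-parsing library or an LLM call instead. Returning None when no range is found is also an assumption, chosen so the caller can fall back to a plain vector search.

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# Assumption: dates appear in the query as YYYY-MM-DD
_ISO_DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")

def extract_date_range(query: str) -> Optional[Tuple[datetime, datetime]]:
    """Return (start, end) built from the first two valid dates in the query, or None."""
    found = []
    for text in _ISO_DATE.findall(query):
        try:
            found.append(datetime.strptime(text, "%Y-%m-%d"))
        except ValueError:
            continue  # matches the pattern but is not a real date, e.g. 2024-13-40
    if len(found) < 2:
        return None  # no usable range in this query
    # Sort so the earlier date is always the start, whatever order the user wrote them in
    start, end = sorted(found[:2])
    return start, end
```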
You could also attach the dates as metadata and use an auto retriever (VectorIndexAutoRetriever in LlamaIndex), so that the LLM writes metadata filters on the fly.
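To make the metadata idea a bit more concrete, here is a rough sketch that derives a release date from a document name and builds a matching range filter. The file-name pattern (release-notes-YYYY-MM-DD.html), the release_ts metadata key, and both helper functions are hypothetical. The filter dict follows the JSON shape Qdrant uses for range conditions on numeric payload fields, and the date is stored as a Unix timestamp so that it can be compared numerically.

```python
import re
from datetime import datetime, timezone
from typing import Dict, Optional

# Hypothetical naming scheme: release-notes-2024-03-01.html
_NAME_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def release_metadata(file_name: str) -> Optional[Dict[str, object]]:
    """Build a metadata dict holding the release date parsed from the file name."""
    match = _NAME_DATE.search(file_name)
    if match is None:
        return None  # name carries no date; leave the document without date metadata
    year, month, day = (int(part) for part in match.groups())
    released = datetime(year, month, day, tzinfo=timezone.utc)
    return {"file_name": file_name, "release_ts": int(released.timestamp())}

def date_range_filter(start: datetime, end: datetime) -> Dict[str, object]:
    """Qdrant-style filter matching points whose release_ts lies in [start, end]."""
    return {
        "must": [
            {
                "key": "release_ts",
                "range": {"gte": int(start.timestamp()), "lte": int(end.timestamp())},
            }
        ]
    }
```

The metadata dict would be attached to each document before indexing. The same release_ts key could then be described to the auto retriever, so that the LLM can produce an equivalent filter from the user's question.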
Thanks @rahul @Logan M, I will try this out.