
Updated 4 months ago

Can you help me retrieve the page number along with the response when querying?

At a glance

The community member is asking how to retrieve the page number along with the response when querying. Another community member responds with a detailed example of how to do this using the LlamaIndex library. The steps include installing LlamaIndex, loading a document, building an index, creating a query engine with streaming enabled, performing a query, and inspecting the source nodes to get the page numbers. The approach allows the community member to retrieve responses along with the page numbers from which the information was sourced, making it easier to reference the original documents. The community member is also advised to refer to the original guide on GitHub for more detailed instructions and examples.

----------------------
To retrieve the page number along with the response when querying with LlamaIndex, you can follow the example in the LlamaIndex guide on getting references from PDFs. It relies on the page-number metadata that the PDF loader attaches to each page, combined with LlamaIndex's indexing and query abstractions. Here's how you can do it:

  1. Install LlamaIndex if you're using a Colab notebook or a similar environment. By default LlamaIndex uses OpenAI models, so also set the `OPENAI_API_KEY` environment variable:
     ```
     %pip install llama-index-llms-openai
     !pip install llama-index
     ```
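  2. Load your PDF and build an index. For PDFs, `SimpleDirectoryReader` by default produces one document per page and stores the page number in each document's metadata under the `page_label` key:
     ```python
     from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

     # Each PDF page becomes its own Document carrying page metadata
     reader = SimpleDirectoryReader(input_files=["./data/10k/lyft_2021.pdf"])
     data = reader.load_data()
     index = VectorStoreIndex.from_documents(data)
     ```
  3. Create a query engine. `similarity_top_k=3` retrieves the three most similar chunks, and `streaming=True` lets you print the answer as it is generated. Streaming itself is optional: the in-line page citations come from the page metadata included in the retrieved text, combined with a prompt that asks for page references.
     ```python
     query_engine = index.as_query_engine(streaming=True, similarity_top_k=3)
     ```
  4. Run your query, asking for a page reference after each statement, and stream the response:
     ```python
     response = query_engine.query(
         "What was the impact of COVID? Show statements in bullet form "
         "and show page reference after each statement."
     )
     response.print_response_stream()
     ```
  5. To inspect the source nodes in more detail, including their page numbers, loop over `response.source_nodes`:
     ```python
     for source_node in response.source_nodes:
         print("-----")
         # Flatten newlines and keep only the first 1000 characters
         text_fmt = source_node.node.get_content().strip().replace("\n", " ")[:1000]
         print(f"Text:\t {text_fmt} ...")
         # The metadata includes page_label (the page number) and file_name
         print(f"Metadata:\t {source_node.node.metadata}")
         print(f"Score:\t {source_node.score:.3f}")
     ```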
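If you only need a compact list of cited pages rather than the full dump from step 5, a small helper can collect them from the source nodes. This is a sketch, not a LlamaIndex API: `collect_page_citations`, `FakeNode`, and `FakeNodeWithScore` are hypothetical names. The stand-in classes only mimic the `.node.metadata` and `.score` attributes used in step 5 so the example runs without LlamaIndex, and the sample values are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


# Hypothetical stand-ins mimicking only the attributes used in step 5
# (source_node.node.metadata and source_node.score); not LlamaIndex classes.
@dataclass
class FakeNode:
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class FakeNodeWithScore:
    node: FakeNode
    score: Optional[float] = None


def collect_page_citations(source_nodes) -> list[tuple[str, str]]:
    """Return unique (file_name, page_label) pairs in retrieval order.

    Assumes the PDF reader stored the page under the "page_label" key;
    nodes without that key are skipped.
    """
    seen: set[tuple[str, str]] = set()
    citations: list[tuple[str, str]] = []
    for source_node in source_nodes:
        metadata = source_node.node.metadata
        page = metadata.get("page_label")
        if page is None:
            continue
        key = (metadata.get("file_name", "unknown"), str(page))
        # Several chunks can come from the same page; report each page once
        if key not in seen:
            seen.add(key)
            citations.append(key)
    return citations


if __name__ == "__main__":
    # Made-up sample nodes standing in for response.source_nodes
    nodes = [
        FakeNodeWithScore(FakeNode({"file_name": "lyft_2021.pdf", "page_label": "6"}), 0.82),
        FakeNodeWithScore(FakeNode({"file_name": "lyft_2021.pdf", "page_label": "56"}), 0.80),
        FakeNodeWithScore(FakeNode({"file_name": "lyft_2021.pdf", "page_label": "6"}), 0.79),
    ]
    for file_name, page in collect_page_citations(nodes):
        print(f"{file_name}, p. {page}")
```

With a real response you would call `collect_page_citations(response.source_nodes)`. Deduplicating matters because `similarity_top_k=3` can return several chunks from the same page.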
Following these steps, each answer comes with the page numbers it was drawn from, both as in-line references written by the LLM and as `page_label` metadata on the source nodes, which makes it easier to trace statements back to the original document. For more detailed instructions and examples, refer to the original guide on GitHub.
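Because the in-line page references are generated by the LLM in response to the prompt, they are not guaranteed to match the pages that were actually retrieved. As a hedged sketch, you could cross-check them against the `page_label` values of the source nodes. The function names, the regex, and the assumed citation forms such as "(page 12)" or "(p. 12)" are illustrative assumptions, not part of LlamaIndex, and the sample answer is made up. With a non-streaming query engine, `str(response)` gives the answer text, and the retrieved pages could be gathered as `{str(n.node.metadata.get("page_label")) for n in response.source_nodes}`.

```python
import re

# Assumed citation forms: "page 12", "pages 12", "p. 12" (case-insensitive)
PAGE_REF = re.compile(r"\b(?:pages?|p\.)\s*(\d+)", re.IGNORECASE)


def extract_cited_pages(answer_text: str) -> set[str]:
    """Collect the page numbers the LLM mentioned in its answer."""
    return set(PAGE_REF.findall(answer_text))


def unsupported_citations(answer_text: str, retrieved_pages: set[str]) -> set[str]:
    """Return cited pages that do not appear among the retrieved source pages."""
    return extract_cited_pages(answer_text) - retrieved_pages


if __name__ == "__main__":
    # Made-up answer text and retrieved page labels, for illustration only
    answer = (
        "- Revenue fell sharply (page 6)\n"
        "- Ridership recovered in Q4 (p. 56)\n"
        "- Costs rose (Page 99)"
    )
    retrieved = {"6", "56"}
    print(sorted(unsupported_citations(answer, retrieved)))  # ['99']
```

An empty result means every page the model cited was among the retrieved chunks; any leftover page numbers are worth checking by hand.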
----------------------