----------------------
To get the page number along with the response when querying with LlamaIndex, you can follow the LlamaIndex guide on getting references from PDFs. The PDF loader records each page's number in the document metadata, and the indexing and query abstractions carry that metadata through to the retrieved chunks. As a result, the LLM can cite pages in its answer and you can inspect them afterwards. Here's how you can do it:
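1. Install LlamaIndex. The `%pip` form works in Colab and other Jupyter-style notebooks; in a regular shell, use plain `pip install`. LlamaIndex uses OpenAI models by default, so also set the `OPENAI_API_KEY` environment variable:

```
%pip install llama-index-llms-openai
%pip install llama-index
```

2. Load your PDF and build a vector index over it: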
```python
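from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# The PDF reader returns one Document per page, with the page number
# stored in each Document's metadata under "page_label"
reader = SimpleDirectoryReader(input_files=["./data/10k/lyft_2021.pdf"])
data = reader.load_data()

# Chunks inherit the metadata of the page they came from
index = VectorStoreIndex.from_documents(data)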
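```

3. Create a query engine. `similarity_top_k=3` retrieves the three most similar chunks, and `streaming=True` streams the answer as it is generated. Streaming is optional for citations: the page references appear because the retrieved chunks are passed to the LLM together with their metadata, including the page label.

```python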
query_engine = index.as_query_engine(streaming=True, similarity_top_k=3)
```

4. Run the query and stream the response. The prompt itself asks the LLM to add a page reference after each statement:

```python
response = query_engine.query(
"What was the impact of COVID? Show statements in bullet form and show page reference after each statement."
)
response.print_response_stream()
```

5. To check the citations, inspect the source nodes. Each one carries the retrieved text, its metadata (including the page label), and its similarity score:

```python
for node in response.source_nodes:
    print("-----")
    # Flatten newlines and truncate the chunk text to 1,000 characters
    text_fmt = node.node.get_content().strip().replace("\n", " ")[:1000]
    print(f"Text:\t {text_fmt} ...")
    print(f"Metadata:\t {node.node.metadata}")
    print(f"Score:\t {node.score:.3f}")
```
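If you want a compact citation list rather than the raw metadata dump, a small helper can collect the distinct file/page pairs from `response.source_nodes`. This is a minimal sketch, not part of LlamaIndex: `collect_page_citations` and `format_citations` are hypothetical names, and the code assumes the default PDF reader stores the page under `page_label` and the file name under `file_name`. Check the metadata printed in step 5 if your reader uses different keys.

```python
from typing import Iterable, List, Tuple


def collect_page_citations(source_nodes: Iterable) -> List[Tuple[str, str]]:
    """Return unique (file_name, page_label) pairs in retrieval order.

    Assumes each item looks like a LlamaIndex NodeWithScore, i.e. it exposes
    `.node.metadata`, a dict that may contain "file_name" and "page_label"
    (assumed key names; verify them against your own metadata).
    """
    seen = set()
    citations = []
    for item in source_nodes:
        metadata = item.node.metadata
        # Missing keys fall back to "unknown" instead of raising KeyError
        key = (
            str(metadata.get("file_name", "unknown")),
            str(metadata.get("page_label", "unknown")),
        )
        # Several chunks can come from the same page; keep only the first
        if key not in seen:
            seen.add(key)
            citations.append(key)
    return citations


def format_citations(source_nodes: Iterable) -> str:
    """Render citations as a single string, e.g. 'lyft_2021.pdf p. 12; lyft_2021.pdf p. 15'."""
    return "; ".join(
        f"{file_name} p. {page}"
        for file_name, page in collect_page_citations(source_nodes)
    )
```

After running the query, `print(format_citations(response.source_nodes))` prints one entry per distinct page that was retrieved. Deduplicating matters because `similarity_top_k` counts chunks, not pages, so two of the three retrieved chunks may come from the same page.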