A community member wants both a document reference and a page number reference in query answers. This works with one document but not after ingesting multiple documents. They are using OpenAIEmbedding, OpenAI, and VectorStoreIndex from the llama-index library. The community members discuss that the metadata should contain both the filename and the page number, and that MarkdownElementNodeParser may be dropping the page number information. One community member suggests that this may have been fixed in newer versions of the llama-index-core library, but the community member who upgraded still finds the page number missing or incorrect.
Hi All, I want a document reference AND a page number reference when I get an answer from a query. It works with one document but not when I ingest multiple documents. I'm not using anything fancy:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

embed_model = OpenAIEmbedding(model="text-embedding-3-large")
llm = OpenAI(model="gpt-3.5-turbo-0125")
documents = SimpleDirectoryReader(datapath, file_extractor=file_extractor).load_data()
index_llamaparse = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
OK, thanks Logan. At the moment all I get is the path from response.source_nodes[0].node.metadata. Any pointers on how to add both the page and the filename to the metadata would be appreciated. I assume I do it when I parse the nodes:
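For readers following along, here is a minimal sketch of how the filename and page could be read back from each source node once they are present in the metadata. The helper name `format_citations` is hypothetical, and the keys `file_name` and `page_label` are an assumption about what the loader writes; they may differ or be absent depending on the reader or `file_extractor` in use. The stand-in objects only mimic the shape of `response.source_nodes` so the example runs without the library installed.

```python
from types import SimpleNamespace


def format_citations(source_nodes, file_key="file_name", page_key="page_label"):
    """Return one 'file, page N' string per retrieved source node.

    Assumes each item exposes `.node.metadata` as a dict, as in
    `response.source_nodes`. The key names are assumptions.
    """
    citations = []
    for item in source_nodes:
        metadata = item.node.metadata or {}
        file_name = metadata.get(file_key, "unknown file")
        page = metadata.get(page_key)
        if page is None:
            # The loader or node parser did not provide a page for this node.
            citations.append(f"{file_name} (page unknown)")
        else:
            citations.append(f"{file_name}, page {page}")
    return citations


# Stand-in objects shaped like response.source_nodes, for illustration only.
example_nodes = [
    SimpleNamespace(node=SimpleNamespace(metadata={"file_name": "a.pdf", "page_label": "3"})),
    SimpleNamespace(node=SimpleNamespace(metadata={"file_name": "b.docx"})),
]
print(format_citations(example_nodes))
```

With a real query result, the call would presumably be `format_citations(response.source_nodes)`. Printing the raw metadata dict of one node first is a quick way to check which keys are actually there.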
Hi Logan, I see the metadata now. The problem was the node parser, MarkdownElementNodeParser: the page_label data was lost when using that node parser. Thanks
Hi Logan, I upgraded but the page number is still missing when I use MarkdownElementNodeParser. I have also tried multiple strategies to get the page number for a docx file. The metadata is not there, and the page number in the query response is always wrong. For example, it returns page 58 when the info is on page 30. Matt
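One possible workaround for a parser that drops document metadata is to parse each document on its own and copy that document's metadata onto the nodes it produced. The sketch below is not a confirmed fix for MarkdownElementNodeParser. It assumes the parser exposes `get_nodes_from_documents` and that nodes and documents carry a `metadata` dict. `parse_with_metadata` and `DroppingParser` are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


def parse_with_metadata(parser, documents):
    """Parse each document separately and copy its metadata onto its nodes."""
    all_nodes = []
    for doc in documents:
        nodes = parser.get_nodes_from_documents([doc])
        for node in nodes:
            # Keys the parser already set win; missing keys come from the document.
            node.metadata = {**doc.metadata, **node.metadata}
        all_nodes.extend(nodes)
    return all_nodes


class DroppingParser:
    """Hypothetical stand-in for a parser that returns nodes without metadata."""

    def get_nodes_from_documents(self, documents):
        return [SimpleNamespace(text=d.text, metadata={}) for d in documents]


# Stand-in document, shaped like a loaded page with filename and page metadata.
docs = [
    SimpleNamespace(text="page text", metadata={"file_name": "a.pdf", "page_label": "30"})
]
nodes = parse_with_metadata(DroppingParser(), docs)
print(nodes[0].metadata)
```

The resulting nodes could then be indexed directly, for example with `VectorStoreIndex(nodes, embed_model=embed_model)`. This only helps when each loaded document already carries the page in its metadata. If a loader returns a whole file as a single document, there is no per-page value to copy.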