Updated 11 months ago

Nodes

At a glance

The community member is trying to get a document reference and page number reference when querying multiple documents, but is having issues. They are using OpenAIEmbedding, OpenAI, and VectorStoreIndex from the llama-index library. The community members discuss that the metadata should contain both the filename and page number, and that the MarkdownElementNodeParser may be causing issues with losing the page number information. One community member suggests that this issue may have been fixed in newer versions of the llama-index-core library, but the community member who upgraded is still experiencing issues with the page number being incorrect or missing.

Hi all, I want both a document reference AND a page number reference when I get an answer from a query. It works with one document but not when I ingest multiple documents. I'm not using anything fancy.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
embed_model = OpenAIEmbedding(model="text-embedding-3-large")
llm = OpenAI(model="gpt-3.5-turbo-0125")
documents = SimpleDirectoryReader(datapath, file_extractor=file_extractor).load_data()
index_llamaparse = VectorStoreIndex.from_documents(documents, embed_model=embed_model)
5 comments
If you want both page and filename, both of those should be in the metadata before you index.

Then you can access it on the response object:

response.source_nodes[0].node.metadata
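
To make the suggestion above concrete, here is a minimal sketch of pulling a filename and page number out of every source node rather than only the first one. The helper name `collect_citations` is hypothetical, and the metadata keys `file_name` and `page_label` are assumptions about what the reader stored; check the keys actually present in your own metadata. The stand-in objects only mimic the `.node.metadata` shape so the snippet runs without an index.

```python
from types import SimpleNamespace


def collect_citations(source_nodes, file_key="file_name", page_key="page_label"):
    """Return a (file, page) pair for each retrieved source node.

    The key names are assumptions; a missing key yields None instead of raising.
    """
    citations = []
    for item in source_nodes:
        metadata = item.node.metadata
        citations.append((metadata.get(file_key), metadata.get(page_key)))
    return citations


# Stand-ins shaped like response.source_nodes (hypothetical data).
fake_source_nodes = [
    SimpleNamespace(node=SimpleNamespace(metadata={"file_name": "a.pdf", "page_label": "3"})),
    SimpleNamespace(node=SimpleNamespace(metadata={"file_name": "b.pdf"})),
]

for file_name, page in collect_citations(fake_source_nodes):
    print(file_name, page)
```

With a real query you would pass `response.source_nodes` instead of the stand-ins. A `None` page in the output is a quick sign that the page information never reached the node.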
ok Thanks Logan,
At the moment all I get from response.source_nodes[0].node.metadata is the path. Any pointers on how to add both page and filename to the metadata would be appreciated. I assume I do it when I parse the nodes:

node_parser = MarkdownElementNodeParser(llm=OpenAI(model="gpt-3.5-turbo-0125"), num_workers=1)
node_parser.get_nodes_from_documents([docs[0]])
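
One possible way to get a filename into the metadata before indexing, when only the path is present, is sketched below. The helper `add_file_name` and the key names `file_path` and `file_name` are assumptions for illustration, not something confirmed in this thread. It cannot recover page numbers; those would have to be supplied by whatever reader loaded the file.

```python
import os
from types import SimpleNamespace


def add_file_name(documents, path_key="file_path", name_key="file_name"):
    """Derive a bare filename from the stored path for documents that lack one."""
    for doc in documents:
        path = doc.metadata.get(path_key)
        if path and name_key not in doc.metadata:
            doc.metadata[name_key] = os.path.basename(path)
    return documents


# Stand-ins with only a .metadata dict (hypothetical data).
fake_documents = [
    SimpleNamespace(metadata={"file_path": "/data/report.pdf"}),
    SimpleNamespace(metadata={"file_path": "/data/notes.pdf", "file_name": "custom.pdf"}),
    SimpleNamespace(metadata={}),
]

add_file_name(fake_documents)
print([doc.metadata.get("file_name") for doc in fake_documents])
```

The helper leaves an existing `file_name` alone and skips documents that have no path, so it is safe to run over a mixed list before building the index.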
Hi Logan,
I see the metadata now. The problem was the node parser, MarkdownElementNodeParser: the page_label data was lost when using that node parser.
Thanks
I think this was fixed on newer versions of llama-index-core
Hi Logan,
I upgraded but the page number is still missing when I use MarkdownElementNodeParser. I have also tried multiple strategies to get the page number for a docx file. The metadata is not there and in the query response the page number is always wrong. e.g. it returns page 58 when the info is on page 30.
Matt
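
If the parser drops metadata that the source documents still carry, one workaround worth trying is to copy the missing keys back onto the parsed nodes before indexing. This is only a sketch under assumptions: the helper `restore_metadata` is hypothetical, it assumes each node exposes `ref_doc_id` and each document exposes `doc_id`, and it only helps when the document-level metadata actually contains the page information.

```python
from types import SimpleNamespace


def restore_metadata(nodes, documents, keys=("file_name", "page_label")):
    """Copy selected metadata keys from each node's parent document.

    Existing node values are kept; only missing keys are filled in.
    """
    metadata_by_doc_id = {doc.doc_id: doc.metadata for doc in documents}
    for node in nodes:
        parent_metadata = metadata_by_doc_id.get(node.ref_doc_id, {})
        for key in keys:
            if key in parent_metadata and key not in node.metadata:
                node.metadata[key] = parent_metadata[key]
    return nodes


# Stand-ins for parsed nodes and their source documents (hypothetical data).
fake_documents = [
    SimpleNamespace(doc_id="doc-1", metadata={"file_name": "a.pdf", "page_label": "30"}),
]
fake_nodes = [
    SimpleNamespace(ref_doc_id="doc-1", metadata={}),
    SimpleNamespace(ref_doc_id="doc-1", metadata={"page_label": "31"}),
    SimpleNamespace(ref_doc_id="unknown", metadata={}),
]

restore_metadata(fake_nodes, fake_documents)
print([node.metadata for node in fake_nodes])
```

Nodes whose parent cannot be found are left untouched, which makes it easy to spot them afterwards by checking for an empty metadata dict.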