Nodes

At a glance

The original poster wants both a document reference and a page number reference with each query answer. This works with a single document but not when multiple documents are ingested. They are using OpenAIEmbedding, OpenAI, and VectorStoreIndex from the llama-index library. Community members point out that the metadata must contain both the filename and the page number before indexing, and the poster finds that MarkdownElementNodeParser drops the page number information. One community member suggests this was fixed in newer versions of llama-index-core, but after upgrading the poster still finds the page number missing or incorrect.

the_vexed_viper

Hi all, I want a document reference AND a page number reference when I get an answer from a query. It works with one document but not when I ingest multiple documents. I'm not using anything fancy:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
embed_model = OpenAIEmbedding(model="text-embedding-3-large")
llm = OpenAI(model="gpt-3.5-turbo-0125")
documents = SimpleDirectoryReader(datapath, file_extractor=file_extractor).load_data()
index_llamaparse = VectorStoreIndex.from_documents(documents, embed_model=embed_model)

5 comments

Logan M

If you want both page and filename, both of those should be in the metadata before you index.

Then you can access it on the response object:

response.source_nodes[0].node.metadata
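
As a rough illustration of reading that metadata for every retrieved node, here is a minimal sketch. It assumes the metadata uses the keys `file_name`, `file_path`, and `page_label`, which may differ depending on the reader used. The helper `format_citations` and the stand-in objects are hypothetical and only mimic the shape of `response.source_nodes`; they are not part of llama-index.

```python
from types import SimpleNamespace


def format_citations(source_nodes):
    """Build one citation string per retrieved node from its metadata."""
    citations = []
    for scored in source_nodes:
        meta = scored.node.metadata
        # Assumed keys: "file_name" / "file_path" / "page_label".
        name = meta.get("file_name") or meta.get("file_path") or "unknown file"
        page = meta.get("page_label")
        if page is None:
            citations.append(f"{name} (page unknown)")
        else:
            citations.append(f"{name}, page {page}")
    return citations


# Hypothetical stand-ins with the same attribute layout as response.source_nodes.
fake_source_nodes = [
    SimpleNamespace(node=SimpleNamespace(
        metadata={"file_name": "report_a.pdf", "page_label": "30"})),
    SimpleNamespace(node=SimpleNamespace(
        metadata={"file_path": "/data/report_b.docx"})),
]

for line in format_citations(fake_source_nodes):
    print(line)
```

With a real query result, the same function could be called as `format_citations(response.source_nodes)`. A node that prints "page unknown" is a sign that the page information never reached the metadata before indexing.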

the_vexed_viper

OK, thanks Logan.
At the moment, all I get from response.source_nodes[0].node.metadata is the path. Any pointers on how to add both the page and the filename to the metadata would be appreciated. I assume I do it when I parse the nodes:

node_parser = MarkdownElementNodeParser(llm=OpenAI(model="gpt-3.5-turbo-0125"), num_workers=1)
node_parser.get_nodes_from_documents([docs[0]])

the_vexed_viper

Hi Logan,
I see the metadata now. The problem was the node parser, MarkdownElementNodeParser: the page_label data was lost when using that node parser.
Thanks

Logan M

I think this was fixed in newer versions of llama-index-core.

the_vexed_viper

Hi Logan,
I upgraded, but the page number is still missing when I use MarkdownElementNodeParser. I have also tried multiple strategies to get the page number for a docx file. The metadata is not there, and in the query response the page number is always wrong. For example, it returns page 58 when the info is on page 30.
Matt
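
One possible workaround, not confirmed anywhere in this thread, is to copy the missing metadata from each source document back onto the nodes the parser produced, before building the index. The sketch below assumes each node exposes `ref_doc_id` and `metadata`, each document exposes `doc_id` and `metadata`, and each document corresponds to a single page, so its `page_label` applies to all of its nodes. The helper `restore_document_metadata` and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def restore_document_metadata(nodes, documents, keys=("file_name", "page_label")):
    """Copy selected metadata keys from a source document to its nodes if absent."""
    metadata_by_doc_id = {doc.doc_id: doc.metadata for doc in documents}
    for node in nodes:
        source_meta = metadata_by_doc_id.get(node.ref_doc_id, {})
        for key in keys:
            # Only fill gaps; never overwrite what the parser already set.
            if key in source_meta and key not in node.metadata:
                node.metadata[key] = source_meta[key]
    return nodes


# Hypothetical stand-ins: one document per page, two nodes parsed from it.
docs = [SimpleNamespace(doc_id="doc-1",
                        metadata={"file_name": "report_a.pdf", "page_label": "30"})]
parsed_nodes = [
    SimpleNamespace(ref_doc_id="doc-1", metadata={}),
    SimpleNamespace(ref_doc_id="doc-1", metadata={"page_label": "31"}),
]

restore_document_metadata(parsed_nodes, docs)
for node in parsed_nodes:
    print(node.metadata)
```

This only helps when the page information exists on the documents in the first place. If the reader never attaches a page number to a docx file, there is nothing to copy.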
