
When I run a PDF through a pipeline with a sentence_splitter, it loses the correct page_label: the page_label shows the last page of the PDF. Any solutions or ideas?
The page label should be added as metadata, and it should be carried over to all of the created nodes.

Once the nodes are created, can you verify whether the page label has been added to all of them?
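One way to run that check is to count how many nodes carry each page_label. The serialized node in this thread exposes its labels under a `metadata` dict, so the helper below only reads `node.metadata`. `FakeNode` is a hypothetical stand-in used to keep the sketch self-contained; it is not a LlamaIndex class. If a multi-page PDF produces many nodes but only one distinct label, the labels have collapsed onto a single page, which matches the symptom described in the question.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FakeNode:
    """Hypothetical stand-in for a parsed node; only `.metadata` is read."""
    text: str
    metadata: dict = field(default_factory=dict)


def page_label_histogram(nodes) -> Counter:
    """Count how many nodes carry each page_label (None if the key is missing)."""
    return Counter(node.metadata.get("page_label") for node in nodes)


def looks_collapsed(nodes) -> bool:
    """Heuristic for multi-page PDFs: more than one node but only one
    distinct page_label suggests every node was stamped with the same page.
    (A single-page PDF would trigger this too, so treat it as a hint.)"""
    hist = page_label_histogram(nodes)
    return len(nodes) > 1 and len(hist) == 1


healthy = [FakeNode("a", {"page_label": "1"}), FakeNode("b", {"page_label": "2"})]
broken = [FakeNode("a", {"page_label": "369"}), FakeNode("b", {"page_label": "369"})]
print(looks_collapsed(healthy), looks_collapsed(broken))  # False True
```

Any objects with a `metadata` dict can be passed in place of `FakeNode`, such as the nodes returned by the splitter.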
{"page_label": "369", "file_name": "LuaProgrammingGems.pdf", "file_path": "_8bot/resources/tester1/files/LuaProgrammingGems.pdf", "file_type": "application/pdf", "file_size": 1931210, "creation_date": "2024-05-23", "last_modified_date": "2024-05-23", "user": "tester1", "_nodecontent": "{"id": "5c5fed52-c798-4f7f-b7da-a5737ce6afac", "embedding": null, "metadata": {"page_label": "369", "file_name": "LuaProgrammingGems.pdf", "file_path": "_8bot/resources/tester1/files/LuaProgrammingGems.pdf", "file_type": "application/pdf", "file_size": 1931210, "creation_date": "2024-05-23", "last_modified_date": "2024-05-23", "user": "tester1"}, "excluded_embed_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "excluded_llm_metadata_keys": ["file_name", "file_type", "file_size", "creation_date", "last_modified_date", "last_accessed_date"], "relationships": {"1": {"node_id": "LuaProgrammingGems", "node_type": "4", "metadata": {"page_label": "19", "file_name": "LuaProgrammingGems.pdf", "file_path": "_8bot/resources/tester1/files/LuaProgrammingGems.pdf", "file_type": "application/pdf", "file_size": 1931210, "creation_date": "2024-05-23", "last_modified_date": "2024-05-23", "user": "tester1"}, "hash":
The node does seem to have the correct page_label.
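In the dump above, the node's own `metadata` reports `"page_label": "369"`, while the entry under `relationships` → `"1"` reports `"page_label": "19"`. The sketch below surfaces that kind of difference for a node serialized as a plain dict. It assumes, based only on the dump's shape, that key `"1"` holds the source document's entry; that key is a parameter in case it differs. Whether a mismatch is expected depends on how the documents were loaded, so treat the result as something to inspect, not as a verdict.

```python
def source_label_mismatch(node: dict, source_key: str = "1"):
    """Return (node_label, source_label) if they differ, else None.

    Assumes `node` is a dict shaped like the serialized node in this thread,
    and that relationships[source_key] is the source document's entry
    (assumed key "1", as in the dump). A missing source entry yields
    source_label=None, which is reported as a mismatch if the node has a label.
    """
    node_label = node.get("metadata", {}).get("page_label")
    source = node.get("relationships", {}).get(source_key, {})
    source_label = source.get("metadata", {}).get("page_label")
    if node_label != source_label:
        return node_label, source_label
    return None


# Trimmed copy of the fields from the dump that matter here.
node = {
    "metadata": {"page_label": "369", "file_name": "LuaProgrammingGems.pdf"},
    "relationships": {
        "1": {
            "node_id": "LuaProgrammingGems",
            "node_type": "4",
            "metadata": {"page_label": "19", "file_name": "LuaProgrammingGems.pdf"},
        },
    },
}
print(source_label_mismatch(node))  # ('369', '19')
```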