Updated 5 months ago

I am using titleextractor for extracting

At a glance

The community member is using the titleextractor to extract metadata from a document containing 20 different PDFs, but is getting the same title for all the PDFs. A comment suggests that the title extractor only runs on the first 5 nodes and aggregates a single title, and the community member should probably process each PDF individually.

aadeelhasan

I am using titleextractor to extract metadata by passing in a document that contains 20 different PDFs, but I am getting the same title for all of the PDFs.

1 comment

Logan M

the title extractor only runs on the first 5 nodes, and then aggregates a single title 🤔

https://github.com/jerryjliu/llama_index/blob/f8c07e8eeb52cc774d9a6334effcbe4c132daef5/llama_index/node_parser/extractors/metadata_extractors.py#L205

You should probably process each PDF individually
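
To make the suggestion concrete, here is a minimal sketch of why one combined input yields one title and how grouping by source file avoids that. It does not call llama_index: `toy_title` is a hypothetical stand-in for the LLM-based title step (it only mimics the "first 5 nodes, one title" behavior described above), and `titles_per_file`, `Node`, and the sample file names are made up for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A (source_file, chunk_text) pair stands in for a parsed node.
Node = Tuple[str, str]


def toy_title(chunks: List[str], max_nodes: int = 5) -> str:
    """Hypothetical stand-in for the title step.

    Looks only at the first `max_nodes` chunks and returns a single
    title (here: the first line of the first chunk).
    """
    head = chunks[:max_nodes]
    if not head:
        return ""
    return head[0].strip().split("\n", 1)[0]


def titles_per_file(
    nodes: List[Node],
    title_fn: Callable[[List[str]], str] = toy_title,
) -> Dict[str, str]:
    """Group chunks by source file, then derive one title per file."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for source, text in nodes:
        grouped[source].append(text)
    return {source: title_fn(chunks) for source, chunks in grouped.items()}


# Made-up sample data: chunks from two different PDFs.
nodes: List[Node] = [
    ("report.pdf", "Annual Report\nRevenue grew this year."),
    ("report.pdf", "Costs were flat."),
    ("manual.pdf", "User Manual\nPlug in the device."),
]

# Everything passed together: one title for all chunks.
combined_title = toy_title([text for _, text in nodes])

# Each file handled separately: one title per file.
per_file = titles_per_file(nodes)
```

With everything passed together, `combined_title` is taken from the first PDF only; `per_file` has a separate entry for each file. In llama_index the same idea would mean loading each PDF as its own document and running the extractor on that document's nodes separately.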