from llama_index.extractors import (
TitleExtractor,
QuestionsAnsweredExtractor,
)
from llama_index.text_splitter import TokenTextSplitter
# Import the metadata-extraction classes.
# NOTE(review): TitleExtractor and QuestionsAnsweredExtractor are already
# imported above from llama_index.extractors — the re-imports below shadow
# them. Keep only one import path (which one depends on the llama_index version).
from llama_index.node_parser.extractors import (
MetadataExtractor,
QuestionsAnsweredExtractor,
TitleExtractor,
)
from llama_index.llms import OpenAI
# Set up the LLM that powers both extractors.
llm = OpenAI(model="gpt-3.5-turbo")

# Build the extractor pipeline as a named list for readability:
# one extractor generates a title, the other generates candidate questions.
extractors = [
    TitleExtractor(nodes=5, llm=llm),
    QuestionsAnsweredExtractor(questions=3, llm=llm),
]
metadata_extractor = MetadataExtractor(
    extractors=extractors,
    # in_place=False: return new node objects instead of mutating the inputs.
    in_place=False,
)

# Run the pipeline over the existing nodes to enrich their metadata.
# NOTE(review): `nodes` is assumed to be defined earlier in the file.
nodes = metadata_extractor.process_nodes(nodes)
print(f"Debug: Processed {len(nodes)} nodes.")  # Debugging
for node in nodes:
    print(f"Processed Node Metadata: {node.metadata}")
Note: the name `TitleExtractor` is somewhat misleading. It does not extract an
existing title from the document — it sends the LLM a sample of the text and
asks it to generate a plausible title for it.