Is there a way to assemble existing LlamaIndex components so that I can run PydanticProgramExtractor over only the first few nodes of every document (like TitleExtractor)? I'm trying to categorize the document based on data that's on the first page of every doc. What is the best approach?
you could... run a pydantic program over the first few nodes of a document?
I'm wondering how to accomplish this. TitleExtractor accepts a parameter called nodes but PydanticProgramExtractor doesn't.
they both accept nodes, curious where you saw that they don't πŸ‘€

But in any case, I would just use a pydantic program on its own, more control anyways

Python
from llama_index.core.bridge.pydantic import BaseModel
from llama_index.program.openai import OpenAIPydanticProgram


class Category(BaseModel):
    """A category for a piece of text."""

    name: str


program = OpenAIPydanticProgram.from_defaults(
    output_cls=Category,
    prompt_template_str="Given a piece of text, assign a category.\n\nText:\n{text}",
    verbose=True,
)

# `node` here is one of your ingested nodes
category = program.run(text=node.text)
node.metadata["category"] = category.name
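Since PydanticProgramExtractor has no equivalent of TitleExtractor's node-count setting, one workaround is to pre-select the first few nodes of each document yourself, run the program only on those, and then copy the result onto the rest. Below is a minimal sketch of that selection step. The Node dataclass is a stand-in for illustration (real LlamaIndex nodes expose ref_doc_id similarly), first_nodes_per_doc is a hypothetical helper, and the node list is assumed to arrive in document order:

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for a LlamaIndex node: text, source doc id, metadata."""

    text: str
    ref_doc_id: str
    metadata: dict = field(default_factory=dict)


def first_nodes_per_doc(nodes, max_nodes=3):
    """Group nodes by source document, keeping only the first few of each."""
    grouped = defaultdict(list)
    for node in nodes:
        if len(grouped[node.ref_doc_id]) < max_nodes:
            grouped[node.ref_doc_id].append(node)
    return dict(grouped)


nodes = [
    Node("page 1 of doc A", ref_doc_id="a"),
    Node("page 2 of doc A", ref_doc_id="a"),
    Node("page 1 of doc B", ref_doc_id="b"),
]
subset = first_nodes_per_doc(nodes, max_nodes=1)

# Then run the pydantic program on the kept nodes only, and propagate
# the category to every node of the same document, e.g.:
# for doc_id, doc_nodes in subset.items():
#     category = program.run(text=doc_nodes[0].text)
#     for node in nodes:
#         if node.ref_doc_id == doc_id:
#             node.metadata["category"] = category.name
```

The extractor machinery isn't needed here at all; plain iteration gives you full control over which nodes get an LLM call.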
Thanks for the response, @Logan M! I was looking at the constructor for TitleExtractor and saw nodes, but did not see a way to configure PydanticProgramExtractor to evaluate similarly (at a document level), nor in BaseExtractor.
To clarify, the nodes parameter I was talking about from TitleExtractor refers to the number of nodes from the beginning of a document. But I'm seeing what you're saying about program accepting a node.
Ahhh yea I see what you mean now