asti009asti

Nodes

Hi, I have a couple of questions on nodes' metadata:

Is there a way to evaluate how different metadata variables affect the cosine similarity scores of nodes? I was thinking of building a correlation matrix to evaluate this, but if something is already available, I would appreciate a hint.
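A rough sketch of the correlation idea, to make the question concrete. It assumes each retrieved node has been flattened into a row of numeric metadata features plus its similarity score. The column names (`num_keywords`, `has_summary`), the helper names, and the values are made up for illustration and do not come from LlamaIndex.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation is undefined, report 0
    return cov / sqrt(var_x * var_y)


def metadata_score_correlations(rows, score_key="score"):
    """Correlate every numeric metadata column with the similarity score."""
    scores = [row[score_key] for row in rows]
    keys = [k for k in rows[0] if k != score_key]
    return {k: pearson([row[k] for row in rows], scores) for k in keys}


# Hypothetical rows, one per retrieved node (values are invented)
rows = [
    {"num_keywords": 2, "has_summary": 0, "score": 0.71},
    {"num_keywords": 4, "has_summary": 1, "score": 0.78},
    {"num_keywords": 6, "has_summary": 1, "score": 0.83},
    {"num_keywords": 8, "has_summary": 0, "score": 0.90},
]
correlations = metadata_score_correlations(rows)
```

This only captures linear relationships, and only between each variable and the score; non-numeric metadata would need to be encoded first.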

Can somebody please explain how LLM-visible metadata works? Is it sent along with the query and contexts, like contexts = [[node1_text, node1_metadata], [node2_text, node2_metadata], etc.]? I wonder how the LLM decides on the 'weight' of the metadata provided to help improve the answer.
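For reference, my current understanding is that metadata is not passed as a separate structure: it seems to be formatted into the node's text (what `node.get_content(metadata_mode=MetadataMode.LLM)` returns), minus any keys listed in `excluded_llm_metadata_keys`. The plain-Python imitation below is only a sketch of that idea. The function `render_for_llm` is hypothetical, and the `key: value` lines followed by a blank line are an assumption about the default templates, which may differ by version.

```python
def render_for_llm(text, metadata, excluded_llm_keys=()):
    """Imitate how a node's text and visible metadata become one string."""
    metadata_str = "\n".join(
        f"{key}: {value}"
        for key, value in metadata.items()
        if key not in excluded_llm_keys
    )
    if not metadata_str:
        return text  # nothing visible: the LLM sees the text only
    return f"{metadata_str}\n\n{text}"


# Hypothetical node content and metadata
context = render_for_llm(
    "LlamaIndex nodes carry text plus metadata.",
    {"file_name": "intro.md", "page": 3, "internal_id": "abc123"},
    excluded_llm_keys=("internal_id",),
)
```

If that is right, there is no explicit 'weight': the metadata is simply more text in the prompt, and the LLM treats it like any other context.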

2 comments

Hi, does anybody have an example of

Hi, does anybody have an example of a working PydanticProgramExtractor? There might be something obvious I'm missing. I was following an example in the docs for 0.10.15 and am getting a Field required [type=missing, input_value={}, input_type=dict] error.

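from llama_index.core.extractors import (
    PydanticProgramExtractor,
    QuestionsAnsweredExtractor,
    SummaryExtractor,
)
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.schema import MetadataMode
from llama_index.program.openai import OpenAIPydanticProgram

def run_metadata_pipeline(nodes, node_parser):
    # Program that asks the LLM to fill in a NodeMetadata object for each node
    openai_program = OpenAIPydanticProgram.from_defaults(
        output_cls=NodeMetadata,
        prompt_template_str="{input}",
        # extract_template_str=EXTRACT_TEMPLATE_STR
    )

    program_extractor = PydanticProgramExtractor(
        program=openai_program,
        input_key="input",
        show_progress=True,
        metadata_mode=MetadataMode.EMBED,
        workers=3,
    )

    # Defined but not passed to the pipeline below
    extractor = [
        QuestionsAnsweredExtractor(questions=3, metadata_mode=MetadataMode.EMBED, workers=3),
        SummaryExtractor(summaries=["prev", "self", "next"], workers=3),
    ]

    pipeline = IngestionPipeline(transformations=[program_extractor])
    return pipeline.run(nodes=nodes, in_place=False, show_progress=True)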

Here's my NodeMetadata class:

from typing import List

from pydantic import BaseModel, Field

class NodeMetadata(BaseModel):
    """Node metadata."""

    description: str = Field(
        ..., description="A concise one-sentence description of what this text chunk is useful for."
    )
    terms: List[str] = Field(
        ..., description="A list of keywords used in this text chunk."
    )

The thing is that it works for a while, but the error is thrown while processing about 1-2% of the nodes. If you are aware of a bug in this regard, please let me know. Thanks.
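Since only a small fraction of nodes fail, one way to narrow it down might be to run the extraction node by node and collect the ones that raise, instead of letting the whole pipeline abort. The sketch below is generic Python, not LlamaIndex API: `split_by_outcome` and `toy_extract` are hypothetical names, and `toy_extract` just stands in for the real program call.

```python
def split_by_outcome(items, extract):
    """Run `extract` on each item; return (successes, failures)."""
    successes, failures = [], []
    for item in items:
        try:
            successes.append((item, extract(item)))
        except Exception as exc:  # broad on purpose: record the error and keep going
            failures.append((item, exc))
    return successes, failures


def toy_extract(text):
    """Stand-in for the real extraction; fails on blank input to mimic a missing field."""
    if not text.strip():
        raise ValueError("Field required")
    return {"description": text[:20], "terms": text.split()[:3]}


ok, failed = split_by_outcome(["alpha beta gamma", "   ", "delta"], toy_extract)
```

Inspecting the text of the items in `failed` should show whether the problematic chunks have something in common (empty, very short, unusual characters, and so on).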

7 comments

asti009asti

Semantic similarity search by metadata

Hi, does anyone know if LlamaIndex supports metadata matching by semantic similarity? KeywordTableIndex does this by checking strings for equality, while in certain cases cosine similarity works better. I would appreciate a couple of ideas. I currently build a separate index from the metadata items and do the similarity check, as well as the node matching, via a custom retriever, but the complexity grows quickly when there are more items to match.
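To illustrate what I mean by matching metadata on cosine similarity rather than string equality, here is a minimal sketch. The 3-dimensional vectors are toy stand-ins for real embeddings, and the function names and the 0.8 threshold are arbitrary choices for the example.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_metadata(query_vec, metadata_vecs, threshold=0.8):
    """Return metadata values whose embedding is close to the query, best first."""
    scored = [(cosine(query_vec, vec), value) for value, vec in metadata_vecs.items()]
    return [value for score, value in sorted(scored, reverse=True) if score >= threshold]


# Toy vectors standing in for embeddings of metadata values
metadata_vecs = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.95],
}
matches = match_metadata([0.88, 0.12, 0.02], metadata_vecs)
```

Here "car" and "automobile" both match even though the strings differ, which is the behaviour an equality check cannot give.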

1 comment
