Metadata

This might be a bug in llama-index, or I might be misunderstanding how to use the new IngestionPipeline transformations. My nodes carry a lot of metadata for logging and post-processing tasks. If that metadata gets included in a transformation, the prompt exceeds the 3900-token limit set in the LlamaCpp config, so I need to exclude it from transformations that rely on the LLM. I'm using SummaryExtractor(), which I've set to use the Mistral 7B model, but nothing I try ever excludes the metadata from what SummaryExtractor() sends to Mistral 7B. My code (a bit duplicative, for extra certainty) looks like this:
Plain Text
pipeline = IngestionPipeline(
    transformations=[
        CustomTransformation(),
        SummaryExtractor(
            llm=llm,
            excluded_embed_metadata_keys=[
                DEFAULT_WINDOW_METADATA_KEY,
                DEFAULT_OG_TEXT_METADATA_KEY,
            ],
            excluded_llm_metadata_keys=[
                DEFAULT_WINDOW_METADATA_KEY,
                DEFAULT_OG_TEXT_METADATA_KEY,
            ],
        ),
        service_context.embed_model,
    ]
)

excluded_embed_metadata_keys = [
    DEFAULT_WINDOW_METADATA_KEY,
    DEFAULT_OG_TEXT_METADATA_KEY,
]

excluded_llm_metadata_keys = [
    DEFAULT_WINDOW_METADATA_KEY,
    DEFAULT_OG_TEXT_METADATA_KEY,
]

nodes = pipeline.run(
    nodes=nodes,
    excluded_embed_metadata_keys=excluded_embed_metadata_keys,
    excluded_llm_metadata_keys=excluded_llm_metadata_keys,
)
2 comments
I think the incoming nodes to the summary extractor need to have their metadata-excluded fields already set

Then you can set the metadata mode on the summary extractor (i.e. ALL, NONE, LLM, or EMBED)
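
For instance, a minimal sketch of that first approach (assuming the same `nodes` list and key constants from the question), setting the excluded-key fields directly on each node before the pipeline runs:

Plain Text
# Set the exclusion lists on each node up front; transformations that
# respect MetadataMode will then skip these keys when building prompts
for node in nodes:
    node.excluded_llm_metadata_keys = [
        DEFAULT_WINDOW_METADATA_KEY,
        DEFAULT_OG_TEXT_METADATA_KEY,
    ]
    node.excluded_embed_metadata_keys = [
        DEFAULT_WINDOW_METADATA_KEY,
        DEFAULT_OG_TEXT_METADATA_KEY,
    ]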

Probably just easiest to set

Plain Text
from llama_index.schema import MetadataMode

....
SummaryExtractor(metadata_mode=MetadataMode.NONE),
...
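
Putting it together, a sketch of the whole pipeline with the extractor set to ignore metadata entirely (assuming the same `llm`, `nodes`, `service_context`, and `CustomTransformation` from the question):

Plain Text
from llama_index.schema import MetadataMode

pipeline = IngestionPipeline(
    transformations=[
        CustomTransformation(),
        # MetadataMode.NONE: the extractor prompts Mistral 7B with node text
        # only, so the bulky metadata never counts toward the 3900-token limit
        SummaryExtractor(llm=llm, metadata_mode=MetadataMode.NONE),
        service_context.embed_model,
    ]
)

nodes = pipeline.run(nodes=nodes)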
Thank you @Logan M, that did the trick!