Metadata

This might be a bug in llama-index, or I might not understand how to properly use the new IngestionPipeline transformations. My nodes carry a lot of metadata for logging and post-processing tasks. If that metadata gets included in a transformation, the input exceeds the 3900-token limit set in my LlamaCpp config, so I need to exclude it in transformations that rely on the LLM. I'm trying to use SummaryExtractor(), which I have set to use the Mistral 7B model, but nothing I try excludes the metadata from what SummaryExtractor() sends to Mistral 7B. My code (a bit duplicative for extra certainty) looks like this:


pipeline = IngestionPipeline(
    transformations=[
        CustomTransformation(),
        SummaryExtractor(
            llm=llm,
            excluded_embed_metadata_keys=[
                DEFAULT_WINDOW_METADATA_KEY,
                DEFAULT_OG_TEXT_METADATA_KEY,
            ],
            excluded_llm_metadata_keys=[
                DEFAULT_WINDOW_METADATA_KEY,
                DEFAULT_OG_TEXT_METADATA_KEY,
            ],
        ),
        service_context.embed_model,
    ]
)

excluded_embed_metadata_keys = [
    DEFAULT_WINDOW_METADATA_KEY,
    DEFAULT_OG_TEXT_METADATA_KEY,
]

excluded_llm_metadata_keys = [
    DEFAULT_WINDOW_METADATA_KEY,
    DEFAULT_OG_TEXT_METADATA_KEY,
]

nodes = pipeline.run(
    nodes=nodes,
    excluded_embed_metadata_keys=excluded_embed_metadata_keys,
    excluded_llm_metadata_keys=excluded_llm_metadata_keys,
)
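For anyone reading along, here is a minimal sketch of the behaviour I'm expecting, written in plain Python without llama-index so it runs anywhere. It assumes the exclusion lists live on each node and are applied when the text for the LLM is built. SketchNode, llm_content and exclude_keys are hypothetical stand-ins, not llama-index classes or functions, and the key strings "window" and "original_text" are what I assume the two DEFAULT_* constants resolve to.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for a node; NOT the llama-index class.
@dataclass
class SketchNode:
    text: str
    metadata: dict = field(default_factory=dict)
    excluded_llm_metadata_keys: list = field(default_factory=list)

    def llm_content(self) -> str:
        """Build the text an LLM would see, skipping excluded metadata keys."""
        visible = {
            key: value
            for key, value in self.metadata.items()
            if key not in self.excluded_llm_metadata_keys
        }
        header = "\n".join(f"{key}: {value}" for key, value in visible.items())
        return f"{header}\n\n{self.text}" if header else self.text


def exclude_keys(nodes, keys):
    """Mark the given metadata keys as hidden from the LLM on every node."""
    for node in nodes:
        for key in keys:
            if key not in node.excluded_llm_metadata_keys:
                node.excluded_llm_metadata_keys.append(key)
    return nodes


# Assumed values of DEFAULT_WINDOW_METADATA_KEY / DEFAULT_OG_TEXT_METADATA_KEY.
WINDOW_KEY = "window"
OG_TEXT_KEY = "original_text"

nodes = [
    SketchNode(
        text="Short sentence.",
        metadata={WINDOW_KEY: "x" * 5000, "file_name": "a.txt"},
    )
]

# Run this step before anything that calls the LLM.
exclude_keys(nodes, [WINDOW_KEY, OG_TEXT_KEY])
prompt_text = nodes[0].llm_content()
```

If the real extractor builds its prompt from all metadata rather than the LLM-filtered view, then setting these lists would have no effect, which would match what I'm seeing. I haven't confirmed that in the source.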
