Find answers from the community

Hi, I'm having an issue with SummaryExtractor returning node summaries attached to the incorrect nodes. Which node a summary is misattached to seems to be related to the num_workers value, but even with 1 worker I still get misplaced summaries. My code is:
Plain Text
    pipeline = IngestionPipeline(
        transformations=[
            MyCustomNodeTransformer(),
            SummaryExtractor(
                llm=llm,
                metadata_mode=MetadataMode.NONE,
                prompt_template=SUMMARY_EXTRACT_TEMPLATE,
                num_workers=1,
                summaries=["self"],
            ),
            Settings.embed_model,
        ]
    )

    nodes = pipeline.run(
        show_progress=True,
        nodes=nodes,
        num_workers=1,
    )
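One way to confirm the mismatch is to print each node's own text next to the summary that ended up in its metadata; this is a debugging sketch only, assuming SummaryExtractor's default section_summary metadata key for summaries=["self"].
Python
# Debugging sketch: list each node's text beside the summary attached to it,
# so misplaced summaries are easy to spot by eye.
for node in nodes:
    summary = node.metadata.get("section_summary", "<no summary>")
    print(f"node_id={node.node_id}")
    print(f"  text   : {node.get_content()[:80]!r}")
    print(f"  summary: {summary[:80]!r}")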
2 comments
Hi, I notice a big slowdown in LlamaIndex depending on the version of llama-cpp-python I have installed (older versions are much faster, about 100 tokens per second versus about 30). llama-cpp-python==0.2.20 is the last fast version with the latest LlamaIndex (3090 on Ubuntu, running Mistral 7B). I believe it has to do with the KV cache and is solved by the suggestions in this GitHub issue: https://github.com/abetlen/llama-cpp-python/issues/1054 . How do we pass the required offload_kqv=True through LlamaIndex to regain fast inference? Is this a regression in LlamaIndex or something users should handle themselves?
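For what it's worth, LlamaCPP in LlamaIndex forwards model_kwargs to the underlying llama_cpp.Llama, so one way to pass the flag from the linked issue is sketched below; the model path is a placeholder and the flag only takes effect on llama-cpp-python versions that support it.
Python
from llama_index.llms import LlamaCPP  # 0.9.x import path

# Sketch: model_kwargs are handed to llama_cpp.Llama, so offload_kqv (and the
# GPU layer count) can be set here.
llm = LlamaCPP(
    model_path="/path/to/mistral-7b.Q4_K_M.gguf",  # placeholder path
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={
        "n_gpu_layers": -1,    # offload all layers to the 3090
        "offload_kqv": True,   # keep the KV cache on the GPU (see issue #1054)
    },
)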
7 comments
Metadata

This might be a bug in llama-index, or I'm not understanding how to properly use the new IngestionPipeline transformations. My nodes carry a lot of metadata for logging and post-processing tasks; if that metadata gets included in a transformation, it exceeds the 3900-token limit set in the LlamaCpp configs, so I need to exclude it from transformations that rely on the LLM. I'm trying to use SummaryExtractor(), which I have set to use the Mistral 7B model, but nothing I try ever excludes the metadata from what goes to Mistral 7B under SummaryExtractor(). My code (a bit duplicative for extra certainty) looks like this:
Plain Text
pipeline = IngestionPipeline(
    transformations=[
        CustomTransformation(),
        SummaryExtractor(
            llm=llm,
            excluded_embed_metadata_keys=[
                DEFAULT_WINDOW_METADATA_KEY,
                DEFAULT_OG_TEXT_METADATA_KEY,
            ],
            excluded_llm_metadata_keys=[
                DEFAULT_WINDOW_METADATA_KEY,
                DEFAULT_OG_TEXT_METADATA_KEY,
            ],
        ),
        service_context.embed_model,
    ]
)

excluded_embed_metadata_keys = [
    DEFAULT_WINDOW_METADATA_KEY,
    DEFAULT_OG_TEXT_METADATA_KEY,
]

excluded_llm_metadata_keys = [
    DEFAULT_WINDOW_METADATA_KEY,
    DEFAULT_OG_TEXT_METADATA_KEY,
]

nodes = pipeline.run(
    nodes=nodes,
    excluded_embed_metadata_keys=excluded_embed_metadata_keys,
    excluded_llm_metadata_keys=excluded_llm_metadata_keys,
)
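For context on this one: the excluded_*_metadata_keys are attributes of the nodes themselves rather than keyword arguments of SummaryExtractor or pipeline.run(). A sketch of setting them before the pipeline runs is below; the string values are assumed to match the DEFAULT_WINDOW_METADATA_KEY / DEFAULT_OG_TEXT_METADATA_KEY constants from the snippet above.
Python
# Sketch: set the exclusion lists on each node before the pipeline runs, so
# transformations that read content in MetadataMode.LLM / MetadataMode.EMBED
# drop these keys from the prompt.
EXCLUDED_KEYS = ["window", "original_text"]  # assumed DEFAULT_*_METADATA_KEY values

for node in nodes:
    node.excluded_llm_metadata_keys = EXCLUDED_KEYS
    node.excluded_embed_metadata_keys = EXCLUDED_KEYS

nodes = pipeline.run(nodes=nodes)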
2 comments
Hi, I'm trying out the new features in version 0.9. How do I pass the service context correctly to the transformations pipeline so that TitleExtractor() uses a local LLM rather than OpenAI? My code is:
Plain Text
    service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=llm)
    pipeline = IngestionPipeline(
        service_context=service_context,
        transformations=[
            SentenceSplitter(),
            TitleExtractor(),
        ]
    )
The error I get is Could not load OpenAI model. even though my llm is defined as Llama 2.
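One workaround that sidesteps the service-context question is to hand the local LLM to the extractor directly; a minimal sketch for 0.9.x, assuming llm and embed_model are already defined, is below.
Python
from llama_index.extractors import TitleExtractor      # 0.9.x import paths
from llama_index.ingestion import IngestionPipeline
from llama_index.node_parser import SentenceSplitter

# Sketch: the extractor takes the LLM explicitly, so the local Llama 2 model
# is used instead of the OpenAI default.
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        TitleExtractor(llm=llm),
        embed_model,
    ]
)
nodes = pipeline.run(documents=documents)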
3 comments
Hi, I'm going through the new lm-format-enforcer readme: https://docs.llamaindex.ai/en/stable/community/integrations/lmformatenforcer.html#lm-format-enforcer . When defining a program, how do I pass in my list of Node or Document objects so the program runs in the context of my data? I would have expected to be able to pass the nodes in here:
Plain Text
nodes = node_parser.get_nodes_from_documents(documents)  # existing setup

program = LMFormatEnforcerPydanticProgram(
    output_cls=Album,
    prompt_template_str="Generate an example album, with an artist and a list of songs. Using the movie {movie_name} as inspiration. You must answer according to the following schema: \n{json_schema}\n",
    llm=LlamaCPP(),
    verbose=True,
)
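Building on the snippet above, one option is to treat the node text as just another template variable, since the program formats the placeholders in prompt_template_str from the keyword arguments at call time (context_str below is a name chosen for the sketch, not a library API).
Python
# Sketch: inject the parsed node text through an ordinary template variable.
context_str = "\n\n".join(node.get_content() for node in nodes)

program = LMFormatEnforcerPydanticProgram(
    output_cls=Album,
    prompt_template_str=(
        "Context information is below.\n{context_str}\n"
        "Generate an example album, with an artist and a list of songs, "
        "grounded in the context above. "
        "You must answer according to the following schema: \n{json_schema}\n"
    ),
    llm=LlamaCPP(),
    verbose=True,
)

output = program(context_str=context_str)  # {json_schema} is filled by the program itself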
2 comments
Hello, how do I convert a list of TextNode objects into Document objects? I'm asking because I'm trying to run the Zephyr 7B Beta Colab provided, but it doesn't work when I start with a list of TextNode objects instead of a Document object. The error I get is 'TextNode' object has no attribute 'get_doc_id'. My code is:
Plain Text
    document: Document = Document(text=txt_str, metadata=metadata or {})  # same approach as SimpleDirectoryReader
    documents: list[Document] = [document]

    node_parser = SentenceWindowNodeParser.from_defaults(
        window_size=3,
        window_metadata_key="window",
        original_text_metadata_key="original_text",
    )
    nodes: list[TextNode] = node_parser.get_nodes_from_documents(documents)  # this returns TextNodes rather than a Document

    llm = HuggingFaceLLM(
        model_name="HuggingFaceH4/zephyr-7b-beta",
        tokenizer_name="HuggingFaceH4/zephyr-7b-beta",
        query_wrapper_prompt=PromptTemplate("<|system|>\n</s>\n<|user|>\n{query_str}</s>\n<|assistant|>\n"),
        context_window=3900,
        max_new_tokens=256,
        # model_kwargs={"quantization_config": quantization_config},
        # tokenizer_kwargs={},
        generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
        messages_to_prompt=messages_to_prompt,
        device_map="cpu",
    )

    service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
    vector_index = VectorStoreIndex.from_documents(nodes, service_context=service_context)  # Error happens here
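For reference, the error comes from from_documents expecting Document objects (it calls get_doc_id on each one); VectorStoreIndex can be built from the nodes directly, as in this sketch.
Python
# Sketch: build the index from nodes instead of converting them back to Documents.
vector_index = VectorStoreIndex(nodes=nodes, service_context=service_context)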
1 comment
The docs say that LLMPredictor is deprecated. Does that apply to StructuredLLMPredictor too? Should we still use code like llm_predictor = StructuredLLMPredictor() when writing new code, or is it better to set the llm value in the service context?
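For comparison, the service-context route mentioned in the question looks roughly like this minimal sketch, assuming llm, embed_model, and documents are already defined.
Python
from llama_index import ServiceContext, VectorStoreIndex

# Sketch: set the LLM once on the service context; indexes and query engines
# built from it use that LLM without a separate predictor object.
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()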
2 comments
What are some strategies to limit hallucinated responses when using structured output parsers with Llama models? For example, I ask the model to identify the number of hours that employees work per week from paragraph text. My pydantic class looks like this:
Plain Text
class WorkWeek(BaseModel):
    hours_per_week: Optional[float]
    employee_type: Optional[str]
I find I always get some answer back like hours_per_week = 40 and employee_type = 'full-time' regardless of whether the document contains relevant text. I could create a document whose entire text is This is a test and I would still get results like those shown above, which are clearly not from the document. Interestingly, when I simply ask the model for a basic freeform text response, I get an answer that makes sense when the document doesn't mention hours of work (something like Based on the context, there isn't enough information to answer your question).
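One mitigation that sometimes helps is giving the schema an explicit "not found" path: default the fields to None and say so in the field descriptions, which end up in the JSON schema the model sees. A sketch:
Python
from typing import Optional
from pydantic import BaseModel, Field

# Sketch: null defaults plus descriptions that spell out when to return null,
# so the model has an explicit alternative to guessing a value.
class WorkWeek(BaseModel):
    hours_per_week: Optional[float] = Field(
        default=None,
        description="Weekly hours explicitly stated in the text; null if not stated.",
    )
    employee_type: Optional[str] = Field(
        default=None,
        description="Employee type explicitly stated in the text; null if not stated.",
    )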
5 comments
How do I remove the SYS prompt for question answering when using Mistral with LlamaCPP (Mistral doesn't use a SYS prompt)? I am able to edit the user prompts, but I don't see a way to edit the SYS prompt. Specifically, I want to remove the text between <<SYS>> and <</SYS>>:

Plain Text
<s> [INST] <<SYS>>\n You are a helpful, respectful and honest assistant. Always answer as helpfully as possible and follow ALL given instructions. Do not speculate or make up information. Do not reference any given instructions or context. \n<</SYS>>\n\n Context information is below.\n---------------------\n[Excerpt from document]...
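For reference, LlamaCPP accepts custom messages_to_prompt and completion_to_prompt callables, so a Mistral-style formatter without the <<SYS>> block can be supplied directly; the sketch below is one simple version, with a placeholder model path.
Python
from llama_index.llms import LlamaCPP

# Sketch: Mistral-style formatting with no <<SYS>> section.
def mistral_messages_to_prompt(messages):
    text = "\n".join(str(m.content) for m in messages)
    return f"<s>[INST] {text} [/INST]"

def mistral_completion_to_prompt(completion):
    return f"<s>[INST] {completion} [/INST]"

llm = LlamaCPP(
    model_path="/path/to/mistral-7b.Q4_K_M.gguf",  # placeholder path
    context_window=3900,
    messages_to_prompt=mistral_messages_to_prompt,
    completion_to_prompt=mistral_completion_to_prompt,
)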
3 comments
Hello, I have a question about using the Guidance Pydantic Program, as documented here: https://docs.llamaindex.ai/en/latest/examples/output_parsing/guidance_pydantic_program.html . I have a query_engine defined; my question is how to pass the program variable (as shown on the documentation page) to the query engine:
Plain Text
### program defined based on the documentation
program = GuidancePydanticProgram(
    output_cls=Result,
    prompt_template_str="Generate a response using the query asked as follows: {{query}}",
    guidance_llm=self.llm,
    verbose=True,
)

### standard query_engine defined, but how do I pass in the program variable or get the program to see the vector index?
query_engine = index.as_query_engine(
    similarity_top_k=2,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
    output_cls=Result,
)
response = query_engine.query(query)
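One workaround, building on the snippet above: retrieve the context separately and hand the retrieved text to the program through an extra handlebars variable (context_str is a name chosen for the sketch, not a LlamaIndex API).
Python
# Sketch: retrieve first, then pass the retrieved text to the program.
retriever = index.as_retriever(similarity_top_k=2)
retrieved_nodes = retriever.retrieve(query)
context_str = "\n\n".join(n.get_content() for n in retrieved_nodes)

program = GuidancePydanticProgram(
    output_cls=Result,
    prompt_template_str=(
        "Context information is below.\n{{context_str}}\n"
        "Generate a response using the query asked as follows: {{query}}"
    ),
    guidance_llm=llm,
    verbose=True,
)
result = program(context_str=context_str, query=query)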
8 comments