I wish to access the `additional_kwargs`

I wish to access the `additional_kwargs` attribute of the `CompletionResponse` object returned by the `complete()` method of an LLM integration, when that LLM is used in a simple RAG pipeline.
In a RAG pipeline, the response is of type `llama_index.core.base.response.schema.Response`, which only stores the `text` attribute of the `CompletionResponse`.

Here's an example RAG pipeline that I'm using:
Plain Text
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.llm = llm  # the LLM integration whose additional_kwargs I want
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("A random question")

What's the best way to get these `additional_kwargs`?
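For reference, here's roughly the difference I mean (a sketch; the exact contents of `additional_kwargs` depend on the LLM integration):
Plain Text
# Calling the LLM directly returns a CompletionResponse with additional_kwargs
completion = llm.complete("A random question")
print(completion.text)
print(completion.additional_kwargs)  # provider-specific metadata

# Going through the query engine returns a Response, which only carries the text
response = query_engine.query("A random question")
print(response.response)  # str; the additional_kwargs are not carried over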
13 comments
You can use the retriever to retrieve the nodes and then use your LLM directly for the final answer, passing in the node contexts.
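A minimal sketch of that approach, assuming the same `index` and `llm` as in the question (the prompt format here is just illustrative):
Plain Text
# Retrieve the context nodes, then call the LLM directly so the full
# CompletionResponse (including additional_kwargs) stays accessible.
retriever = index.as_retriever(similarity_top_k=2)
nodes = retriever.retrieve("A random question")

context = "\n\n".join(n.get_content() for n in nodes)
prompt = f"Context:\n{context}\n\nQuestion: A random question\nAnswer:"

completion = llm.complete(prompt)
print(completion.text)
print(completion.additional_kwargs)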
I think the best way is to use instrumentation and hook into the underlying LLM event.
Is it not possible to take the value stored in `additional_kwargs` of `CompletionResponse` and store it in https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/base/response/schema.py#L27 ?
There are multiple layers of functions it would have to bubble up through to make that possible. Not worth it when instrumentation already gives direct access, imo.
I couldn't figure out how and where instrumentation should be implemented to achieve this.
Is there any existing example where an LLM integration uses instrumentation to make metadata/kwargs accessible to the user when the LLM is used in RAG (with a simple vector store and query engine)?
I suppose many LLM providers would like users to be able to access log_prob, token_count, or other metadata.
I thought the notebook was pretty self-explanatory πŸ˜… But here's a trimmed-down example

Plain Text
from typing import List

from llama_index.core.instrumentation.events import BaseEvent
from llama_index.core.instrumentation.event_handlers import BaseEventHandler
from llama_index.core.instrumentation.events.llm import (
    LLMCompletionEndEvent,
    LLMChatEndEvent,
)


class ExampleEventHandler(BaseEventHandler):
    """Collects every instrumentation event and prints the LLM end events."""

    events: List[BaseEvent] = []

    @classmethod
    def class_name(cls) -> str:
        """Class name."""
        return "ExampleEventHandler"

    def handle(self, event: BaseEvent) -> None:
        """Logic for handling event."""
        print("-----------------------")
        # all events have these attributes
        print(event.id_)
        print(event.timestamp)
        print(event.span_id)

        # event-specific attributes
        print(f"Event type: {event.class_name()}")
        if isinstance(event, LLMCompletionEndEvent):
            print(event.response)
            print(event.prompt)

        if isinstance(event, LLMChatEndEvent):
            print(event.messages)
            print(event.response)

        self.events.append(event)
        print("-----------------------")


from llama_index.core.instrumentation import get_dispatcher

# root dispatcher
root_dispatcher = get_dispatcher()

# register the event handler
event_handler = ExampleEventHandler()
root_dispatcher.add_event_handler(event_handler)
The `event.response` will have what you want.
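For example, a rough usage sketch once the handler above is registered (using the same query engine as in the question):
Plain Text
# Run the query engine as usual; the handler records the LLM events.
response = query_engine.query("A random question")

for event in event_handler.events:
    if isinstance(event, LLMCompletionEndEvent):
        # event.response is the CompletionResponse from the LLM integration
        print(event.response.additional_kwargs)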
I'm developing an example notebook to showcase using Cleanlab's LLM (called TLM) in a RAG setting. TLM's integration provides a trustworthiness_score, which is saved in the additional_kwargs of CompletionResponse.
So my aim is to show users (developers using TLM in RAG) how they can access it.
This example notebook achieves what I want: it dumps the response from Cleanlab's API (which is a dictionary) as a string into CompletionResponse, and when query_engine.query() returns a Response object, it decodes that string back into a dictionary to get the trustworthiness_score.
https://help.cleanlab.ai/tutorials/tlm_rag/
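Roughly what that tutorial does, as I understand it (a sketch; the payload key names come from Cleanlab's API, so treat them as assumptions):
Plain Text
import json

# The integration JSON-dumps the full API payload into CompletionResponse.text,
# so the Response text can be decoded back into a dictionary on the user side.
response = query_engine.query("A random question")
payload = json.loads(response.response)   # assumes the text is the JSON-dumped dict
print(payload["trustworthiness_score"])   # key name taken from Cleanlab's payload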
Using the LLM directly or using instrumentation will be the main ways to access it right now.
By using the LLM directly, do you mean creating the vector store, then the index, then the retriever objects, and then passing the context from the retrieved nodes to the LLM? Something like this: https://docs.llamaindex.ai/en/stable/examples/low_level/oss_ingestion_retrieval/

For instrumentation, does it get implemented in the notebook (user side), or in the LLM integration (a new class in base.py or another Python file)?
@Logan M your two cents on this last question would help me design the UX for this feature better.