Hey all, I'm trying to build out a simple rag QueryPipeline, but when I look at my traces in Phoenix it looks like the llm is being invoked twice?

My code:
pipeline.add_modules(
    {
        "llm": llm.as_query_component(),
        "retriever": retriever,
        "formatter": formatter,
        "prompt_tmpl": CITATION_RAG_PROMPT.as_query_component(
            partial={
                "chat_history_str": chat_history_str,
                "query_str": query,
                "json_prompt_str": json_prompt_str,
            }
        ),
        "output": output_parser,
    }
)

pipeline.add_link("retriever", "formatter", dest_key="nodes")
pipeline.add_link("formatter", "prompt_tmpl", dest_key="context_str")
pipeline.add_link("prompt_tmpl", "llm")
pipeline.add_link("llm", "output")

I'm guessing this is because the llm is a chat model, and it is converting my prompt input into a user chat message. Is this actually calling the LLM twice or is it just doing that prompt_to_messages conversion? And if it is calling twice, how can I just get the llm to call complete instead of chat when run through the QueryPipeline?
Attachment: image.png
19 comments
It's almost certainly as you guessed. Tbh it should only be logged once; kind of surprised it got logged twice there.
What LLM are you using?
Claude 3 Sonnet via Bedrock, just wrapped in your guys' wrapper. I followed it all the way down in the source code, but the relevant methods seem to just inherit from BaseLLM. I'm going to try wrapping the LLM in a CustomQueryComponent and calling chat/complete specifically, rather than using as_query_component() on the LLM itself; maybe that will help.
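
(A rough sketch of that wrapper idea, loosely following the CustomQueryComponent pattern from the LlamaIndex docs; untested, and the class and field names here are just placeholders:)

Plain Text
from typing import Any, Dict

from llama_index.core.bridge.pydantic import Field
from llama_index.core.llms import LLM
from llama_index.core.query_pipeline import CustomQueryComponent


class LLMCompleteComponent(CustomQueryComponent):
    """Hypothetical wrapper that calls llm.complete() explicitly."""

    llm: LLM = Field(..., description="LLM to invoke via complete()")

    def _validate_component_inputs(self, input: Dict[str, Any]) -> Dict[str, Any]:
        # No special validation; pass inputs through unchanged
        return input

    @property
    def _input_keys(self) -> set:
        return {"prompt"}

    @property
    def _output_keys(self) -> set:
        return {"output"}

    def _run_component(self, **kwargs: Any) -> Dict[str, Any]:
        # Invoke the completion endpoint directly instead of chat()
        response = self.llm.complete(kwargs["prompt"])
        return {"output": str(response)}

It would then replace "llm": llm.as_query_component() in add_modules with something like "llm": LLMCompleteComponent(llm=llm).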
I'm having exactly this same issue
@Logan M are you able to recreate this? Still curious why it’s calling the LLM twice. In one trace it has the raw prompt, in the other the input is wrapped as a chat message with a user role.
It's not calling the llm twice, just logging twice
This is because llm.chat or llm.complete is calling the other under the hood for the LLM you are using
Lower priority to fix imo at the moment 😅 so I haven't dug deeper yet
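
(Roughly what that delegation looks like for a chat-oriented model; this is an illustrative sketch, not the actual Bedrock integration code:)

Plain Text
from llama_index.core.llms import ChatMessage, MessageRole


def complete_via_chat(llm, prompt: str, **kwargs):
    # Illustrative only: a chat-oriented LLM typically implements complete()
    # by wrapping the prompt as a single user message and delegating to chat(),
    # which is why the trace shows both the raw prompt and the chat-formatted
    # input even though the model endpoint is only hit once.
    messages = [ChatMessage(role=MessageRole.USER, content=prompt)]
    chat_response = llm.chat(messages, **kwargs)
    return chat_response.message.content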
Oh that’s good to know! I can work with that then, thank you!
@Logan M

I've attempted to wrap the LLM in a CustomQueryComponent so that I can invoke specifically either chat() or complete(), but when I do I lose the trace provided natively by the llm object.

I don't fully understand CallbackManagers, but it seems like it should be possible to assign a tag to each component so that it shows up in the trace. Can you provide a link to an example of setting this up?

I am leveraging the documentation found here https://docs.llamaindex.ai/en/stable/examples/pipeline/query_pipeline_memory/#query-pipeline-contruction

Any advice? A code snippet showing how to add a label to a component and have it appear in the Phoenix trace would make my life pretty great
Hmm I don't actually know if arize supports custom events

I'd have to dig in a bit to see how that might work
That's interesting. Even if I need to use one of the supported labels, surely they're obtaining it from the components somehow?
I reached out to the Arize team to see if they have any suggestions. I don't want to deal with the llm object tracing twice :/
Yea, you could use instrumentation to emit existing event types, essentially
Completely untested, but it might look something like:
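
(A rough sketch along those lines, emitting the existing query event types via the dispatcher; the wrapper function name and the Response wrapping are assumptions, not tested code:)

Plain Text
from llama_index.core.base.response.schema import Response
from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.events.query import (
    QueryStartEvent,
    QueryEndEvent,
)

dispatcher = get_dispatcher(__name__)


def run_with_events(pipeline, query: str):
    # Emit existing instrumentation event types around the pipeline call so
    # they are picked up by whatever handler/exporter is registered.
    dispatcher.event(QueryStartEvent(query=query))
    output = pipeline.run(input=query)
    # QueryEndEvent expects a response-like object, so wrap the raw output.
    dispatcher.event(QueryEndEvent(query=query, response=Response(response=str(output))))
    return output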
Okay, I found this

https://docs.arize.com/phoenix/tracing/how-to-tracing/manual-instrumentation/custom-spans#configuring-a-tracer

Can I use the LlamaIndex instrumentation setup I already have:

Plain Text
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor


def initialize_phoenix(endpoint: str = "http://127.0.0.1:6006"):
    # Initialize Phoenix client and set global handler
    # session = px.Client(endpoint=endpoint)  # noqa: F841
    # set_global_handler("arize_phoenix")
    endpoint = "http://127.0.0.1:6006/v1/traces"
    tracer_provider = TracerProvider()
    tracer_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter(endpoint)))
    LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider, use_legacy_callback_handler=True)


In conjunction with this?
Plain Text
from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.events.query import (
    QueryStartEvent,
    QueryEndEvent,
)

dispatcher = get_dispatcher(__name__)


def my_function(query: str):
    dispatcher.event(QueryStartEvent(query=query))
    ...
    dispatcher.event(QueryEndEvent(query=query, response=response))
    return response
Awesome, I'll take a look. Thank you!