Hey all, I'm trying to build out a simple RAG QueryPipeline, but when I look at my traces in Phoenix it looks like the LLM is being invoked twice?

My code:
pipeline.add_modules(
    {
        "llm": llm.as_query_component(),
        "retriever": retriever,
        "formatter": formatter,
        "prompt_tmpl": CITATION_RAG_PROMPT.as_query_component(
            partial={
                "chat_history_str": chat_history_str,
                "query_str": query,
                "json_prompt_str": json_prompt_str,
            }
        ),
        "output": output_parser,
    }
)

pipeline.add_link("retriever", "formatter", dest_key="nodes")
pipeline.add_link("formatter", "prompt_tmpl", dest_key="context_str")
pipeline.add_link("prompt_tmpl", "llm")
pipeline.add_link("llm", "output")

I'm guessing this is because the LLM is a chat model, and it is converting my prompt input into a user chat message. Is this actually calling the LLM twice, or is it just doing that prompt_to_messages conversion? And if it is calling twice, how can I just get the LLM to call complete instead of chat when run through the QueryPipeline?
Attachment: image.png
19 comments
It's almost certainly as you guessed. Tbh it should be logged only once; kind of surprised it got logged twice there
What LLM are you using?
Claude 3 Sonnet via Bedrock, just wrapped in your guys' wrapper. I followed it all the way down in the source code, but the relevant methods seem to just inherit from BaseLLM. I'm going to try wrapping the LLM in a CustomQueryComponent and calling chat/complete specifically rather than using as_query_component() on the LLM itself; maybe that will help.
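(Rough, untested sketch of what that kind of wrapper might look like, following the CustomQueryComponent pattern from the query pipeline docs; the class name and the "prompt"/"output" keys here are made up for illustration.)
Plain Text
from typing import Any, Dict

from llama_index.core.bridge.pydantic import Field
from llama_index.core.llms import LLM
from llama_index.core.query_pipeline import CustomQueryComponent


class CompleteOnlyLLM(CustomQueryComponent):
    """Wraps an LLM so the pipeline always calls complete() on the formatted prompt."""

    llm: LLM = Field(..., description="LLM to invoke with complete()")

    @property
    def _input_keys(self) -> set:
        return {"prompt"}

    @property
    def _output_keys(self) -> set:
        return {"output"}

    def _run_component(self, **kwargs) -> Dict[str, Any]:
        # Call complete() explicitly instead of letting as_query_component()
        # route the prompt through chat() for chat models.
        response = self.llm.complete(kwargs["prompt"])
        return {"output": str(response)}


# e.g. swap it in for llm.as_query_component():
# pipeline.add_modules({"llm": CompleteOnlyLLM(llm=llm), ...})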
I'm having exactly this same issue
@Logan M are you able to recreate this? Still curious why it’s calling the LLM twice. In one trace it has the raw prompt, in the other the input is wrapped as a chat message with a user role.
It's not calling the llm twice, just logging twice
This is because llm.chat or llm.complete is calling the other under the hood for the LLM you are using
Lower priority to fix imo at the moment 😅 so I haven't dug deeper yet
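(To illustrate what Logan means — this is not the actual llama_index or Bedrock source, just the general shape: for a chat-only model, complete() typically wraps the prompt in a user message and delegates to chat(), so instrumentation hooked on both methods can record two LLM spans for a single real request.)
Plain Text
from llama_index.core.llms import ChatMessage, MessageRole

# Illustrative only -- roughly how complete() tends to be implemented
# on a chat-only LLM.
def complete(self, prompt: str, **kwargs):
    messages = [ChatMessage(role=MessageRole.USER, content=prompt)]
    chat_response = self.chat(messages, **kwargs)  # the "second" LLM call in the trace
    return chat_response.message.content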
Oh that’s good to know! I can work with that then, thank you!
@Logan M

I've attempted to wrap the LLM in a CustomQueryComponent so that I can invoke specifically either chat() or complete(), but when I do I lose the trace provided natively by the llm object.

I don't fully understand CallbackManagers, but it seems like it should be possible to assign a tag to each component so that it shows up in the trace. Can you provide some link to an example of setting this up?

I am leveraging the documentation found here https://docs.llamaindex.ai/en/stable/examples/pipeline/query_pipeline_memory/#query-pipeline-contruction

Any advice? A code snippet to be able to add a label to a component and have it appear in the Phoenix trace would make my life pretty great
Hmm I don't actually know if arize supports custom events

I'd have to dig in a bit to see how that might work
That's interesting. Even if I need to use one of the supported labels, surely they're obtaining it from the components somehow?
I reached out to the Arize team to see if they have any suggestions. I don't want to deal with the llm object tracing twice :/
Yea, you could use instrumentation to emit existing event types essentially
Completely untested, but it might look something like:
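(One way this could look — untested, and the function name is just for illustration — is to decorate the call you care about with the instrumentation dispatcher's span decorator so it shows up as its own span:)
Plain Text
from llama_index.core.instrumentation import get_dispatcher

dispatcher = get_dispatcher(__name__)


@dispatcher.span
def call_llm(llm, prompt: str) -> str:
    # @dispatcher.span makes this call appear as its own span for any
    # registered handler (e.g. the Phoenix/OpenInference exporter).
    return str(llm.complete(prompt))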
Okay, I found this

https://docs.arize.com/phoenix/tracing/how-to-tracing/manual-instrumentation/custom-spans#configuring-a-tracer

Can I use the LlamaIndex

Plain Text
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor


def initialize_phoenix(endpoint: str = "http://127.0.0.1:6006"):
    # Initialize Phoenix client and set global handler
    # session = px.Client(endpoint=endpoint)  # noqa: F841
    # set_global_handler("arize_phoenix")
    endpoint = "http://127.0.0.1:6006/v1/traces"  # note: overrides the argument above
    tracer_provider = TracerProvider()
    tracer_provider.add_span_processor(SimpleSpanProcessor(OTLPSpanExporter(endpoint)))
    LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider, use_legacy_callback_handler=True)


In conjunction with this?
Plain Text
from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.events.query import (
    QueryStartEvent,
    QueryEndEvent,
)

dispatcher = get_dispatcher(__name__)


def my_function(query: str):
    dispatcher.event(QueryStartEvent(query=query))
    ...  # run the pipeline and produce `response`
    dispatcher.event(QueryEndEvent(query=query, response=response))
    return response
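(i.e., roughly this, if I'm understanding the pieces right — untested, and the query string is just a placeholder:)
Plain Text
# Instrument once at startup, then any spans/events emitted while the
# pipeline runs get exported to the same Phoenix endpoint.
initialize_phoenix()
response = my_function("example query")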
Awesome, I'll take a look. Thank you!