In a simple scenario such as:
Python
from llama_index.core import StorageContext, load_index_from_storage
from llama_index.vector_stores.faiss import FaissVectorStore

vector_store = FaissVectorStore.from_persist_dir(FLAGS.vector_store_dir)
storage_context = StorageContext.from_defaults(vector_store=vector_store, persist_dir=FLAGS.vector_store_dir)
index = load_index_from_storage(storage_context=storage_context)
query_engine = index.as_query_engine(similarity_top_k=FLAGS.similarity_top_k)

When a response is generated with response = query_engine.query("My question!"), how do I actually get the whole prompt that was sent to the LLM (containing the system message and all the parsed context text)?
I thought this was conceptually easy but couldn't figure it out just from the codebase...
3 comments
You can get all the nodes that were used to create the final response from the response object.

print(response.source_nodes)
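
For instance, a minimal sketch (assuming the standard NodeWithScore attributes) that prints each retrieved chunk with its similarity score:

Python
response = query_engine.query("My question!")
for node_with_score in response.source_nodes:
    # each entry is a NodeWithScore: the retrieved chunk plus its retrieval score
    print(node_with_score.score)
    print(node_with_score.node.get_content())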

Now for the prompt: you can either set verbose=True on your query engine or use an observability tool like Langfuse or Arize Phoenix.

https://docs.llamaindex.ai/en/stable/examples/callbacks/LangfuseCallbackHandler/?h=lang
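
Roughly, wiring up the Langfuse callback handler from that doc looks like this (a sketch; it assumes the langfuse package is installed and the LANGFUSE_SECRET_KEY / LANGFUSE_PUBLIC_KEY / LANGFUSE_HOST environment variables are set):

Python
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from langfuse.llama_index import LlamaIndexCallbackHandler

# register the handler globally so every query is traced in Langfuse
langfuse_callback_handler = LlamaIndexCallbackHandler()
Settings.callback_manager = CallbackManager([langfuse_callback_handler])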

There is a new Instrumentation module that you can use: https://docs.llamaindex.ai/en/stable/examples/instrumentation/basic_usage/?h=instrum

This is super easy to implement
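
As a rough sketch based on that basic-usage doc (the event and attribute names here are assumptions that may shift between versions), an event handler that prints the full chat messages, system prompt and context-stuffed user message included, right before each LLM call could look like:

Python
from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.event_handlers import BaseEventHandler
from llama_index.core.instrumentation.events.llm import LLMChatStartEvent

class PromptPrinter(BaseEventHandler):
    @classmethod
    def class_name(cls) -> str:
        return "PromptPrinter"

    def handle(self, event, **kwargs) -> None:
        # LLMChatStartEvent carries the exact messages about to be sent to the LLM
        if isinstance(event, LLMChatStartEvent):
            for message in event.messages:
                print(f"[{message.role}] {message.content}")

# attach to the root dispatcher so every LLM call is observed
get_dispatcher().add_event_handler(PromptPrinter())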
I guess what I wanted to know was how to print something like this: https://cloud.langfuse.com/project/cltipxbkn0000cdd7sbfbpovm/traces/96e3c191-4c90-49b9-a61b-be55e8477129?observation=11c40e83-868c-4680-a204-5307b3709541
without needing Langfuse. But I guess that isn't really possible without writing custom classes etc., since fusing the prompt with the retrieved context and the system prompt is handled in the backend without much native observability... I'll try Langfuse then.
OK, Langfuse is pretty easy to self-host and extremely informative, just what I wanted. Thanks!!