
Separate Callback Managers for Parallel API Requests

I have a situation where I need to create a callback_manager per API request.
If I do Settings.callback_manager = callback_manager while requests A and B are running roughly in parallel (A starts first, then B), B would override A's callback_manager, right?
If I want to keep each request's token_counter separate, that means I shouldn't use Settings.callback_manager and should instead pass the callback_manager directly into its respective engines, right?

When I try to pass callback_manager manually into everything that can take one, I usually get incomplete callback traces and token counters that end up showing 0. Whereas if I just do Settings.callback_manager = callback_manager, everything seems to just work.

If I don't have to worry about B overriding A, I would like to keep using Settings.callback_manager = callback_manager πŸ˜…
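For what it's worth, the override concern applies to any module-level global. A minimal plain-Python sketch (the `Settings` class and names here are stand-ins, not the actual llama_index objects) shows why overlapping requests clobber each other, and why per-request objects don't:

```python
class Settings:
    """Stand-in for a module-level singleton like llama_index's Settings."""
    callback_manager = None

def start_request(name):
    # Each request assigns the shared global on entry.
    Settings.callback_manager = f"manager-{name}"

start_request("A")
start_request("B")  # B overwrites A's assignment before A finishes
assert Settings.callback_manager == "manager-B"  # A's manager is gone

# Per-request objects side-step the race: each handler keeps its own.
managers = {name: f"manager-{name}" for name in ("A", "B")}
assert managers["A"] == "manager-A"  # unaffected by B
```

So yes: if requests can overlap at all, a shared `Settings.callback_manager` can't keep their token counters separate.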

If I do the following manually, I get TypeError: llama_index.core.indices.vector_store.retrievers.retriever.VectorIndexRetriever() got multiple values for keyword argument 'callback_manager'
I'm using arize_phoenix

Plain Text
vector_query_engine = base_index.as_query_engine(
    vector_store_kwargs={"qdrant_filters": vector_filters},
    callback_manager=callback_manager,
    node_postprocessors=[rerank],
    similarity_top_k=10,
    use_async=True
)
9 comments
You'll have to pass it into everything; you probably missed something.

For example
Plain Text
llm = OpenAI(..., callback_manager=callback_manager)
embed_model = OpenAIEmbedding(..., callback_manager=callback_manager)
index = VectorStoreIndex(..., embed_model=embed_model,
                         callback_manager=callback_manager)
rerank = CohereRerank(..., callback_manager=callback_manager)

# pass in the llm
query_engine = index.as_query_engine(..., llm=llm)
Ohh... I see, I didn't know you have to pass the callback manager into all the parts.
llama_index.core.indices.vector_store.retrievers.retriever.VectorIndexRetriever() got multiple values for keyword argument 'callback_manager'
Do you have any idea why this error would happen?

I'm using Settings.llm and Settings.embed_model, but if I need to use a standalone callback_manager, does that mean I possibly have to give up using them?
Because I'm guessing that's the source of this error.
I think under the hood, as_query_engine is already pulling the callback manager from the index.
yeah, that's probably it.
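That would explain the exact exception: if the engine already supplies `callback_manager` from the index internals, a caller-supplied one arrives a second time via kwargs. A stand-alone reproduction of this generic Python failure mode (hypothetical functions, not the actual retriever code):

```python
def build_retriever(index, callback_manager=None):
    """Stand-in for VectorIndexRetriever's constructor."""
    return (index, callback_manager)

def as_query_engine(index, **kwargs):
    # The engine already injects the index's own callback manager...
    internal_cm = "cm-from-index"
    # ...so a caller-supplied callback_manager in kwargs collides with it:
    return build_retriever(index, callback_manager=internal_cm, **kwargs)

try:
    as_query_engine("my-index", callback_manager="cm-from-caller")
except TypeError as err:
    print(err)  # ... got multiple values for keyword argument 'callback_manager'
```

Which is why passing the callback manager to the index (rather than to `as_query_engine`) avoids the collision.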
And now that I'm passing everything in locally as you said, like so, I can see the token usage in the Arize Phoenix app, but whenever I log the token_counter tokens it returns 0. Do you have any idea why that is? These are my dependencies:

Plain Text
llama-index-core==0.10.44
llama-index-llms-azure-openai==0.1.8
llama-index-vector-stores-qdrant==0.1.3
llama-index-embeddings-huggingface==0.1.4
llama-index-callbacks-arize-phoenix==0.1.5



Plain Text
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4").encode,
    verbose=True
)
callback_manager = CallbackManager([token_counter])
embedding_model = EmbeddingModel(callback_manager=callback_manager)
client = qdrant_client.AsyncQdrantClient(
    url=os.getenv('QDRANT_CLOUD_URL'),
    api_key=os.getenv('QDRANT_CLOUD_API_KEY')
)
vector_store = QdrantVectorStore(
    aclient=client,
    collection_name=db_config.get('QDRANT_COLLECTION')
)
base_index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store,
    callback_manager=callback_manager,
    embed_model=embedding_model
)
llm = AzureLLMClient(callback_manager=callback_manager)
Providing more context: it seems like I passed everything in correctly. If not, Arize Phoenix wouldn't show all the stack traces and tokens properly. Really mind-boggling πŸ˜΅β€πŸ’«
Plain Text
vector_query_engine = base_index.as_query_engine(
    vector_store_kwargs={"qdrant_filters": vector_filters},
    llm=llm,
    node_postprocessors=[rerank],
    similarity_top_k=10,
    use_async=True
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
)
router_query_engine = RouterQueryEngine(
    llm=llm,
    selector=LLMSingleSelector.from_defaults(llm=llm),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ]
)
Hmm, not sure on this one, but Arize does not use the callback system (it uses the newer instrumentation system).
Here, rather than using callbacks, you can use the instrumentation system + tags and count tokens yourself

Here's an example with openai
https://colab.research.google.com/drive/1QV01kCEncYZ0Ym6o6reHPcffizSVxsQg?usp=sharing
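The counting part itself is simple to do by hand. A minimal sketch of accumulating per-request token counts (`naive_tokenize` is a whitespace stand-in for a real tokenizer such as `tiktoken.encoding_for_model("gpt-4").encode`; the class and tag names are illustrative, not a llama_index API):

```python
def naive_tokenize(text):
    """Whitespace stand-in for a real tokenizer like tiktoken's encode."""
    return text.split()

class TokenCounter:
    """Accumulates prompt/completion token counts keyed by request tag."""
    def __init__(self):
        self.counts = {}

    def record(self, tag, prompt, completion):
        entry = self.counts.setdefault(tag, {"prompt": 0, "completion": 0})
        entry["prompt"] += len(naive_tokenize(prompt))
        entry["completion"] += len(naive_tokenize(completion))

counter = TokenCounter()
counter.record("request-A", "What is Qdrant?", "Qdrant is a vector database.")
counter.record("request-B", "Summarize the doc", "It covers callbacks.")
print(counter.counts["request-A"])  # {'prompt': 3, 'completion': 5}
```

With one `TokenCounter` per request (and a tag carried through the instrumentation events), each request's totals stay separate, which was the original goal.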
Interesting, thank you. Let me take a jab at this.