
Separate Callback Managers for Parallel API Requests

I have a situation where I need to create a callback_manager per API request.
If I do Settings.callback_manager = callback_manager while requests A and B are running roughly in parallel (A starts first, then B), B would override A's callback_manager, right?
If I want to keep each request's token_counter separate, that means I shouldn't use Settings.callback_manager and should instead pass the callback_manager directly into its respective engines, right?

When I try to pass callback_manager manually into everything that can take one, I usually get incomplete callback traces and token counters that end up showing 0. Whereas if I just do Settings.callback_manager = callback_manager, everything seems to just work.

If I don't have to worry about B overriding A, I would like to keep using Settings.callback_manager = callback_manager πŸ˜…
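For what it's worth, the override concern applies to any module-level global. A minimal plain-Python sketch (the `Settings` class and names here are stand-ins, not the actual llama_index objects) shows why overlapping requests clobber each other, and why per-request objects don't:

```python
class Settings:
    """Stand-in for a module-level singleton like llama_index's Settings."""
    callback_manager = None

def start_request(name):
    # Each request assigns the shared global on entry.
    Settings.callback_manager = f"manager-{name}"

start_request("A")
start_request("B")  # B overwrites A's assignment before A finishes
assert Settings.callback_manager == "manager-B"  # A's manager is gone

# Per-request objects side-step the race: each handler keeps its own.
managers = {name: f"manager-{name}" for name in ("A", "B")}
assert managers["A"] == "manager-A"  # unaffected by B
```

So yes: if requests can overlap at all, a shared `Settings.callback_manager` can't keep their token counters separate.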

If I do the following manually, I get TypeError: llama_index.core.indices.vector_store.retrievers.retriever.VectorIndexRetriever() got multiple values for keyword argument 'callback_manager'
I'm using arize_phoenix

Plain Text
vector_query_engine = base_index.as_query_engine(
    vector_store_kwargs={"qdrant_filters": vector_filters},
    callback_manager=callback_manager,
    node_postprocessors=[rerank],
    similarity_top_k=10,
    use_async=True
)
9 comments
You'll have to pass it into everything; you probably missed something.

For example
Plain Text
llm = OpenAI(..., callback_manager=callback_manager)
embed_model = OpenAIEmbedding(..., callback_manager=callback_manager)
index = VectorStoreIndex(..., embed_model=embed_model,
                         callback_manager=callback_manager)
rerank = CohereRerank(..., callback_manager=callback_manager)

# pass in the llm
query_engine = index.as_query_engine(..., llm=llm)
Ohh... I see, I didn't know you have to pass the callback manager into all the parts.
llama_index.core.indices.vector_store.retrievers.retriever.VectorIndexRetriever() got multiple values for keyword argument 'callback_manager'
Do you have any idea why this error would happen?

I'm using Settings.llm and Settings.embed_model, but if I need to use a standalone callback_manager, does that mean I possibly have to give up using them?
Because I'm guessing that's the source of this error.
I think under the hood, as_query_engine is already pulling the callback manager from the index.
yeah, that's probably it.
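That would explain the exact exception: if the engine already supplies `callback_manager` from the index internals, a caller-supplied one arrives a second time via kwargs. A stand-alone reproduction of this generic Python failure mode (hypothetical functions, not the actual retriever code):

```python
def build_retriever(index, callback_manager=None):
    """Stand-in for VectorIndexRetriever's constructor."""
    return (index, callback_manager)

def as_query_engine(index, **kwargs):
    # The engine already injects the index's own callback manager...
    internal_cm = "cm-from-index"
    # ...so a caller-supplied callback_manager in kwargs collides with it:
    return build_retriever(index, callback_manager=internal_cm, **kwargs)

try:
    as_query_engine("my-index", callback_manager="cm-from-caller")
except TypeError as err:
    print(err)  # ... got multiple values for keyword argument 'callback_manager'
```

Which is why passing the callback manager to the index (rather than to `as_query_engine`) avoids the collision.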
And now that I'm passing everything in locally as you said, like so, I can see the token usage in the Arize Phoenix app, but whenever I log the token_counter tokens it returns 0. Do you have any idea why that is? These are my dependencies:

Plain Text
llama-index-core==0.10.44
llama-index-llms-azure-openai==0.1.8
llama-index-vector-stores-qdrant==0.1.3
llama-index-embeddings-huggingface==0.1.4
llama-index-callbacks-arize-phoenix==0.1.5



Plain Text
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4").encode,
    verbose=True
)
callback_manager = CallbackManager([token_counter])
embedding_model = EmbeddingModel(callback_manager=callback_manager)
client = qdrant_client.AsyncQdrantClient(
    url=os.getenv('QDRANT_CLOUD_URL'),
    api_key=os.getenv('QDRANT_CLOUD_API_KEY')
)
vector_store = QdrantVectorStore(
    aclient=client,
    collection_name=db_config.get('QDRANT_COLLECTION')
)
base_index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store,
    callback_manager=callback_manager,
    embed_model=embedding_model
)
llm = AzureLLMClient(callback_manager=callback_manager)
Providing more context: it seems like I passed everything in correctly. If not, Arize Phoenix wouldn't show all the stack traces and tokens properly. Really mind-boggling πŸ˜΅β€πŸ’«
Plain Text
vector_query_engine = base_index.as_query_engine(
    vector_store_kwargs={"qdrant_filters": vector_filters},
    llm=llm,
    node_postprocessors=[rerank],
    similarity_top_k=10,
    use_async=True
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
)
router_query_engine = RouterQueryEngine(
    llm=llm,
    selector=LLMSingleSelector.from_defaults(llm=llm),
    query_engine_tools=[
        summary_tool,
        vector_tool,
    ]
)
Hmm, not sure on this one, but Arize does not use the callback system (it uses the newer instrumentation system).
Here, rather than using callbacks, you can use the instrumentation system + tags and count tokens yourself

Here's an example with openai
https://colab.research.google.com/drive/1QV01kCEncYZ0Ym6o6reHPcffizSVxsQg?usp=sharing
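The counting part itself is simple to do by hand. A minimal sketch of accumulating per-request token counts (`naive_tokenize` is a whitespace stand-in for a real tokenizer such as `tiktoken.encoding_for_model("gpt-4").encode`; the class and tag names are illustrative, not a llama_index API):

```python
def naive_tokenize(text):
    """Whitespace stand-in for a real tokenizer like tiktoken's encode."""
    return text.split()

class TokenCounter:
    """Accumulates prompt/completion token counts keyed by request tag."""
    def __init__(self):
        self.counts = {}

    def record(self, tag, prompt, completion):
        entry = self.counts.setdefault(tag, {"prompt": 0, "completion": 0})
        entry["prompt"] += len(naive_tokenize(prompt))
        entry["completion"] += len(naive_tokenize(completion))

counter = TokenCounter()
counter.record("request-A", "What is Qdrant?", "Qdrant is a vector database.")
counter.record("request-B", "Summarize the doc", "It covers callbacks.")
print(counter.counts["request-A"])  # {'prompt': 3, 'completion': 5}
```

With one `TokenCounter` per request (and a tag carried through the instrumentation events), each request's totals stay separate, which was the original goal.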
Interesting, thank you. Let me take a jab at this.