
Updated 11 months ago

Question on TokenCountingHandler:


I am using this tutorial to create a hybrid retriever with re-ranking: https://docs.llamaindex.ai/en/stable/examples/retrievers/bm25_retriever.html#advanced-hybrid-retriever-re-ranking
I am trying to count tokens with:

```python
import tiktoken
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
callback_manager = CallbackManager([token_counter])
Settings.callback_manager = callback_manager
```

I am able to count embedding tokens but not completion tokens. Is this because I am using a custom retriever?
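To see how one count can stay at zero while the other grows, here is a simplified, hypothetical model of a token-counting handler in plain Python. It is not LlamaIndex's TokenCountingHandler: the class and method names (ToyTokenCounter, on_embedding, on_llm) are invented for illustration, and it assumes, as the tokenizer=...encode argument above suggests, that a token count is the length of whatever the tokenizer callable returns. The point it shows is that a handler only tallies the events it is actually handed, so if LLM calls never reach it, the completion count stays at zero.

```python
from typing import Callable, List


class ToyTokenCounter:
    """Hypothetical stand-in for a token-counting callback handler."""

    def __init__(self, tokenizer: Callable[[str], List]) -> None:
        # Any callable mapping a string to a list of tokens works here,
        # e.g. tiktoken's `encoding.encode` or a plain whitespace split.
        self.tokenizer = tokenizer
        self.embedding_tokens = 0
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def _count(self, text: str) -> int:
        return len(self.tokenizer(text))

    def on_embedding(self, chunks: List[str]) -> None:
        # Called once per embedding request with the texts being embedded.
        self.embedding_tokens += sum(self._count(c) for c in chunks)

    def on_llm(self, prompt: str, completion: str) -> None:
        # Called once per LLM call with the prompt and the generated text.
        self.prompt_tokens += self._count(prompt)
        self.completion_tokens += self._count(completion)


# str.split is a crude whitespace stand-in for a real tokenizer.
counter = ToyTokenCounter(tokenizer=str.split)

# Only embedding events are delivered, mimicking the situation in the question.
counter.on_embedding(["slide one text", "slide two"])
print(counter.embedding_tokens, counter.completion_tokens)  # 5 0
```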
6 comments
Hmm, it might depend on how you set up the LLM. I think there's a small bug in how the global callback manager interacts with the LLM in some cases.
This is my LLM:

```python
from llama_index.llms.openai import OpenAI
llm = OpenAI(temperature=0.0, model="gpt-3.5-turbo-0125", max_tokens=4000)
```

and this is my query engine:

```python
from llama_index.core.query_engine import RetrieverQueryEngine

slides_query_engine = RetrieverQueryEngine.from_args(
    retriever=slides_hybrid_retriever,
    node_postprocessors=[cohere_rerank],
    llm=llm,
    # callback_manager=callback_manager,
    embed_model=embed_model,
)
```
I would be happy to post or comment on GitHub.
Try either of these:

Set Settings.llm = llm and don't pass the LLM into the query engine.

Or create the LLM with the callback manager attached: llm = OpenAI(..., callback_manager=callback_manager)

I think the BaseLLM class needs to pull the callback manager from the global settings by default for this case to work.
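The two fixes suggested above can be illustrated with a toy model of the callback wiring. This is a hypothetical sketch in plain Python, not LlamaIndex code: ToySettings, ToyLLM, ToyCallbackManager and ToyTokenCounter are invented names. It assumes, consistent with what this thread reports rather than with anything verified in the library source, that (a) an LLM constructed without a callback manager gets its own empty one instead of the global one, and (b) reading the LLM from the global settings attaches the global callback manager to it. The query engine falling back to the global LLM is represented here by calling settings.llm directly.

```python
from typing import Callable, List, Optional


class ToyTokenCounter:
    """Hypothetical handler that tallies the completion tokens it is told about."""

    def __init__(self, tokenizer: Callable[[str], List]) -> None:
        self.tokenizer = tokenizer
        self.completion_tokens = 0

    def on_completion(self, text: str) -> None:
        self.completion_tokens += len(self.tokenizer(text))


class ToyCallbackManager:
    """Forwards each event to every registered handler."""

    def __init__(self, handlers: Optional[List[ToyTokenCounter]] = None) -> None:
        self.handlers = handlers or []

    def on_completion(self, text: str) -> None:
        for handler in self.handlers:
            handler.on_completion(text)


class ToyLLM:
    def __init__(self, callback_manager: Optional[ToyCallbackManager] = None) -> None:
        # Assumption (a): without an explicit manager, the LLM gets its own empty one.
        self.callback_manager = callback_manager or ToyCallbackManager()

    def complete(self, prompt: str) -> str:
        text = f"answer to {prompt}"
        self.callback_manager.on_completion(text)
        return text


class ToySettings:
    def __init__(self) -> None:
        self.callback_manager: Optional[ToyCallbackManager] = None
        self._llm: Optional[ToyLLM] = None

    @property
    def llm(self) -> ToyLLM:
        if self._llm is None:
            self._llm = ToyLLM()
        # Assumption (b): handing out the global LLM attaches the global manager.
        if self.callback_manager is not None:
            self._llm.callback_manager = self.callback_manager
        return self._llm

    @llm.setter
    def llm(self, value: ToyLLM) -> None:
        self._llm = value


counter = ToyTokenCounter(tokenizer=str.split)
settings = ToySettings()
settings.callback_manager = ToyCallbackManager([counter])

# Original setup: LLM built on its own and used directly, so nothing is counted.
direct_llm = ToyLLM()
direct_llm.complete("q1")
print(counter.completion_tokens)  # 0

# Fix 1: register the LLM globally and use it through the settings object.
settings.llm = ToyLLM()
settings.llm.complete("q2")
print(counter.completion_tokens)  # 3

# Fix 2: pass the callback manager to the LLM explicitly.
explicit_llm = ToyLLM(callback_manager=settings.callback_manager)
explicit_llm.complete("q3")
print(counter.completion_tokens)  # 6
```

In this model, fix 2 does not rely on assumption (b) at all, so it keeps working even if the LLM is still passed straight into the query engine.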
Awesome, thank you! It seems to be working now.