Question on TokenCountingHandler:

I am using this tutorial to create a hybrid retriever with re-ranking: https://docs.llamaindex.ai/en/stable/examples/retrievers/bm25_retriever.html#advanced-hybrid-retriever-re-ranking
I am trying to count tokens with:

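import tiktoken
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

# Count tokens with the same tokenizer the model uses.
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
# Build one callback manager and register that same instance globally.
callback_manager = CallbackManager([token_counter])
Settings.callback_manager = callback_manager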

I am able to count embedding tokens but not completion tokens. Is this because I am using a custom retriever?

6 comments

Logan M

Hmmm, it might depend on how you set up the LLM. I think there's a small bug with how the global callback manager interacts with the LLM in some cases.

BioHacker

This is my LLM:
llm = OpenAI(temperature=0.0, model='gpt-3.5-turbo-0125', max_tokens=4000)
And this is my query engine:

slides_query_engine = RetrieverQueryEngine.from_args(
    retriever=slides_hybrid_retriever,
    node_postprocessors=[cohere_rerank],
    llm=llm,
    #callback_manager = callback_manager,
    embed_model=embed_model,
)

BioHacker

I would be happy to post/comment on GitHub.

Logan M

Try doing either:

Settings.llm = llm, and not passing it into the query engine

Or llm = OpenAI(...., callback_manager=callback_manager)

Logan M

I think the BaseLLM class needs to pull the callback manager from the global settings by default to make it work in this case
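
To illustrate the idea, here is a toy model in plain Python. It is not LlamaIndex's implementation: ToyCallbackManager, ToySettings, ToyLLM, and the use_global_default flag are all hypothetical names made up for this sketch. It only shows why an LLM that builds its own empty callback manager never reports to a handler registered globally, while one that falls back to the global manager (or receives it explicitly) does.

```python
from typing import Callable, List, Optional


class ToyCallbackManager:
    """Hypothetical stand-in for a callback manager: a list of handler callables."""

    def __init__(self, handlers: Optional[List[Callable[[str], None]]] = None) -> None:
        self.handlers = handlers or []

    def emit(self, event: str) -> None:
        for handler in self.handlers:
            handler(event)


class ToySettings:
    """Hypothetical global settings holder."""

    callback_manager = ToyCallbackManager()


class ToyLLM:
    """Hypothetical LLM that emits one event per completion call."""

    def __init__(
        self,
        callback_manager: Optional[ToyCallbackManager] = None,
        use_global_default: bool = False,
    ) -> None:
        if callback_manager is not None:
            self.callback_manager = callback_manager
        elif use_global_default:
            # The behavior suggested above: fall back to the global manager.
            self.callback_manager = ToySettings.callback_manager
        else:
            # A private, empty manager: global handlers never see these events.
            self.callback_manager = ToyCallbackManager()

    def complete(self, prompt: str) -> str:
        self.callback_manager.emit(f"llm:{prompt}")
        return "ok"


seen: List[str] = []
ToySettings.callback_manager = ToyCallbackManager([seen.append])

ToyLLM().complete("a")                          # private manager, nothing recorded
ToyLLM(use_global_default=True).complete("b")   # falls back to the global manager
ToyLLM(callback_manager=ToySettings.callback_manager).complete("c")  # passed explicitly

print(seen)  # ['llm:b', 'llm:c']
```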

BioHacker

Awesome, thank you! It seems to be working now.
