@Logan M guidance plz!

At a glance

The community member wants to count the tokens used in their prompt and its streaming response when calling the llm.stream_complete endpoint. They saw TokenCountingHandler used in an article and want to apply it to their case, but they are not building an index and not using a service_context. The other community members suggest attaching a CallbackManager containing a TokenCountingHandler directly to the LLM, and the original poster confirms that this approach worked for them.

@Logan M guidance plz!
So I need to count the tokens used in my prompt and its streaming response. I am using the llm.stream_complete endpoint.
I saw the usage of TokenCountingHandler in this article -> https://docs.llamaindex.ai/en/stable/examples/callbacks/TokenCountingHandler.html
How do I implement it in my case? I am not building any index and not using a service_context. Does a service_context make sense for a simple llm.complete() call as well? Or is there any other way to count tokens in this case?
You can attach the callback manager to the llm

Plain Text
llm = OpenAI(..., callback_manager=CallbackManager([token_counter]))
It is printing 0 with the stream_complete() endpoint 😔.
Plain Text
>>> from llama_index.callbacks import TokenCountingHandler, CallbackManager
>>> tk = TokenCountingHandler()
>>> cb = CallbackManager([tk])
>>> from llama_index.llms import OpenAI
>>> llm = OpenAI(callback_manager=cb)
>>> # non-streaming call: the count is updated as soon as the call returns
>>> res = llm.complete("Hello!")
>>> tk.total_llm_token_count
11
>>> # streaming call: the generator has to be consumed before the count updates
>>> res = llm.stream_complete("Hello!")
>>> for chunk in res:
...   continue
... 
>>> tk.total_llm_token_count
22
worked for me
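If you also want to split the running total into prompt and completion tokens, here is a minimal sketch continuing the session above, assuming the TokenCountingHandler attributes described in the linked article (prompt_llm_token_count, completion_llm_token_count, reset_counts()):

Plain Text
# break the running total down instead of only reading total_llm_token_count
prompt_tokens = tk.prompt_llm_token_count          # tokens in the prompts sent so far
completion_tokens = tk.completion_llm_token_count  # tokens in the generated/streamed completions
tk.reset_counts()                                  # zero the counters before the next measurement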