def stream(response_stream):
    # Iterate over the streaming response generator and yield each chunk,
    # marking it as not final with 'end': False
    for text in response_stream.response_gen:
        yield {"message": text, "end": False}
    # Report how many LLM tokens this request consumed
    print("\n", "Total LLM Token Count: ", token_counter.total_llm_token_count, "\n")
    # When the stream is exhausted, yield a final message with 'end' set to True
    yield {"message": "Finished", "end": True}
import tiktoken
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, ServiceContext, set_global_service_context
from llama_index.callbacks import CallbackManager, TokenCountingHandler

# Count tokens with the gpt-3.5-turbo tokenizer so usage can be reported per request
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)
callback_manager = CallbackManager([token_counter])

# Wrap a streaming chat model in an LLMPredictor and register it, together with the
# token-counting callback, as the global service context
llm_predictorquery = LLMPredictor(
    llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k", streaming=True)
)
service_context_query = ServiceContext.from_defaults(
    llm_predictor=llm_predictorquery, callback_manager=callback_manager
)
set_global_service_context(service_context_query)
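# A minimal sketch of how the stream() generator above might be consumed. The
# 'index' variable and the example question are illustrative assumptions (an
# index is presumed to have been built elsewhere in the application); the
# query-engine calls follow the standard LlamaIndex streaming pattern.
query_engine = index.as_query_engine(streaming=True)
response_stream = query_engine.query("What does the document say about pricing?")
for chunk in stream(response_stream):
    # Print partial text as it arrives; stop once the final marker is seen
    print(chunk["message"], end="", flush=True)
    if chunk["end"]:
        break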