TokenCounter doesn't count tokens.

At some point, it stopped working. Here is the code:
Plain Text
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model(model_name).encode,
    verbose=False,
)
callback_manager = CallbackManager([token_counter])
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    chunk_size=project_chunk_size,
    callback_manager=callback_manager,
)
index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context)
retriever = index.as_retriever(verbose=True, chat_mode="context", similarity_top_k=similarity_top_k)
custom_chat_engine = CustomContext.from_defaults(
    retriever=retriever,
    memory=chatmemory,
    context_template=generate_context_template(),
    system_prompt=prepared_system_prompt,
    node_postprocessors=[
        CustomPostprocessor(
            context_limit, query_text + prepared_system_prompt, project.db_name, None
        )
    ],
)
response = custom_chat_engine.chat(query_text, chat_history=chat_history)
tokens_used = token_counter.total_llm_token_count  # <----- ALWAYS ZERO

Thanks!
seems kinda sus

Seems like CustomContext is your own class? Is it using the service context?
Yeah, it inherits from ContextChatEngine and overrides just one method. Could that be the reason?
Plain Text
from typing import List

from llama_index.chat_engine import ContextChatEngine
from llama_index.llms import ChatMessage, MessageRole


class CustomContext(ContextChatEngine):
    def _get_prefix_messages_with_context(self, context_str: str) -> List[ChatMessage]:
        """Get the prefix messages with context."""
        # ensure we grab the user-configured system prompt
        system_prompt = ""
        prefix_messages = self._prefix_messages
        if (
            len(self._prefix_messages) != 0
            and self._prefix_messages[0].role == MessageRole.SYSTEM
        ):
            system_prompt = str(self._prefix_messages[0].content)
            prefix_messages = self._prefix_messages[1:]

        # system prompt first, then context (opposite order vs. the base class)
        context_str_w_sys_prompt = system_prompt.strip() + context_str
        return [
            ChatMessage(content=context_str_w_sys_prompt, role=MessageRole.SYSTEM),
            *prefix_messages,
        ]
ah I see, for the chat engine then, pass the LLM into it

Plain Text
custom_chat_engine = CustomContext.from_defaults(llm=service_context.llm, ...)
thanks, let me try and see if it helps!
No, it didn't help. I see that in the ContextChatEngine class, the LLM is already obtained from the service context (context.py, line 75):
Plain Text
llm = service_context.llm_predictor.llm
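(A plausible reason passing the LLM alone isn't enough, sketched loosely from the legacy llama_index source rather than quoted from it: when no service_context is given, from_defaults falls back to a fresh default one, and callback events are dispatched through that default CallbackManager, which has no TokenCountingHandler attached.)
Plain Text
# Paraphrased sketch of the fallback inside ContextChatEngine.from_defaults
# (legacy llama_index). Names match the library; control flow is approximate.
service_context = service_context or ServiceContext.from_defaults()
llm = service_context.llm_predictor.llm
# Token events go through this default manager, never reaching token_counter:
callback_manager = service_context.callback_manager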
Hmm. I tried my own code and it works fine

Plain Text
from llama_index.callbacks import CallbackManager, TokenCountingHandler
from llama_index.chat_engine import ContextChatEngine
from llama_index.llms import OpenAI
from llama_index import Document, ServiceContext, VectorStoreIndex
import tiktoken

token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode, verbose=False
)
callback_manager = CallbackManager([token_counter])
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo"), chunk_size=512, callback_manager=callback_manager
)

index = VectorStoreIndex.from_documents(
    [Document.example()], service_context=service_context
)

chat_engine = index.as_chat_engine(
    verbose=True, chat_mode="context", similarity_top_k=2
)

response = chat_engine.chat("Tell me something about LLMs")

print(token_counter.total_llm_token_count)
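(As a side note: TokenCountingHandler in these legacy versions also exposes per-category counts and a reset, which is handy when counting per request. A minimal sketch, assuming the same token_counter as above:)
Plain Text
print(token_counter.prompt_llm_token_count)       # tokens sent as prompts
print(token_counter.completion_llm_token_count)   # tokens in completions
print(token_counter.total_embedding_token_count)  # embedding tokens, if any
token_counter.reset_counts()  # start from zero before the next query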
What should I check to figure this out? Maybe debug inside and see how the tokens are counted? But I'm not sure where to look...
add the service context here

Plain Text
custom_chat_engine = CustomContext.from_defaults(
    retriever=retriever,
    memory=chatmemory,
    context_template=generate_context_template(),
    system_prompt=prepared_system_prompt,
    service_context=service_context,
    node_postprocessors=[
        CustomPostprocessor(
            context_limit, query_text + prepared_system_prompt, project.db_name, None
        )
    ],
)
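(Why this likely works: with service_context passed explicitly, from_defaults reuses its CallbackManager instead of building a default one, so the TokenCountingHandler finally receives the LLM events. A quick check, reusing the names from the original snippet:)
Plain Text
response = custom_chat_engine.chat(query_text, chat_history=chat_history)
print(token_counter.total_llm_token_count)  # now non-zero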
It worked, yay! 🙂