Find answers from the community

Updated 11 months ago

At a glance

The community member is using the TokenCountingHandler from the llama_index library to track the token usage of the OpenAI language model. They are wondering if they can use the same token counting handler with the BAAI/bge-small model instead of OpenAI.

In the comments, another community member suggests changing the tokenizer to one from Hugging Face, and provides an example of how to load the tokenizer for the BAAI/bge-small-en-v1.5 model. However, there is no explicitly marked answer on how to use the token counting handler with the BAAI/bge-small model.

import tiktoken
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings


# Count tokens with the same tokenizer the LLM itself uses
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)

Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.2)
Settings.callback_manager = CallbackManager([token_counter])
Here, can we use this TokenCountingHandler with the BAAI/bge-small model instead of OpenAI? If so, how?
2 comments
Change the tokenizer to something from Hugging Face:
Plain Text
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en-v1.5")