stupid question - looking at https://docs.llamaindex.ai/en/stable/module_guides/observability/callbacks/token_counting_migration.html#token-counting-migration-guide and thinking about token counting.
the callback manager is explicitly using tiktoken, which counts tokens for OpenAI models. but what if i'm not using OpenAI? is it "close enough"?
also, how does the embedding model (e.g. BAAI/bge-base-en-v1.5) relate? or does it maybe not relate?
It's typically close enough. But you can pass in any function for counting tokens

e.g. AutoTokenizer.from_pretrained("...").encode works too
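
Something like this should work (a minimal sketch — it assumes the post-0.10 `llama_index.core` import paths, and `gpt2` is just a placeholder, swap in the tokenizer matching the LLM you actually run):

```python
from transformers import AutoTokenizer
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

# TokenCountingHandler accepts any callable that maps a string to a
# list of tokens, so a HF tokenizer's .encode slots right in.
# "gpt2" is a stand-in here -- use the tokenizer for your actual model.
hf_tokenizer = AutoTokenizer.from_pretrained("gpt2")

token_counter = TokenCountingHandler(tokenizer=hf_tokenizer.encode)
Settings.callback_manager = CallbackManager([token_counter])
```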
I guess right now it uses the same tokenizer for both embeddings and LLMs
(embedding tokens usually matter much less, so it hasn't been a priority)
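
fwiw the handler does keep the two buckets separate, so even if the embedding counts are only approximate (bge-base-en-v1.5 has its own tokenizer), you can at least see how small they are relative to LLM usage. Continuing the sketch above:

```python
# after building an index / running queries against it:
print("LLM tokens:      ", token_counter.total_llm_token_count)
print("embedding tokens:", token_counter.total_embedding_token_count)

token_counter.reset_counts()  # zero both counters between runs
```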