stupid question - looking at https://docs.llamaindex.ai/en/stable/module_guides/observability/callbacks/token_counting_migration.html#token-counting-migration-guide and thinking about token counting.
the callback manager is explicitly using tiktoken, which counts tokens for OpenAI models. but what if i'm not using OpenAI? is it "close enough"?
also, how does the embedding model (e.g. BAAI/bge-base-en-v1.5) relate? or does it maybe not relate?
It's typically close enough. But you can pass in any function for counting tokens

e.g. AutoTokenizer.from_pretrained("...").encode works too
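
Something like this should work (a minimal sketch — it assumes the post-0.10 `llama_index.core` import paths, and `gpt2` is just a placeholder, swap in the tokenizer matching the LLM you actually run):

```python
from transformers import AutoTokenizer
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

# TokenCountingHandler accepts any callable that maps a string to a
# list of tokens, so a HF tokenizer's .encode slots right in.
# "gpt2" is a stand-in here -- use the tokenizer for your actual model.
hf_tokenizer = AutoTokenizer.from_pretrained("gpt2")

token_counter = TokenCountingHandler(tokenizer=hf_tokenizer.encode)
Settings.callback_manager = CallbackManager([token_counter])
```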
I guess right now it uses the same tokenizer for both embeddings and LLMs
(embedding tokens usually matter much less, so it hasn't been a priority)
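
fwiw the handler does keep the two buckets separate, so even if the embedding counts are only approximate (bge-base-en-v1.5 has its own tokenizer), you can at least see how small they are relative to LLM usage. Continuing the sketch above:

```python
# after building an index / running queries against it:
print("LLM tokens:      ", token_counter.total_llm_token_count)
print("embedding tokens:", token_counter.total_embedding_token_count)

token_counter.reset_counts()  # zero both counters between runs
```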