Hey all - has anyone toyed with using custom tokenizers within the TokenCountingHandler? Does the get_llm_token_counts method it invokes expect a usage field in the response from a model? Is it possible to implement something that doesn't rely on this?
4 comments
I tested using something like the SentencePiece tokenizer for LLaMA2: https://github.com/facebookresearch/llama/blob/main/llama/tokenizer.py

I receive the following error, and I can only trace it back to the usage field not being returned by the model server: https://gist.github.com/edhenry/d4ed1c1ddc4734737604a1ab515b527e
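
For context, wrapping the LLaMA2 SentencePiece tokenizer as a plain callable might look roughly like the sketch below - the sentencepiece package and the tokenizer.model path are assumptions here, not the exact setup from the gist:

```python
# Rough sketch: expose a SentencePiece model as a token-counting callable.
# Assumes the `sentencepiece` package and a local tokenizer.model file
# (the path is a placeholder).
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

def llama2_tokenizer(text: str) -> list:
    # Returns a list of token ids; the token counter only needs its length.
    return sp.encode(text)
```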
It should be falling back to just counting tokens if the usage dict is missing

tokenizer is maybe a misleading name -- it just has to be a callable that, given a string, returns a list
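
So any callable with that shape can be passed straight in - a minimal sketch using the wrapper above, assuming a recent llama_index release (older versions import from llama_index.callbacks instead of llama_index.core.callbacks):

```python
# Minimal sketch: use a custom callable as the "tokenizer" for token counting.
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

token_counter = TokenCountingHandler(tokenizer=llama2_tokenizer)
callback_manager = CallbackManager([token_counter])

# Attach callback_manager to your Settings / service context / LLM as usual,
# then inspect the counters after running queries:
#   token_counter.prompt_llm_token_count
#   token_counter.completion_llm_token_count
#   token_counter.total_llm_token_count
```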
I should be more specific, apologies! I'm using custom LLMs, tokenizers, and embedding models so I'm trying to follow the logic implemented for other LLMs and am getting a bit lost.

I'm away from my box right now but will try and provide more detail when I'm back at it 😊
Got it. I was returning the raw response from the LLM's chat method in the ChatResponse. Definitely something to keep in mind, and I'll have a poke at modifying it, since I'll want to return the raw response with token counting callbacks but won't have the usage field available - though I suppose I could just add that to my model server API 🤷
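
For anyone hitting the same thing: one option is to surface an OpenAI-style usage dict in the raw payload of the ChatResponse returned by the custom LLM. A hedged sketch follows, where the server_reply fields are hypothetical names for whatever the model server reports; if usage is missing, the handler should fall back to the callable tokenizer, as noted above:

```python
# Hedged sketch: include a usage dict in the raw response from a custom LLM's
# chat method. The server_reply keys are hypothetical; import paths assume a
# recent llama_index release.
from llama_index.core.llms import ChatMessage, ChatResponse, MessageRole

def to_chat_response(server_reply: dict) -> ChatResponse:
    return ChatResponse(
        message=ChatMessage(role=MessageRole.ASSISTANT, content=server_reply["text"]),
        raw={
            "usage": {
                "prompt_tokens": server_reply.get("prompt_tokens"),
                "completion_tokens": server_reply.get("completion_tokens"),
                "total_tokens": server_reply.get("total_tokens"),
            },
            "response": server_reply,  # keep the full raw payload around too
        },
    )
```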

Either way, I think I'm back up and running. Now I'm troubleshooting an issue where prompt token counts far exceed the budgets that various helpers in the framework are supposed to trim them down to. Thanks @Logan M !