I am having an issue with the compact and refine prompt (the default) for the Faithfulness evaluator. It should be squishing my context into the context window and refining, but it keeps throwing an error saying that it is breaching the context window. Is this a bug?

Plain Text
huggingface_hub.inference._text_generation.ValidationError: Input validation error: `inputs` tokens + `max_new_tokens` must be <= 4096. Given: 3896 `inputs` tokens and 256 `max_new_tokens`
make: *** [evals] Error 1
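For reference, the numbers in the error add up like this (simple context-window arithmetic, assuming the endpoint's 4096-token limit from the message above):

```python
# Values taken directly from the validation error
context_window = 4096
input_tokens = 3896
max_new_tokens = 256

# The endpoint requires input_tokens + max_new_tokens <= context_window
overflow = input_tokens + max_new_tokens - context_window
print(overflow)  # -> 56, i.e. 56 tokens over the limit
```

So the request is only slightly over budget, which is consistent with a token-counting mismatch rather than a wildly oversized prompt.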
ah, likely a small bug with token counting (the default tokenizer is using a gpt-3.5 tokenizer)
@Logan M

Ahhh ok. Yeah I am using this model on the Inference Endpoints:

TheBloke/prometheus-13B-v1.0-AWQ

Is the tokenizer configurable?
it is! And I'm actually just fixing the example lol give me one sec
This is kinda janky, but should work (I need to make a PR to do the partial under the hood I think lol)

Plain Text
from functools import partial
from transformers import AutoTokenizer
from llama_index import set_global_tokenizer

set_global_tokenizer(
    partial(
        AutoTokenizer.from_pretrained("TheBloke/prometheus-13B-v1.0-AWQ").encode,
        add_special_tokens=False,
    )
)
ahhh sweet. I'll give it a go now.
I would love to check out the PR when it's ready too, btw, so I can see what the fix was
basically an isinstance check for the huggingface tokenizer base class lol
then doing the partial for you under the hood
(it's part of a much larger PR to deprecate the entire service context and replace it with a better Settings object lol -- v0.10.0 coming soon!)
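A minimal sketch of what that isinstance-plus-partial handling could look like (the `PreTrainedTokenizerBase` here is a toy stand-in for the real transformers base class, and this `set_global_tokenizer` is illustrative, not the actual PR code):

```python
from functools import partial

# Toy stand-in for transformers.PreTrainedTokenizerBase, just to
# illustrate the pattern; the real check targets the actual HF class.
class PreTrainedTokenizerBase:
    def encode(self, text, add_special_tokens=True):
        tokens = text.split()
        if add_special_tokens:
            tokens = ["<s>"] + tokens + ["</s>"]
        return tokens

def set_global_tokenizer(tokenizer):
    # If a raw HuggingFace tokenizer is passed, wrap its encode() in a
    # partial with add_special_tokens=False so token counts line up;
    # otherwise assume it is already a plain str -> list callable.
    if isinstance(tokenizer, PreTrainedTokenizerBase):
        return partial(tokenizer.encode, add_special_tokens=False)
    return tokenizer

tok = set_global_tokenizer(PreTrainedTokenizerBase())
print(tok("hello world"))  # -> ['hello', 'world'], no special tokens
```

This keeps the user-facing API a simple callable while hiding the `partial` wrapping shown in the earlier snippet.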
sounds like a lot of work!
Thanks for the quick fix man!