Running evaluations, but getting BadRequestError because the maximum context length is 8192 tokens...

The evals appear to be running against the entire context all at once instead of in chunks.

Specifically:
Plain Text
relevancy_result = judges["relevancy"].evaluate(
    query=example.query,
    response=prediction.response,
    contexts=prediction.contexts,
)


Runs against this example query/response/context combo:
Plain Text
Query tokens: 14
Response tokens: 141
Context tokens: 38395
Remaining tokens: -30358
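
For reference, token counts like these can be produced with tiktoken. This is a rough sketch (the count_tokens helper is hypothetical), assuming the cl100k_base encoding and the 8192-token limit from the error below:
Plain Text
import tiktoken

TOKEN_LIMIT = 8192  # limit reported in the BadRequestError below
enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    # Hypothetical helper: number of tokens the OpenAI tokenizer sees
    return len(enc.encode(text))

query_tokens = count_tokens(example.query)
response_tokens = count_tokens(prediction.response)
context_tokens = sum(count_tokens(c) for c in prediction.contexts)
used = query_tokens + response_tokens + context_tokens

print(f"Query tokens: {query_tokens}")
print(f"Response tokens: {response_tokens}")
print(f"Context tokens: {context_tokens}")
print(f"Remaining tokens: {TOKEN_LIMIT - used}")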
8 comments
so for this eval to work properly, it kind of needs to see all source chunks at once (I don't think we've figured out an approach to chunking that evaluation)
I think it's a bug. I'm using gpt-4-1106-preview which accepts 128K tokens, but for some reason the evals are calling llama_index.embeddings.openai.aget_embedding (why?), which is why it's barfing at context >8k
Plain Text
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 400 Bad Request"
WARNING:llama_index.llms.openai_utils:Retrying llama_index.embeddings.openai.aget_embedding in 0.7456712300709634 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 8825 tokens (8825 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}.
Are you sure it's the relevancy evaluator that is barfing? Only the semantic similarity evaluator should be calling embeddings
Looking at the source code, the relevancy evaluator makes zero embedding calls
Oh, my fault. I thought the FaithfulnessEvaluator prompt was also calling embeddings. Removing SemanticSimilarityEvaluator removed the error 👍
Same goes for the RelevancyEvaluator
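
In other words, of the three judges only the semantic-similarity one should hit the embeddings endpoint. As a minimal sketch of that split (assuming the pre-0.10 llama_index imports and a ServiceContext; the exact constructor arguments may differ by version):
Plain Text
from llama_index import ServiceContext
from llama_index.llms import OpenAI
from llama_index.evaluation import (
    FaithfulnessEvaluator,
    RelevancyEvaluator,
    SemanticSimilarityEvaluator,
)

# Judge LLM with a 128K context window
service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-4-1106-preview"))

judges = {
    # LLM-based judges: no embedding calls, so the 8192-token embedding
    # limit never comes into play
    "relevancy": RelevancyEvaluator(service_context=service_context),
    "faithfulness": FaithfulnessEvaluator(service_context=service_context),
    # Embedding-based judge: embeds its inputs, so anything much larger than
    # ~8K tokens (e.g. concatenated contexts) triggers the 400 above
    "semantic_similarity": SemanticSimilarityEvaluator(service_context=service_context),
}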
Oh I see the issue. I was following this notebook: https://github.com/run-llama/llama_index/blob/main/docs/examples/llama_dataset/downloading_llama_datasets.ipynb

It's running SemanticSimilarityEvaluator on the contexts instead of the responses:
Plain Text
semantic_similarity_result = judges["semantic_similarity"].evaluate(
    query=example.query,
    response="\n".join(prediction.contexts),
    reference="\n".join(example.reference_contexts),
)


The contexts can be quite large, and really only make sense to compare on a context-by-context basis (imo; there's a sketch of that at the end of this thread). I'm trying to evaluate responses anyway.

Changing it to this worked as expected:
Plain Text
semantic_similarity_result = await judges["semantic_similarity"].aevaluate(
    query=example.query,
    response=prediction.response,
    reference=example.reference_answer,
)
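
As an aside, the context-by-context comparison mentioned above could look roughly like this. It's a sketch, not from the notebook, and it assumes prediction.contexts and example.reference_contexts line up one-to-one, which may not hold in general:
Plain Text
# Hypothetical per-context scoring instead of concatenating everything into
# one oversized string; pairing by zip() is an assumption about the dataset.
per_context_scores = []
for retrieved, reference in zip(prediction.contexts, example.reference_contexts):
    result = await judges["semantic_similarity"].aevaluate(
        query=example.query,
        response=retrieved,
        reference=reference,
    )
    per_context_scores.append(result.score)

avg_context_similarity = sum(per_context_scores) / len(per_context_scores)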