I think it's a bug. I'm using gpt-4-1106-preview, which accepts 128K tokens, but for some reason the evals are calling llama_index.embeddings.openai.aget_embedding (why?). That call goes to the OpenAI embeddings endpoint, whose model only accepts 8192 tokens, so the 128K LLM context doesn't help and it barfs on anything >8K:
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 400 Bad Request"
WARNING:llama_index.llms.openai_utils:Retrying llama_index.embeddings.openai.aget_embedding in 0.7456712300709634 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 8825 tokens (8825 in your prompt; 0 for the completion). Please reduce your prompt; or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}.
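For anyone hitting the same wall, here is a minimal workaround sketch: clamp the text to the embedding model's window before it gets embedded. This assumes the evals are embedding raw response/node text; truncate_for_embedding and the 8192 limit are my own assumptions for text-embedding-ada-002, not a llama_index API.

```python
# Workaround sketch: truncate text to the embedding model's token window
# before it is sent to the OpenAI embeddings endpoint.
# Assumption: the failing model is text-embedding-ada-002 (8192-token limit,
# cl100k_base encoding). Adjust if your setup embeds with a different model.
import tiktoken

EMBED_TOKEN_LIMIT = 8192  # max context of the embedding model (assumed)
_encoding = tiktoken.get_encoding("cl100k_base")

def truncate_for_embedding(text: str, limit: int = EMBED_TOKEN_LIMIT) -> str:
    """Return `text` cut down to at most `limit` tokens."""
    tokens = _encoding.encode(text)
    if len(tokens) <= limit:
        return text
    return _encoding.decode(tokens[:limit])

# Example: the 8825-token prompt from the log above would be trimmed to
# 8192 tokens instead of triggering the 400 Bad Request.
```

Of course this just papers over the symptom; the real question is why the evals are embedding at all when the configured model is gpt-4-1106-preview.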