Updated 4 months ago

VLLM Configuration Issues in Llama Index

At a glance

The community member noticed that vLLM requires the model parameter in the request payload, but it is not being sent by llama-index-llms-vllm. Additionally, the payload sent to /chat/completions is being translated into a prompt field, when it should be messages: [{role:'',content:''},{role:'',content:''}].

In the comments, another community member suggests that the structure of the VLLM server class was likely created that way when it was first added, and that it needs modification. Others agree that this needs to be addressed.

The final comment explains that the reason for this is that the /chat/completions endpoint cannot be used, and the /completions endpoint should be used instead, as the payload is not compatible with the OpenAI format.

I noticed a couple of things.
  1. vLLM needs the model parameter in the payload, but I don't see it being sent from llama-index-llms-vllm.
  2. Why is the payload to /chat/completions being translated to prompt? In fact, it should be messages: [{role:'',content:''},{role:'',content:''}] (see the sketch below).
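A minimal sketch of the request shape vLLM's OpenAI-compatible /v1/chat/completions endpoint expects, for comparison with the prompt-style body described above. The base URL and model name are placeholder assumptions, not values from this thread.

```python
# Sketch only (not the llama-index-llms-vllm internals): what vLLM's
# OpenAI-compatible chat endpoint expects in the request body.
import requests

BASE_URL = "http://localhost:8000"           # assumption: local vLLM OpenAI-compatible server
MODEL = "meta-llama/Llama-2-7b-chat-hf"      # assumption: whatever model the server was launched with

# "model" is required, and the conversation goes in "messages",
# not in a flattened "prompt" string.
chat_payload = {
    "model": MODEL,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

resp = requests.post(f"{BASE_URL}/v1/chat/completions", json=chat_payload, timeout=60)
print(resp.json()["choices"][0]["message"]["content"])
```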
5 comments
For the second point,
I think the structure of the VLLM server class was likely created that way when it was first added.
So this needs modification then
The reason for this is that you can't use /chat/completions; we should use /completions because the payload is not compatible with the OpenAI format.
I hope I am making it clear.
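A minimal sketch of the alternative described in that last comment: a prompt-style body sent to /v1/completions instead of /v1/chat/completions. Same placeholder assumptions for the server URL and model name as in the earlier sketch; note that the model field is still required here.

```python
# Sketch only: prompt-style request against vLLM's OpenAI-compatible
# /v1/completions endpoint.
import requests

BASE_URL = "http://localhost:8000"           # assumption: local vLLM OpenAI-compatible server
MODEL = "meta-llama/Llama-2-7b-chat-hf"      # assumption: placeholder model name

completion_payload = {
    "model": MODEL,                          # still required by the server
    "prompt": "Hello! How are you?",         # flat prompt string, no messages list
    "max_tokens": 64,
}

resp = requests.post(f"{BASE_URL}/v1/completions", json=completion_payload, timeout=60)
print(resp.json()["choices"][0]["text"])
```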