Updated 4 months ago

VLLM Configuration Issues in Llama Index

At a glance

The community member noticed that vLLM requires the model parameter in the request payload, but it is not being sent by llama-index-llms-vllm. Additionally, the payload sent to /chat/completions is being translated into a prompt field, when it should be messages: [{role:'',content:''},{role:'',content:''}].

In the comments, another community member suggests that the structure of the VLLM server class was likely created that way when it was first added, and that it needs modification. Others agree that this needs to be addressed.

The final comment explains that the reason for this is that the /chat/completions endpoint cannot be used, and the /completions endpoint should be used instead, as the payload is not compatible with the OpenAI format.

I noticed a couple of things.
  1. vLLM needs the model parameter in the payload, but I don't see it being sent from llama-index-llms-vllm.
  2. Why is the payload to /chat/completions being translated to prompt? In fact, it should be messages: [{role:'',content:''},{role:'',content:''}] (see the sketch below).
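A minimal sketch of the request shape vLLM's OpenAI-compatible /v1/chat/completions endpoint expects, for comparison with the prompt-style body described above. The base URL and model name are placeholder assumptions, not values from this thread.

```python
# Sketch only (not the llama-index-llms-vllm internals): what vLLM's
# OpenAI-compatible chat endpoint expects in the request body.
import requests

BASE_URL = "http://localhost:8000"           # assumption: local vLLM OpenAI-compatible server
MODEL = "meta-llama/Llama-2-7b-chat-hf"      # assumption: whatever model the server was launched with

# "model" is required, and the conversation goes in "messages",
# not in a flattened "prompt" string.
chat_payload = {
    "model": MODEL,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
}

resp = requests.post(f"{BASE_URL}/v1/chat/completions", json=chat_payload, timeout=60)
print(resp.json()["choices"][0]["message"]["content"])
```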
5 comments
For the second point,
I think the structure of the VLLM server class was likely created that way when it was first added.
So this needs modification then
The reason for this is that you can't use /chat/completions; we should use /completions because the payload is not compatible with the OpenAI format.
I hope I am making it clear.
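A minimal sketch of the alternative described in that last comment: a prompt-style body sent to /v1/completions instead of /v1/chat/completions. Same placeholder assumptions for the server URL and model name as in the earlier sketch; note that the model field is still required here.

```python
# Sketch only: prompt-style request against vLLM's OpenAI-compatible
# /v1/completions endpoint.
import requests

BASE_URL = "http://localhost:8000"           # assumption: local vLLM OpenAI-compatible server
MODEL = "meta-llama/Llama-2-7b-chat-hf"      # assumption: placeholder model name

completion_payload = {
    "model": MODEL,                          # still required by the server
    "prompt": "Hello! How are you?",         # flat prompt string, no messages list
    "max_tokens": 64,
}

resp = requests.post(f"{BASE_URL}/v1/completions", json=completion_payload, timeout=60)
print(resp.json()["choices"][0]["text"])
```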