OpenAILike class errors with latest vLLM updates

Hello,
I'm having issues with the OpenAILike class following the latest updates of vLLM (0.6.X).
I'm getting these types of errors when I call my chat_engine, no matter the model.
Plain Text
BadRequestError: Error code: 400 - {'object': 'error', 'message': 'This model only supports single tool-calls at once!', 'type': 'BadRequestError', 'param': None, 'code': 400}

I don't have these issues with the 0.5.X versions, any idea why?
I have no idea -- I haven't been keeping up with vllm updates 👀

I'm not even sure what would cause this error... might have to inspect what kwargs/options OpenAILike is sending to vllm?
What chat engine are you using? What's your LLM setup?
Plain Text
from llama_index.core.chat_engine import SimpleChatEngine
from openai import OpenAI
from llama_index.llms.openai_like import OpenAILike


openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

openai_client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
models = openai_client.models.list()
selected_model = models.data[0].id

llm = OpenAILike(
    api_key=openai_api_key,
    api_base=openai_api_base,
    model=selected_model,
    max_tokens=2500,
    is_chat_model=True,
)

chat_engine = SimpleChatEngine.from_defaults(llm=llm)

chat_engine.chat("Hello")
vllm 0.6.2
llama-index-core 0.11.16
llama-index-llms-openai-like 0.2.0

for the versions
the first request always passes, it's the second one that gets the error message for some reason
From simple chat engine 😅 oh boy
you can test the llm directly in this case to remove some ambiguity

Plain Text
from llama_index.core.llms import ChatMessage

llm.chat([ChatMessage(role="user", content="Hello!")])


You can see exactly what options OpenAILike is sending along to your LLM by checking print(llm._get_model_kwargs())
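
For example (assuming the same llm object from your snippet above):

Plain Text
# Private helper on the OpenAI-style LLM classes, so subject to change,
# but handy for seeing exactly what gets forwarded to the vLLM server
print(llm._get_model_kwargs())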

You may or may not also need to update (or downgrade! I have no idea with vllm) openai and llama-index-llms-openai packages
Probably best to look at the changes vllm has been making lately too... maybe there's some option that needs to be set for the model you are using
Okay thanks, I'll look further into that and I'll let you know if I find anything
with vLLM 0.6.2 this works, but not while using the simple chat engine
Plain Text
from llama_index.core.llms import ChatMessage

llm.chat([ChatMessage(role="user", content="Hello!"),
          ChatMessage(role="assistant", content="Hello!"),
          ChatMessage(role="user", content="Hello!")])
I don't think simple chat engine is doing anything different... lemme check
Plain Text
all_messages = self._prefix_messages + self._memory.get()

chat_response = self._llm.chat(all_messages)
That should be the exact same thing 😅
prefix messages will be empty for you, since you aren't passing them in (or a system prompt)
So to replicate entirely would be

Plain Text
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.llms import ChatMessage

llm = ...
memory = ChatMemoryBuffer.from_defaults(llm=llm)
memory.put(ChatMessage(role="user", content="Hello!"))

all_messages = memory.get()
resp = llm.chat(all_messages)
memory.put(resp.message)

memory.put(ChatMessage(role="user", content="Hello!"))
all_messages = memory.get()
resp = llm.chat(all_messages)
memory.put(resp.message)
...
I'm getting this error with mistral
Plain Text
BadRequestError: Error code: 400 - {'object': 'error', 'message': 'After the optional system message, conversation roles must alternate user/assistant/user/assistant/...', 'type': 'BadRequestError', 'param': None, 'code': 400}

and in the logs of vllm :
ERROR 10-07 16:26:22 serving_chat.py:155] Error in applying chat template from request: After the optional system message, conversation roles must alternate user/assistant/user/assistant/...
/home/miniconda3/envs/lib/python3.10/json/encoder.py:249: RuntimeWarning: coroutine 'AsyncMultiModalItemTracker.all_mm_data' was never awaited
markers, self.default, _encoder, self.indent,
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
INFO: 127.0.0.1:48298 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
I get the same error indeed

INFO: 127.0.0.1:48290 - "POST /v1/chat/completions HTTP/1.1" 200 OK
ERROR 10-07 16:28:56 serving_chat.py:155] Error in applying chat template from request: After the optional system message, conversation roles must alternate user/assistant/user/assistant/...
INFO: 127.0.0.1:48290 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
while using this*
So, if you print(all_messages), then you can see exactly what is being sent
(it should already be alternating user/assistant imo)
This feels like a vllm bug somehow, especially if this was working before
it does
Plain Text
[ChatMessage(role=<MessageRole.USER: 'user'>, content='Hello!', additional_kwargs={}), ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content=" Hello! How can I help you today? Is there something specific you'd like to know or discuss? I'm here to answer questions and provide information on a wide range of topics. Let me know if you have any questions!", additional_kwargs={'tool_calls': []}), ChatMessage(role=<MessageRole.USER: 'user'>, content='Hello!', additional_kwargs={})] 
Hmm, the additional_kwargs={'tool_calls': []} could be tripping it up
You think the behaviour of tool_calls in vLLM changed between v0.5 and v0.6?
I can confirm that in v0.5.X it does work fine
That's what it feels like, yeah -- if tool_calls is empty, we could certainly stop putting it in additional_kwargs if that's going to be an issue
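In the meantime, a rough workaround sketch (untested, and assuming the empty tool_calls really is the culprit): strip it out of additional_kwargs before handing the history to llm.chat

Plain Text
from llama_index.core.llms import ChatMessage

def strip_empty_tool_calls(messages):
    # Rebuild each message without an empty 'tool_calls' entry,
    # since vLLM's Mistral chat template seems to reject it
    cleaned = []
    for m in messages:
        extra = {
            k: v
            for k, v in m.additional_kwargs.items()
            if not (k == "tool_calls" and not v)
        }
        cleaned.append(
            ChatMessage(role=m.role, content=m.content, additional_kwargs=extra)
        )
    return cleaned

resp = llm.chat(strip_empty_tool_calls(all_messages))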
Would be great if you could do that!
Thanks!
Will let you know if it works when it's merged
Merged and released 🙂
pip install -U llama-index-llms-openai should get it
It's all working smoothly now. Thanks a lot for your help!