OpenAILike class errors with latest vLLM updates

Hello,
I'm having issues with the OpenAILike class following the latest updates of vLLM (0.6.X).
I'm getting these types of errors when I call my chat_engine, no matter the model.
Plain Text
BadRequestError: Error code: 400 - {'object': 'error', 'message': 'This model only supports single tool-calls at once!', 'type': 'BadRequestError', 'param': None, 'code': 400}

I don't have these issues with the 0.5.X versions. Any idea why?
I have no idea -- I haven't been keeping up with vllm updates 👀

I'm not even sure what would cause this error... might have to inspect what kwargs/options OpenAILike is sending to vllm?
What chat engine are you using? What's your LLM setup?
Plain Text
from llama_index.core.chat_engine import SimpleChatEngine
from openai import OpenAI
from llama_index.llms.openai_like import OpenAILike

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

# Ask the vLLM OpenAI-compatible server which model it is serving
openai_client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
models = openai_client.models.list()
selected_model = models.data[0].id

llm = OpenAILike(
    api_key=openai_api_key,
    api_base=openai_api_base,
    model=selected_model,
    max_tokens=2500,
    is_chat_model=True,
)

chat_engine = SimpleChatEngine.from_defaults(llm=llm)

chat_engine.chat("Hello")
vllm 0.6.2
llama-index-core 0.11.16
llama-index-llms-openai-like 0.2.0

for the versions I'm using
the first request always passes; it's the second one that gets the error message, for some reason
From simple chat engine 😅 oh boy
you can test the llm directly in this case to remove some ambiguity

Plain Text
from llama_index.core.llms import ChatMessage

llm.chat([ChatMessage(role="user", content="Hello!")])


You can see exactly what options OpenAILike is sending along to your LLM by checking print(llm._get_model_kwargs())

You may or may not also need to update (or downgrade! I have no idea with vllm) the openai and llama-index-llms-openai packages
Probably best to look at the changes vllm has been making lately too... maybe there's some option that needs to be set for the model you are using
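For example (just a sketch, reusing the llm from above; _get_model_kwargs is a private helper, so treat it as a debugging aid rather than stable API):

Plain Text
# Extra options OpenAILike merges into every /v1/chat/completions request
print(llm._get_model_kwargs())

# High-level settings the LLM reports about itself (is_chat_model, context window, ...)
print(llm.metadata)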
Okay thanks, I'll look further into that and I'll let you know if I find anything
with vLLM 0.6.2 this works, but not while using the simple chat engine
Plain Text
from llama_index.core.llms import ChatMessage

llm.chat([ChatMessage(role="user", content="Hello!"),
          ChatMessage(role="assistant", content="Hello!"),
          ChatMessage(role="user", content="Hello!")])
I don't think simple chat engine is doing anything different... lemme check
Plain Text
all_messages = self._prefix_messages + self._memory.get()

chat_response = self._llm.chat(all_messages)
That should be the exact same thing 😅
prefix messages will be empty for you, since you aren't passing it in (or a system prompt)
So to replicate it entirely would be

Plain Text
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.llms import ChatMessage

llm = ...
memory = ChatMemoryBuffer.from_defaults(llm=llm)
memory.put(ChatMessage(role="user", content="Hello!"))

all_messages = memory.get()
resp = llm.chat(all_messages)
memory.put(resp.message)

memory.put(ChatMessage(role="user", content="Hello!"))
all_messages = memory.get()
resp = llm.chat(all_messages)
memory.put(resp.message)
...
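If you want to double-check what is actually sent at each step, you could dump the memory contents right before each llm.chat call (just a sketch, same memory as above):

Plain Text
# Print the history that will go out on the next request
for m in memory.get():
    print(m.role, repr(m.content), m.additional_kwargs)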
I'm getting this error with Mistral
Plain Text
BadRequestError: Error code: 400 - {'object': 'error', 'message': 'After the optional system message, conversation roles must alternate user/assistant/user/assistant/...', 'type': 'BadRequestError', 'param': None, 'code': 400}

and in the vllm logs:
ERROR 10-07 16:26:22 serving_chat.py:155] Error in applying chat template from request: After the optional system message, conversation roles must alternate user/assistant/user/assistant/...
/home/miniconda3/envs/lib/python3.10/json/encoder.py:249: RuntimeWarning: coroutine 'AsyncMultiModalItemTracker.all_mm_data' was never awaited
markers, self.default, _encoder, self.indent,
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
INFO: 127.0.0.1:48298 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
I get the same error indeed

INFO: 127.0.0.1:48290 - "POST /v1/chat/completions HTTP/1.1" 200 OK
ERROR 10-07 16:28:56 serving_chat.py:155] Error in applying chat template from request: After the optional system message, conversation roles must alternate user/assistant/user/assistant/...
INFO: 127.0.0.1:48290 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
while using this*
So, if you print(all_messages), then you can see exactly what is being sent
(it should already be alternating user/assistant imo)
This feels like a vllm bug somehow, especially if this was working before
it does
Plain Text
[ChatMessage(role=<MessageRole.USER: 'user'>, content='Hello!', additional_kwargs={}), ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content=" Hello! How can I help you today? Is there something specific you'd like to know or discuss? I'm here to answer questions and provide information on a wide range of topics. Let me know if you have any questions!", additional_kwargs={'tool_calls': []}), ChatMessage(role=<MessageRole.USER: 'user'>, content='Hello!', additional_kwargs={})] 
Hmm, the additional_kwargs={'tool_calls': []} could be tripping it up
You think the behaviour of vLLM's tool_calls changed between 0.5 and 0.6?
I can confirm that in V 0.5.X it does work fine
That's what it feels like, yeah -- if tool_calls is empty, we certainly could stop putting it in additional_kwargs if that is going to be an issue
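In the meantime, a minimal workaround sketch you could try on your side (strip_empty_tool_calls is just a hypothetical helper name; it rebuilds the history without the empty tool_calls entry before calling the LLM):

Plain Text
from llama_index.core.llms import ChatMessage

def strip_empty_tool_calls(messages):
    # Rebuild the history, dropping empty tool_calls lists that the
    # Mistral chat template in vLLM appears to reject.
    cleaned = []
    for msg in messages:
        kwargs = {
            k: v
            for k, v in msg.additional_kwargs.items()
            if not (k == "tool_calls" and not v)
        }
        cleaned.append(
            ChatMessage(role=msg.role, content=msg.content, additional_kwargs=kwargs)
        )
    return cleaned

resp = llm.chat(strip_empty_tool_calls(all_messages))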
Would be great if you could do that!
Thanks !
Will let you know if it works when it’s merged
Merged and released 🙂
pip install -U llama-index-llms-openai should get it
It's all working smoothly now. Thanks a lot for your help!