Issue with vLLM calling in LlamaIndex

At a glance

The community member is experiencing an issue with the VllmServer from the llama_index library when trying to translate text. They are getting a 400 Bad Request error, and the response from the vLLM server does not contain the expected "text" key. The community members discuss potential solutions, such as checking the network calls, modifying the sampling parameters to include the "model" parameter, and removing any "None" values from the payload. They also mention that the vLLM docs require the "model" parameter to be passed in the request. The community members collaborate to find a solution and plan to raise a pull request to the llama_index project if necessary.

I guess there is an issue with vLLM calling.

I have the simple code below:
Plain Text
from llama_index.llms.vllm import VllmServer
from llama_index.core.llms import ChatMessage, ChatResponse

llm = VllmServer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    api_url="https://<YOUR_HOST>/v1/chat/completions")

messages = [
    ChatMessage(role="system",
                content="You are an expert language translator, always translate the given text to a destnation language. neighter explanation nor additional details are needed. just respond with the translated text."),
    ChatMessage(role="user", content="Translate ##I Love NLP## to French")
]

response: ChatResponse = llm.chat(messages=messages)
print(response)


When I run the code, I end up with the error below:
Plain Text
/Users/pavanmantha/Desktop/machine_translation/venv/bin/python /Users/pavanmantha/Desktop/machine_translation/mt_playground.py 
Traceback (most recent call last):
  File "/Users/pavanmantha/Desktop/machine_translation/mt_playground.py", line 14, in <module>
    response: ChatResponse = llm.chat(messages=messages)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 307, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/core/llms/callbacks.py", line 173, in wrapped_llm_chat
    f_return_val = f(_self, messages, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/llms/vllm/base.py", line 271, in chat
    completion_response = self.complete(prompt, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 307, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/core/llms/callbacks.py", line 431, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/llms/vllm/base.py", line 436, in complete
    output = get_response(response)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/llms/vllm/utils.py", line 9, in get_response
    return data["text"]
           ~~~~^^^^^^^^
KeyError: 'text'


But the same request works when I make it with HTTPie.
Attachment: image.png
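For reference, here is a minimal sketch of the kind of direct request that works here, assuming the HTTPie call in the attachment hits vLLM's OpenAI-compatible /v1/chat/completions endpoint and includes the required "model" field; the host placeholder and model name are taken from the snippet above:
Plain Text
# Hedged sketch of a direct request to the OpenAI-compatible chat endpoint.
# This is an assumption about what the attached HTTPie call does, not the
# llama_index code path.
import requests

payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # the OpenAI-compatible API needs "model"
    "messages": [
        {"role": "system", "content": "You are an expert language translator."},
        {"role": "user", "content": "Translate ##I Love NLP## to French"},
    ],
    "max_tokens": 512,
}

response = requests.post(
    "https://<YOUR_HOST>/v1/chat/completions", json=payload, timeout=60
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])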
21 comments
Exactly, this is the file I was checking; the response I get back from the vLLM server is a 400 Bad Request.
The sampling_params that go out as part of the POST request are as below:
{'temperature': 1.0, 'max_tokens': 512, 'n': 1, 'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'use_beam_search': False, 'best_of': None, 'ignore_eos': False, 'stop': None, 'logprobs': None, 'top_k': -1, 'top_p': 1.0, 'prompt': 'system: You are an expert language translator, always translate the given text to a destination language. neither explanation nor additional details are needed. just respond with the translated text.\nuser: Translate ##I Love NLP## to French\nassistant: ', 'stream': False}
A 400 means the server is not getting something that it requires. Maybe a new change was introduced recently πŸ€”

But this doesn't explain the working HTTPie request, though.
Actually, the vLLM docs say that it needs the "model" parameter to be passed in the request.
So our sampling_params should have "model": "facebook/somemodel" as one of the params.
Can you try passing it once?
I did two things:
Plain Text
sampling_params["model"] = self.model
# Remove entries where the value is None
cleaned_sampling_params = {k: v for k, v in sampling_params.items() if v is not None}
The payload has None values, which are not allowed; I just removed an entry if its value is None.
Let me test end-to-end, and I will make the modifications and raise a PR if that's okay.
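For illustration, a standalone sketch of what those two changes do to a payload like the one dumped above; the variable names here are only for this example, not the actual base.py code:
Plain Text
# Hypothetical standalone illustration of the two changes discussed above:
# add the required "model" field and drop entries whose value is None.
sampling_params = {
    "temperature": 1.0,
    "max_tokens": 512,
    "n": 1,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "best_of": None,
    "ignore_eos": False,
    "stop": None,
    "logprobs": None,
    "top_k": -1,
    "top_p": 1.0,
    "stream": False,
}

sampling_params["model"] = "meta-llama/Llama-3.1-8B-Instruct"  # model served by vLLM
cleaned_sampling_params = {k: v for k, v in sampling_params.items() if v is not None}

print(cleaned_sampling_params)  # best_of, stop and logprobs are gone; "model" is present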
Sure, sounds great!πŸ’ͺ
Thanks man.. as usual this is a great help, but please check the core class modification that is needed for sure, else our competition LangChain will win here, because LlamaIndex will fail for vLLM.
I am pushing my org to use LlamaIndex.
Thank you for the feedback @pavanmantha, passing this to the team. We will definitely get this checked!
Same here, me too!
Perfect, I am a fanboy of LlamaIndex and I cannot accept failure πŸ˜‰
Count me in for this as well πŸ’ͺ
You are already there.