Issue with vLLM calling in LlamaIndex

I guess there is an issue with vLLM calling.

I have the simple code below:
Plain Text
from llama_index.llms.vllm import VllmServer
from llama_index.core.llms import ChatMessage, ChatResponse

llm = VllmServer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    api_url="https://<YOUR_HOST>/v1/chat/completions")

messages = [
    ChatMessage(role="system",
                content="You are an expert language translator, always translate the given text to a destnation language. neighter explanation nor additional details are needed. just respond with the translated text."),
    ChatMessage(role="user", content="Translate ##I Love NLP## to French")
]

response: ChatResponse = llm.chat(messages=messages)
print(response)


When I run the code I end up with the error below:
Plain Text
/Users/pavanmantha/Desktop/machine_translation/venv/bin/python /Users/pavanmantha/Desktop/machine_translation/mt_playground.py 
Traceback (most recent call last):
  File "/Users/pavanmantha/Desktop/machine_translation/mt_playground.py", line 14, in <module>
    response: ChatResponse = llm.chat(messages=messages)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 307, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/core/llms/callbacks.py", line 173, in wrapped_llm_chat
    f_return_val = f(_self, messages, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/llms/vllm/base.py", line 271, in chat
    completion_response = self.complete(prompt, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 307, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/core/llms/callbacks.py", line 431, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/llms/vllm/base.py", line 436, in complete
    output = get_response(response)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/llms/vllm/utils.py", line 9, in get_response
    return data["text"]
           ~~~~^^^^^^^^
KeyError: 'text'


But the same request works when I send it with HTTPie.
Attachment: image.png
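For reference, the working direct call is probably roughly equivalent to the following requests sketch (the host placeholder and model name are taken from the code above; the payload shape assumes vLLM's OpenAI-compatible chat completions API, so it may not match the HTTPie request exactly):
Plain Text
import requests

# Sketch of a direct call to vLLM's OpenAI-compatible chat endpoint.
# Note the explicit "model" field and the "messages" list.
payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "system", "content": "You are an expert language translator."},
        {"role": "user", "content": "Translate ##I Love NLP## to French"},
    ],
    "max_tokens": 512,
}

resp = requests.post("https://<YOUR_HOST>/v1/chat/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])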
21 comments
Exactly, this is the file I was checking. The response I get back from the vLLM server is 400 Bad Request.
The sampling_params that go out as part of the POST request are below:
{'temperature': 1.0, 'max_tokens': 512, 'n': 1, 'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'use_beam_search': False, 'best_of': None, 'ignore_eos': False, 'stop': None, 'logprobs': None, 'top_k': -1, 'top_p': 1.0, 'prompt': 'system: You are an expert language translator, always translate the given text to a destnation language. neighter explanation nor additional details are needed. just respond with the translated text.\nuser: Translate ##I Love NLP## to French\nassistant: ', 'stream': False}
A 400 means the server is not getting something it requires. Maybe a new change was introduced recently 🤔
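A quick way to see what the server is rejecting is to replay that payload outside LlamaIndex and print the response body (just a debugging sketch; the URL is the same placeholder as in the original code, and the payload is trimmed):
Plain Text
import requests

# Replay a trimmed version of the logged sampling_params and inspect the 400 body.
sampling_params = {
    "temperature": 1.0,
    "max_tokens": 512,
    "prompt": "Translate ##I Love NLP## to French",
    "stream": False,
}

resp = requests.post("https://<YOUR_HOST>/v1/chat/completions", json=sampling_params)
print(resp.status_code)  # 400 in the scenario above
print(resp.text)         # the error body usually names the missing or invalid field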

But this doesn't explain the working HTTP request, though.
Actually, the vLLM docs say it needs a "model" parameter to be passed in the request,
so our sampling_params should have "model": "facebook/somemodel" as one of the params.
Can you try passing it once?
I did two things:
Plain Text
sampling_params["model"] = self.model
# Remove entries whose value is None
cleaned_sampling_params = {k: v for k, v in sampling_params.items() if v is not None}
The payload has None values, which are not allowed, so I just removed the entries whose value is None.
Let me test end-to-end; I will make the modifications and raise a PR if that's okay.
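Put together, the proposed fix amounts to something like this hypothetical helper (the function name and signature are illustrative, not the actual llama_index.llms.vllm code):
Plain Text
import requests

def post_sampling_params(api_url: str, model: str, sampling_params: dict) -> dict:
    # Mirror the two changes above: drop None-valued entries and add the model name.
    payload = {k: v for k, v in sampling_params.items() if v is not None}
    payload["model"] = model
    resp = requests.post(api_url, json=payload)
    resp.raise_for_status()
    return resp.json()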
Sure, sounds great! 💪
Thanks man, as usual this is a great help, but please check the core class modification that is needed for sure, else our competition LangChain will win here, because LlamaIndex will fail for vLLM.
I am pushing my org to use LlamaIndex.
Thank you for the feedback @pavanmantha, passing this to the team. We will definitely get this checked!
Same here, me too!
Perfect, I am a fanboy of LlamaIndex and I cannot accept failure 😉
Count me in for this as well 💪
You are already there.