Issue with vLLM calling in LlamaIndex

At a glance

The community member is experiencing an issue with the VllmServer from the llama_index library when trying to translate text. They are getting a 400 Bad Request error, and the response from the vLLM server does not contain the expected "text" key. The community members discuss potential solutions, such as checking the network calls, modifying the sampling parameters to include the "model" parameter, and removing any "None" values from the payload. They also mention that the vLLM docs require the "model" parameter to be passed in the request. The community members collaborate to find a solution and plan to raise a pull request to the llama_index project if necessary.

I guess there is an issue with vLLM calling.

I have the simple code below:
Plain Text
from llama_index.llms.vllm import VllmServer
from llama_index.core.llms import ChatMessage, ChatResponse

llm = VllmServer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    api_url="https://<YOUR_HOST>/v1/chat/completions")

messages = [
    ChatMessage(role="system",
                content="You are an expert language translator, always translate the given text to a destnation language. neighter explanation nor additional details are needed. just respond with the translated text."),
    ChatMessage(role="user", content="Translate ##I Love NLP## to French")
]

response: ChatResponse = llm.chat(messages=messages)
print(response)


When I run the code, I end up with the error below:
Plain Text
/Users/pavanmantha/Desktop/machine_translation/venv/bin/python /Users/pavanmantha/Desktop/machine_translation/mt_playground.py 
Traceback (most recent call last):
  File "/Users/pavanmantha/Desktop/machine_translation/mt_playground.py", line 14, in <module>
    response: ChatResponse = llm.chat(messages=messages)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 307, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/core/llms/callbacks.py", line 173, in wrapped_llm_chat
    f_return_val = f(_self, messages, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/llms/vllm/base.py", line 271, in chat
    completion_response = self.complete(prompt, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 307, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/core/llms/callbacks.py", line 431, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/llms/vllm/base.py", line 436, in complete
    output = get_response(response)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/llms/vllm/utils.py", line 9, in get_response
    return data["text"]
           ~~~~^^^^^^^^
KeyError: 'text'


But the same request works when I make it with HTTPie.
Attachment: image.png
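For reference, here is a minimal sketch of the kind of direct request that works here, assuming the HTTPie call in the attachment hits vLLM's OpenAI-compatible /v1/chat/completions endpoint and includes the required "model" field; the host placeholder and model name are taken from the snippet above:
Plain Text
# Hedged sketch of a direct request to the OpenAI-compatible chat endpoint.
# This is an assumption about what the attached HTTPie call does, not the
# llama_index code path.
import requests

payload = {
    "model": "meta-llama/Llama-3.1-8B-Instruct",  # the OpenAI-compatible API needs "model"
    "messages": [
        {"role": "system", "content": "You are an expert language translator."},
        {"role": "user", "content": "Translate ##I Love NLP## to French"},
    ],
    "max_tokens": 512,
}

response = requests.post(
    "https://<YOUR_HOST>/v1/chat/completions", json=payload, timeout=60
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])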
21 comments
Exactly, this is the file I was checking; the response I get back from the vLLM server is a 400 Bad Request.
The sampling_params that go out as part of the POST request are as below:
{'temperature': 1.0, 'max_tokens': 512, 'n': 1, 'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'use_beam_search': False, 'best_of': None, 'ignore_eos': False, 'stop': None, 'logprobs': None, 'top_k': -1, 'top_p': 1.0, 'prompt': 'system: You are an expert language translator, always translate the given text to a destination language. neither explanation nor additional details are needed. just respond with the translated text.\nuser: Translate ##I Love NLP## to French\nassistant: ', 'stream': False}
A 400 means the server is not getting something that it requires. Maybe a new change was introduced recently πŸ€”

But this doesn't explain the working HTTPie request, though.
Actually, the vLLM docs say that it needs the "model" parameter to be passed in the request.
So our sampling_params should have "model": "facebook/somemodel" as one of the params.
Can you try passing it once?
I did two things:
Plain Text
sampling_params["model"] = self.model
# Remove entries where the value is None
cleaned_sampling_params = {k: v for k, v in sampling_params.items() if v is not None}
The payload has None values, which are not allowed; I just removed an entry if its value is None.
Let me test end-to-end, and I will make the modifications and raise a PR if that's okay.
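For illustration, a standalone sketch of what those two changes do to a payload like the one dumped above; the variable names here are only for this example, not the actual base.py code:
Plain Text
# Hypothetical standalone illustration of the two changes discussed above:
# add the required "model" field and drop entries whose value is None.
sampling_params = {
    "temperature": 1.0,
    "max_tokens": 512,
    "n": 1,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "best_of": None,
    "ignore_eos": False,
    "stop": None,
    "logprobs": None,
    "top_k": -1,
    "top_p": 1.0,
    "stream": False,
}

sampling_params["model"] = "meta-llama/Llama-3.1-8B-Instruct"  # model served by vLLM
cleaned_sampling_params = {k: v for k, v in sampling_params.items() if v is not None}

print(cleaned_sampling_params)  # best_of, stop and logprobs are gone; "model" is present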
Sure, sounds great!πŸ’ͺ
Thanks man.. as usual this is a great help, but please check the core class modification that is needed for sure, else our competition LangChain will win here, because LlamaIndex will fail for vLLM.
I am pushing my org to use LlamaIndex.
Thank you for the feedback @pavanmantha, passing this to the team. We will definitely get this checked!
Same here, me too!
Perfect, I am a fanboy of LlamaIndex and I cannot accept failure πŸ˜‰
Count me in for this as well πŸ’ͺ
You are already there.