pavanmantha
Joined September 25, 2024
I guess there is an issue with vLLM calling

I have the below simple code
Python
from llama_index.llms.vllm import VllmServer
from llama_index.core.llms import ChatMessage, ChatResponse

llm = VllmServer(
    model="meta-llama/Llama-3.1-8B-Instruct",
    api_url="https://<YOUR_HOST>/v1/chat/completions")

messages = [
    ChatMessage(role="system",
                content="You are an expert language translator, always translate the given text to a destnation language. neighter explanation nor additional details are needed. just respond with the translated text."),
    ChatMessage(role="user", content="Translate ##I Love NLP## to French")
]

response: ChatResponse = llm.chat(messages=messages)
print(response)


When I run the code I end up with the error below:
Plain Text
/Users/pavanmantha/Desktop/machine_translation/venv/bin/python /Users/pavanmantha/Desktop/machine_translation/mt_playground.py 
Traceback (most recent call last):
  File "/Users/pavanmantha/Desktop/machine_translation/mt_playground.py", line 14, in <module>
    response: ChatResponse = llm.chat(messages=messages)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 307, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/core/llms/callbacks.py", line 173, in wrapped_llm_chat
    f_return_val = f(_self, messages, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/llms/vllm/base.py", line 271, in chat
    completion_response = self.complete(prompt, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 307, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/core/llms/callbacks.py", line 431, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/llms/vllm/base.py", line 436, in complete
    output = get_response(response)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/pavanmantha/Desktop/machine_translation/venv/lib/python3.12/site-packages/llama_index/llms/vllm/utils.py", line 9, in get_response
    return data["text"]
           ~~~~^^^^^^^^
KeyError: 'text'


But the same request works from HTTPie.
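
For reference, here is a minimal raw request that mirrors what HTTPie sends (a sketch only; the host placeholder and the trimmed system prompt are illustrative, not my exact setup):
Python
import requests

# vLLM's OpenAI-compatible chat endpoint (host is a placeholder)
url = "https://<YOUR_HOST>/v1/chat/completions"

payload = {
    # the OpenAI-compatible server expects the model name in the body...
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    # ...and a messages array rather than a flattened prompt string
    "messages": [
        {"role": "system", "content": "You are an expert language translator."},
        {"role": "user", "content": "Translate ##I Love NLP## to French"},
    ],
}

resp = requests.post(url, json=payload, timeout=60)
resp.raise_for_status()
# chat/completions responses carry choices[0].message.content; there is no
# top-level "text" key, which is consistent with the KeyError above
print(resp.json()["choices"][0]["message"]["content"])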
21 comments
I noticed a couple of things:
  1. vLLM needs the model parameter in the payload, but I don't see it being sent from llama-index-llms-vllm.
  2. Why is the payload to /chat/completions being translated into a prompt? It should in fact be messages: [{role:'',content:''},{role:'',content:''}] (see the workaround sketch below).
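
As a possible workaround (a sketch, assuming the server really is vLLM's OpenAI-compatible endpoint and that llama-index-llms-openai-like is installed; the api_key value is just a placeholder), the OpenAILike wrapper sends both the model field and a proper messages array to /chat/completions:
Python
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="meta-llama/Llama-3.1-8B-Instruct",
    api_base="https://<YOUR_HOST>/v1",   # base URL, not the full /chat/completions path
    api_key="fake",                      # placeholder; vLLM does not check it unless configured to
    is_chat_model=True,                  # use the chat endpoint with a messages payload
)

messages = [
    ChatMessage(role="system", content="You are an expert language translator."),
    ChatMessage(role="user", content="Translate ##I Love NLP## to French"),
]
print(llm.chat(messages=messages))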
5 comments
The integration with MLflow (the recent release) is not fully tested; I am getting issues while testing a RAG with external vector stores (Qdrant).
54 comments
pavanmantha

Nodes

I have a very basic question: when we generate nodes as below, will len(nodes) differ with respect to hardware? The reason for this question is that when I check len(nodes) on my Mac M1 Pro I get approximately 900, but when I use Kaggle or Colab with a P100 GPU I get only 60.

Note: Assume my chunk size is constant at Settings.chunk_size=512.

from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader("./data/").load_data()
node_parser = SentenceSplitter()
nodes = node_parser.get_nodes_from_documents(documents)
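
A quick sanity check (just a sketch; it assumes the same ./data/ directory exists in both environments) is to compare what actually got loaded before splitting, since the splitter itself is deterministic and a different node count usually points to different input documents or settings rather than the hardware:
Python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader("./data/").load_data()
print("num documents:", len(documents))
print("total characters:", sum(len(d.text) for d in documents))

# pass chunk_size explicitly to rule out configuration drift between machines
node_parser = SentenceSplitter(chunk_size=512)
nodes = node_parser.get_nodes_from_documents(documents)
print("num nodes:", len(nodes))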
3 comments
Information!!
The MLflow-LlamaIndex integration broke in the latest release of LlamaIndex. I reported it to MLflow and it's fixed now; the fix is getting released this weekend.
2 comments
One more thing:
ERROR: Cannot install -r requirements.txt (line 4) and llama-index-llms-openai==0.1.11 because these package versions have conflicting dependencies.

The conflict is caused by:
The user requested llama-index-llms-openai==0.1.11
llama-index 0.10.68 depends on llama-index-llms-openai<0.2.0 and >=0.1.27

The error recommends llama-index-llms-openai >=0.1.27, but the repo does not even have this version.
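
One way to double-check which versions are actually published (a sketch that queries PyPI's public JSON API; it assumes network access):
Python
import json
import urllib.request

# PyPI's JSON API lists every published release of a package
pkg = "llama-index-llms-openai"
with urllib.request.urlopen(f"https://pypi.org/pypi/{pkg}/json") as resp:
    releases = json.load(resp)["releases"]

# show only the 0.1.x line, which is what llama-index 0.10.68 pins against
print(sorted(v for v in releases if v.startswith("0.1.")))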
10 comments
Hey, I think there are breaking changes introduced in the latest builds of LlamaIndex. Are there any release notes on these?
example:
ERROR: Cannot install -r requirements.txt (line 4) and llama-index-embeddings-openai==0.2.0 because these package versions have conflicting dependencies.

The conflict is caused by:
The user requested llama-index-embeddings-openai==0.2.0
llama-index 0.10.68 depends on llama-index-embeddings-openai<0.2.0 and >=0.1.5

My requirements.txt is as below:
qdrant-client==1.11.0
python-dotenv==1.0.1
arize-phoenix==4.26.0
llama-index==0.10.68
llama-index-llms-openai==0.2.0
llama-index-llms-ollama==0.3.0
llama-index-embeddings-openai==0.2.0
llama-index-embeddings-ollama==0.3.0
llama-index-vector-stores-qdrant==0.3.0
llama-index-callbacks-arize-phoenix==0.2.0
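
For what it's worth, the 0.2.x/0.3.x integration pins in this list track the llama-index 0.11 line, so a consistent set would look roughly like the sketch below (the exact 0.11.x patch version is an assumption; pick whatever is current):
Plain Text
qdrant-client==1.11.0
python-dotenv==1.0.1
arize-phoenix==4.26.0
llama-index==0.11.0  # assumption: any 0.11.x release that accepts the 0.2.x/0.3.x integrations
llama-index-llms-openai==0.2.0
llama-index-llms-ollama==0.3.0
llama-index-embeddings-openai==0.2.0
llama-index-embeddings-ollama==0.3.0
llama-index-vector-stores-qdrant==0.3.0
llama-index-callbacks-arize-phoenix==0.2.0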
12 comments
I have raised an official PR in the vLLM documentation for vLLM serving using LlamaIndex.
3 comments
The new feature of MLflow with LlamaIndex, when tried with Ollama, is failing with the error:
TypeError: Ollama.__init__() got an unexpected keyword argument 'system_prompt'
12 comments