Can anyone help me integrate this?

from llama_index.llms.vllm import Vllm

# Local in-process vLLM instance of Mistral-7B-Instruct
llm = Vllm(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    dtype="float16",
    tensor_parallel_size=4,  # requires 4 visible GPUs
    temperature=0,
    max_new_tokens=100,
    vllm_kwargs={
        "swap_space": 1,
        "gpu_memory_utilization": 0.5,
        "max_model_len": 4096,
    },
)
8 comments
What's the issue?
I am not able to download the model locally. Do I need to provide API keys? How can I download this model locally and query it?
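For what it's worth, mistralai/Mistral-7B-Instruct-v0.1 is a public Hugging Face checkpoint, so no API key should be needed; vLLM downloads the weights on first use. A minimal pre-fetch sketch (passing the returned path to Vllm(model=...) is optional):

    from huggingface_hub import snapshot_download

    # Download the weights into the local HF cache and return the local path;
    # that path can be passed to Vllm(model=...) instead of the repo id.
    local_path = snapshot_download("mistralai/Mistral-7B-Instruct-v0.1")
    print(local_path)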
I am trying to integrate the metadata extractors SummaryExtractor and KeywordExtractor. How can I use a local Mistral model for this?
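For reference, a minimal sketch of plugging a local LLM into those extractors, assuming llama_index 0.9.x (matching the package paths in the traceback below; module paths differ in other versions). Here `llm` is the Vllm instance from the snippet above and `documents` is assumed to be loaded elsewhere:

    from llama_index.extractors import SummaryExtractor, KeywordExtractor
    from llama_index.ingestion import IngestionPipeline
    from llama_index.node_parser import SentenceSplitter

    # Chunk the documents, then run both extractors against the local LLM
    transformations = [
        SentenceSplitter(chunk_size=512),
        SummaryExtractor(llm=llm, summaries=["self"]),  # summary per node
        KeywordExtractor(llm=llm, keywords=5),          # 5 keywords per node
    ]

    pipeline = IngestionPipeline(transformations=transformations)
    nodes = pipeline.run(documents=documents)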
Getting this error while accessing Mistral via vLLM:

ValueError: The number of required GPUs exceeds the total number of available GPUs in the cluster.
Hi Logan,

We are trying to implement RAG with metadata extractors, using either Mistral or the DeepInfra API. Is it possible to implement this? Please advise.
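On the DeepInfra side, one route (a sketch, not verified against current DeepInfra docs; the base URL and model id are assumptions) is llama_index's OpenAILike wrapper pointed at DeepInfra's OpenAI-compatible endpoint:

    from llama_index.llms import OpenAILike

    llm = OpenAILike(
        model="mistralai/Mistral-7B-Instruct-v0.1",      # assumed model id
        api_base="https://api.deepinfra.com/v1/openai",  # assumed endpoint
        api_key="YOUR_DEEPINFRA_API_KEY",
        is_chat_model=True,
    )
    print(llm.complete("Hello"))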
I am getting an error:

The number of required GPUs exceeds the total number of available GPUs in the cluster.
This is the traceback I am getting from the LlamaIndex vLLM implementation:

Traceback (most recent call last):
  File "/notebooks/notebooks/batch_proces/src/venv/lib/python3.9/site-packages/ray/_private/utils.py", line 527, in _get_docker_cpus
    cpu_ids.append(int(num_or_range))
ValueError: invalid literal for int() with base 10: '\n'
2024-01-11 03:11:51,986 INFO worker.py:1724 -- Started a local Ray instance.
Traceback (most recent call last):
  File "/notebooks/notebooks/batch_proces/src/llms.py", line 14, in <module>
    llm = Vllm(
  File "/notebooks/notebooks/batch_proces/src/venv/lib/python3.9/site-packages/llama_index/llms/vllm.py", line 153, in __init__
    self._client = VLLModel(
  File "/notebooks/notebooks/batch_proces/src/venv/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 105, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/notebooks/notebooks/batch_proces/src/venv/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 307, in from_engine_args
    placement_group = initialize_cluster(parallel_config)
  File "/notebooks/notebooks/batch_proces/src/venv/lib/python3.9/site-packages/vllm/engine/ray_utils.py", line 112, in initialize_cluster
    raise ValueError(
ValueError: The number of required GPUs exceeds the total number of available GPUs in the cluster.
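That ValueError comes from vLLM's Ray-based tensor parallelism: tensor_parallel_size=4 requests four GPUs, and Ray sees fewer. A minimal sketch of the usual fix, sizing the setting to the GPUs actually visible:

    import torch
    from llama_index.llms.vllm import Vllm

    num_gpus = torch.cuda.device_count()  # e.g. 1 on a single-GPU notebook

    llm = Vllm(
        model="mistralai/Mistral-7B-Instruct-v0.1",
        dtype="float16",
        tensor_parallel_size=max(1, num_gpus),  # was hard-coded to 4 above
        temperature=0,
        max_new_tokens=100,
        vllm_kwargs={
            "swap_space": 1,
            "gpu_memory_utilization": 0.5,
            "max_model_len": 4096,
        },
    )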
I always use the vLLM server. For running directly in-process, you probably need to set some extra kwargs.

You might have to read some vLLM docs and figure out what this needs to look like to work for you
https://github.com/run-llama/llama_index/blob/e73cc32da749383d5577c8227e25c2b223478e63/llama_index/llms/vllm.py#L153
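For reference, the server route looks roughly like this: start a standalone vLLM API server, then point llama_index's VllmServer class at it (a sketch; the host, port, and flags are assumptions):

    # Shell: launch the vLLM API server on one GPU (example flags)
    # python -m vllm.entrypoints.api_server \
    #     --model mistralai/Mistral-7B-Instruct-v0.1 --dtype float16

    from llama_index.llms.vllm import VllmServer

    llm = VllmServer(
        api_url="http://localhost:8000/generate",  # default api_server route
        max_new_tokens=100,
        temperature=0,
    )
    print(llm.complete("Hello"))

This sidesteps the in-process Ray/GPU placement issue entirely, since the server owns the GPUs and the client only makes HTTP calls.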