Can anyone help me integrate this?

from llama_index.llms.vllm import Vllm

# Local in-process vLLM instance of Mistral-7B-Instruct
llm = Vllm(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    dtype="float16",
    tensor_parallel_size=4,  # requires 4 visible GPUs
    temperature=0,
    max_new_tokens=100,
    vllm_kwargs={
        "swap_space": 1,
        "gpu_memory_utilization": 0.5,
        "max_model_len": 4096,
    },
)
8 comments
What's the issue?
I am not able to download the model locally. Do I need to provide API keys? How can I download this model locally and query it?
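For what it's worth, mistralai/Mistral-7B-Instruct-v0.1 is a public Hugging Face checkpoint, so no API key should be needed; vLLM downloads the weights on first use. A minimal pre-fetch sketch (passing the returned path to Vllm(model=...) is optional):

    from huggingface_hub import snapshot_download

    # Download the weights into the local HF cache and return the local path;
    # that path can be passed to Vllm(model=...) instead of the repo id.
    local_path = snapshot_download("mistralai/Mistral-7B-Instruct-v0.1")
    print(local_path)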
I am trying to integrate the metadata extractors SummaryExtractor and KeywordExtractor. How can I use a local Mistral model for this?
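For reference, a minimal sketch of plugging a local LLM into those extractors, assuming llama_index 0.9.x (matching the package paths in the traceback below; module paths differ in other versions). Here `llm` is the Vllm instance from the snippet above and `documents` is assumed to be loaded elsewhere:

    from llama_index.extractors import SummaryExtractor, KeywordExtractor
    from llama_index.ingestion import IngestionPipeline
    from llama_index.node_parser import SentenceSplitter

    # Chunk the documents, then run both extractors against the local LLM
    transformations = [
        SentenceSplitter(chunk_size=512),
        SummaryExtractor(llm=llm, summaries=["self"]),  # summary per node
        KeywordExtractor(llm=llm, keywords=5),          # 5 keywords per node
    ]

    pipeline = IngestionPipeline(transformations=transformations)
    nodes = pipeline.run(documents=documents)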
Getting this error while accessing Mistral via vLLM:

ValueError: The number of required GPUs exceeds the total number of available GPUs in the cluster.
Hi Logan,

We are trying to implement RAG with metadata extractors, using either Mistral or the DeepInfra API. Is it possible to implement this? Please advise.
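On the DeepInfra side, one route (a sketch, not verified against current DeepInfra docs; the base URL and model id are assumptions) is llama_index's OpenAILike wrapper pointed at DeepInfra's OpenAI-compatible endpoint:

    from llama_index.llms import OpenAILike

    llm = OpenAILike(
        model="mistralai/Mistral-7B-Instruct-v0.1",      # assumed model id
        api_base="https://api.deepinfra.com/v1/openai",  # assumed endpoint
        api_key="YOUR_DEEPINFRA_API_KEY",
        is_chat_model=True,
    )
    print(llm.complete("Hello"))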
I am getting an error:

The number of required GPUs exceeds the total number of available GPUs in the cluster.
This is the traceback I am getting from the LlamaIndex vLLM implementation:

Traceback (most recent call last):
  File "/notebooks/notebooks/batch_proces/src/venv/lib/python3.9/site-packages/ray/_private/utils.py", line 527, in _get_docker_cpus
    cpu_ids.append(int(num_or_range))
ValueError: invalid literal for int() with base 10: '\n'
2024-01-11 03:11:51,986 INFO worker.py:1724 -- Started a local Ray instance.
Traceback (most recent call last):
  File "/notebooks/notebooks/batch_proces/src/llms.py", line 14, in <module>
    llm = Vllm(
  File "/notebooks/notebooks/batch_proces/src/venv/lib/python3.9/site-packages/llama_index/llms/vllm.py", line 153, in __init__
    self._client = VLLModel(
  File "/notebooks/notebooks/batch_proces/src/venv/lib/python3.9/site-packages/vllm/entrypoints/llm.py", line 105, in __init__
    self.llm_engine = LLMEngine.from_engine_args(engine_args)
  File "/notebooks/notebooks/batch_proces/src/venv/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 307, in from_engine_args
    placement_group = initialize_cluster(parallel_config)
  File "/notebooks/notebooks/batch_proces/src/venv/lib/python3.9/site-packages/vllm/engine/ray_utils.py", line 112, in initialize_cluster
    raise ValueError(
ValueError: The number of required GPUs exceeds the total number of available GPUs in the cluster.
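That ValueError comes from vLLM's Ray-based tensor parallelism: tensor_parallel_size=4 requests four GPUs, and Ray sees fewer. A minimal sketch of the usual fix, sizing the setting to the GPUs actually visible:

    import torch
    from llama_index.llms.vllm import Vllm

    num_gpus = torch.cuda.device_count()  # e.g. 1 on a single-GPU notebook

    llm = Vllm(
        model="mistralai/Mistral-7B-Instruct-v0.1",
        dtype="float16",
        tensor_parallel_size=max(1, num_gpus),  # was hard-coded to 4 above
        temperature=0,
        max_new_tokens=100,
        vllm_kwargs={
            "swap_space": 1,
            "gpu_memory_utilization": 0.5,
            "max_model_len": 4096,
        },
    )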
I always use the vLLM server. For running directly in-process, you probably need to set some extra kwargs.

You might have to read some vLLM docs and figure out what this needs to look like to work for you
https://github.com/run-llama/llama_index/blob/e73cc32da749383d5577c8227e25c2b223478e63/llama_index/llms/vllm.py#L153
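For reference, the server route looks roughly like this: start a standalone vLLM API server, then point llama_index's VllmServer class at it (a sketch; the host, port, and flags are assumptions):

    # Shell: launch the vLLM API server on one GPU (example flags)
    # python -m vllm.entrypoints.api_server \
    #     --model mistralai/Mistral-7B-Instruct-v0.1 --dtype float16

    from llama_index.llms.vllm import VllmServer

    llm = VllmServer(
        api_url="http://localhost:8000/generate",  # default api_server route
        max_new_tokens=100,
        temperature=0,
    )
    print(llm.complete("Hello"))

This sidesteps the in-process Ray/GPU placement issue entirely, since the server owns the GPUs and the client only makes HTTP calls.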