If I only ever plan on using pre-computed embeddings, setting Settings.embed_model = None gives me an error about an OpenAI key? Is there a way to set embed_model to None like there is with Settings.llm?
Settings.embed_model = None works
but it needs to be like the first step in your code
(also just a note, but you'll probably need an embedding model at some point, even to query an existing db)
Plain Text
>>> from llama_index.core import Settings, VectorStoreIndex, Document
>>> Settings.embed_model = None
Embeddings have been explicitly disabled. Using MockEmbedding.
>>> Settings.llm = None
LLM is explicitly disabled. Using MockLLM.
>>> index = VectorStoreIndex.from_documents([Document.example()])
>>> dummy_response = index.as_query_engine().query("test")
okay yes sorry, seems to be working for me now
What do MockLLM and MockEmbedding do @kapa.ai
Let me check the knowledge base for you. One moment! :books:
----------------------
MockLLM and MockEmbedding are tools provided by LlamaIndex to predict the token usage of LLM and embedding calls, respectively. This lets you estimate your costs during index construction and querying before making any real API calls.

MockLLM is used to predict token usage of LLM calls. The max_tokens parameter is used as a "worst case" prediction, where each LLM response will contain exactly that number of tokens. If max_tokens is not specified, then it will simply predict back the prompt.

Plain Text
from llama_index.core.llms import MockLLM
from llama_index.core import Settings

# use a mock llm globally
Settings.llm = MockLLM(max_tokens=256)


MockEmbedding is used to predict the token usage of embedding calls.

Plain Text
from llama_index.core import MockEmbedding
from llama_index.core import Settings

# use a mock embedding globally
Settings.embed_model = MockEmbedding(embed_dim=1536)


You can then use these predictors during both index construction and querying.
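To actually see the predicted counts, you can pair these mocks with a token counting callback. Here's a minimal sketch, assuming the TokenCountingHandler from llama_index.core.callbacks and tiktoken for tokenization (the tokenizer model name is just an example and should match the real model you plan to use):

Plain Text
import tiktoken
from llama_index.core import Settings, VectorStoreIndex, Document, MockEmbedding
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.core.llms import MockLLM

# count tokens with the same tokenizer your real model would use
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode
)

Settings.llm = MockLLM(max_tokens=256)
Settings.embed_model = MockEmbedding(embed_dim=1536)
Settings.callback_manager = CallbackManager([token_counter])

# "run" indexing and a query without calling any real APIs
index = VectorStoreIndex.from_documents([Document.example()])
index.as_query_engine().query("What does this document cover?")

print("embedding tokens:", token_counter.total_embedding_token_count)
print("LLM prompt tokens:", token_counter.prompt_llm_token_count)
print("LLM completion tokens:", token_counter.completion_llm_token_count)


The printed counts give you a rough upper bound on what the same workload would cost against a real LLM and embedding model.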

For more details, you can refer to the LlamaIndex documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai I have a doubt about Settings.llm. Settings.llm = HuggingFaceLLM() works fine, but I'm using a vLLM OpenAI-compatible server for inference, and Settings.llm = OpenAI(api_key=openai_api_key, base_url=openai_api_base) is not working. Why?
Let me check the knowledge base for you. One moment! :books:
----------------------
The issue you're facing might be due to the specific setup required for using the vLLM model with the OpenAI class from LlamaIndex. The OpenAI class is designed to work with OpenAI's API, and it might not be directly compatible with the vLLM server.

If you're using a vLLM server launched with vllm.entrypoints.openai.api_server as an OpenAI Compatible Server or via Docker, you need to use the OpenAILike class from the llama-index-llms-openai-like module. This class is a thin wrapper around the OpenAI model that makes it compatible with 3rd party tools that provide an OpenAI-compatible API.

Here's an example of how to use it:

Plain Text
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(model="my model", api_base="https://hostname.com/v1", api_key="fake")


Please replace "my model", "https://hostname.com/v1", and "fake" with your model name, your server's URL, and your API key respectively.
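
If you want it to behave like your Settings.llm = HuggingFaceLLM() setup, assign the OpenAILike instance globally. A minimal sketch, assuming a chat-style model served by vLLM (the model name and URL below are placeholders):

Plain Text
from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike

# point LlamaIndex at the vLLM OpenAI-compatible endpoint instead of api.openai.com
Settings.llm = OpenAILike(
    model="my model",                 # must match the model name the vLLM server is serving
    api_base="https://hostname.com/v1",
    api_key="fake",                   # a dummy key is fine if your server doesn't require one
    is_chat_model=True,               # assumption: set to False for a completion-style model
)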

For more details, please refer to the LlamaIndex documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------