If I only ever plan on using pre-computed embeddings, setting Settings.embed_model = None gives me an error about an OpenAI key? Is there a way to set embed_model to None like there is with Settings.llm?
Settings.embed_model = None works
but it needs to be like the first step in your code
(also just a note, but you'll probably need an embedding model at some point, even to query an existing db)
Plain Text
>>> from llama_index.core import Settings, VectorStoreIndex, Document
>>> Settings.embed_model = None
Embeddings have been explicitly disabled. Using MockEmbedding.
>>> Settings.llm = None
LLM is explicitly disabled. Using MockLLM.
>>> index = VectorStoreIndex.from_documents([Document.example()])
>>> dummy_response = index.as_query_engine().query("test")
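To illustrate the pre-computed-embeddings case from the original question, here is a minimal sketch (the texts and the 1536-dim vectors are made up) of building an index from nodes that already carry their own vectors. No embedding calls happen at build time, but as noted above, query time still needs something to embed the query string:
Plain Text
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.schema import TextNode

Settings.embed_model = None  # falls back to MockEmbedding

# nodes that already carry pre-computed vectors (dimension made up here)
nodes = [
    TextNode(text="hello world", embedding=[0.1] * 1536),
    TextNode(text="goodbye world", embedding=[0.2] * 1536),
]

# no embedding calls are made here, since every node already has an embedding
index = VectorStoreIndex(nodes)

# querying still embeds the query string; with MockEmbedding the similarity
# scores are meaningless, so plug in a real embed model for real retrieval
retriever = index.as_retriever(similarity_top_k=1)
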
okay yes sorry, seems to be working for me now
What do MockLLM and MockEmbedding do @kapa.ai
Let me check the knowledge base for you. One moment! :books:
----------------------
MockLLM and MockEmbedding are tools provided by LlamaIndex to predict token usage of LLM and embedding calls respectively. This allows you to estimate your costs during index construction and index querying before any respective LLM calls are made.

MockLLM is used to predict token usage of LLM calls. The max_tokens parameter is used as a "worst case" prediction, where each LLM response will contain exactly that number of tokens. If max_tokens is not specified, then it will simply predict back the prompt.

Plain Text
from llama_index.core.llms import MockLLM
from llama_index.core import Settings

# use a mock llm globally
Settings.llm = MockLLM(max_tokens=256)


MockEmbedding is used to predict the token usage of embedding calls.

Plain Text
from llama_index.core import MockEmbedding
from llama_index.core import Settings

# use a mock embedding globally
Settings.embed_model = MockEmbedding(embed_dim=1536)


You can then use these predictors during both index construction and querying.

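For example, the following sketch combines the two mocks with LlamaIndex's TokenCountingHandler callback to estimate token usage without making any real API calls; the exact counts depend on the tokenizer in use:
Plain Text
from llama_index.core import Document, MockEmbedding, Settings, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler
from llama_index.core.llms import MockLLM

# mock models, so no real LLM or embedding calls (and no costs) are incurred
Settings.llm = MockLLM(max_tokens=256)
Settings.embed_model = MockEmbedding(embed_dim=1536)

# count the tokens the mocked calls would have used
token_counter = TokenCountingHandler()
Settings.callback_manager = CallbackManager([token_counter])

index = VectorStoreIndex.from_documents([Document.example()])
index.as_query_engine().query("What is this document about?")

print("embedding tokens:", token_counter.total_embedding_token_count)
print("llm tokens:", token_counter.total_llm_token_count)
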
For more details, you can refer to the LlamaIndex documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai I have a doubt about Settings.llm. Settings.llm = HuggingFaceLLM() works fine, but I'm serving the model with a vLLM OpenAI-Compatible Server, and Settings.llm = OpenAI(api_key=openai_api_key, base_url=openai_api_base) is not working. Why?
Let me check the knowledge base for you. One moment! :books:
----------------------
The issue you're facing might be due to the specific setup required for using the vLLM model with the OpenAI class from LlamaIndex. The OpenAI class is designed to work with OpenAI's API, and it might not be directly compatible with the vLLM server.

If you're using a vLLM server launched with vllm.entrypoints.openai.api_server as an OpenAI Compatible Server or via Docker, you need to use the OpenAILike class from the llama-index-llms-openai-like module. This class is a thin wrapper around the OpenAI model that makes it compatible with 3rd party tools that provide an OpenAI-compatible API.

Here's an example of how to use it:

Plain Text
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(model="my model", api_base="https://hostname.com/v1", api_key="fake")


Please replace "my model", "https://hostname.com/v1", and "fake" with your model name, your server's URL, and your API key respectively.
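Since the question was about Settings.llm, a sketch of wiring this into the global settings might look like the following; the base URL, model name, and is_chat_model flag are assumptions that depend on how your vLLM server was launched:
Plain Text
from llama_index.core import Settings
from llama_index.llms.openai_like import OpenAILike

# requires the llama-index-llms-openai-like package
Settings.llm = OpenAILike(
    model="my-model-name",                 # the model name your vLLM server serves
    api_base="http://localhost:8000/v1",   # your vLLM OpenAI-compatible endpoint
    api_key="fake",                        # vLLM typically accepts any non-empty key
    is_chat_model=True,                    # set to match how the model is served
)
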

For more details, please refer to the LlamaIndex documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------