You can create custom embeddings by adding a custom embedding model to the service context, then passing that service context when you load your docs. If you don't pass it, the default embedding model is used, i.e. OpenAI.
from llama_index import LangchainEmbedding, ServiceContext
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
embed_model = LangchainEmbedding(HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2"))
service_context = ServiceContext.from_defaults(chunk_size_limit=512, embed_model=embed_model)
You can set this service context as global, then you won't have to worry about passing it anywhere.
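For example (a minimal sketch, assuming a llama_index version that exposes set_global_service_context):
from llama_index import set_global_service_context
# every index and query engine will now pick up the custom embed model automatically
set_global_service_context(service_context)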
thanks for the help. But suppose I have the embeddings ready and I want to ingest these embeddings without using openai. Is it possible to do that ?
thanks for the response. I have a particular use case in mind. Suppose I Index my custom embeddings and ask a query (using custom embeddings) to get top_n results . I want to use these top_n results as input for openai to get the answer for the query. Is it possible to do this ?
Since you are only passing the embed model, the LLM part used for querying stays at the default, which is OpenAI.
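Roughly like this (just a sketch, reusing the service_context from above; documents is whatever you loaded):
from llama_index import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# retrieval (top_n) uses your custom embeddings; the retrieved chunks are then
# sent to the default OpenAI LLM to synthesize the final answer
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("your question here")
print(response)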
that's great thanks for confirmation and being patient🙂
Hi, there! I was able to use a custom embedding model for indexing and retrieving, but for the final prediction I'm getting a rate limit error from OpenAI. Can you help me with this? The context size is just one paragraph. I also set the OpenAI API key.
For reference:
DEBUG:openai:api_version=None data='{"prompt": "Context information is below.\n---------------------\n\u201c Borrowing Request \u201d means a request by the Borrower for a Borrowing in accordance with Section 2.03 substantially in the form of Exhibit A.\n\n\u201c Borrower \u201d has the meaning specified in the preamble hereto.\n\n\u201c Borrowing \u201d means Loans of the same Class and Type, made, converted or continued on the same date and, in the case of Eurodollar Loans, as to which a single Interest Period is in effect.\n---------------------\nGiven the context information and not prior knowledge, answer the question: What is the borrower name ?\n", "stream": false, "model": "text-davinci-003", "temperature": 0.0, "max_tokens": 3953}' message='Post details'
I think rate limiting is handled in LlamaIndex 🧐. Did you start getting this on the first query, or did it start somewhere in between?
yes it is the first query.
Is this the right way to give the OpenAI API key? Because I'm getting the rate limit error even though I have a paid plan.
os.environ["OPENAI_API_KEY"] = "my_API_KEY"
embModelClass =INSTRUCTOR("hkunlp/instructor-xl")
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))
service_context = ServiceContext.from_defaults(chunk_size_limit=512, embed_model=embed_model,llm_predictor=llm_predictor)
index=VectorStoreIndex.from_documents(
data,
service_context=service_context
# response_synthesizer=response_synthesizer,
)
What LlamaIndex version are you trying this with? Because in recent versions of LlamaIndex, llm_predictor has been renamed to llm.
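On newer versions it would look roughly like this (just a sketch, assuming a release that exposes llama_index.llms):
from llama_index.llms import OpenAI
from llama_index import ServiceContext

# llm replaces llm_predictor, and chunk_size replaces chunk_size_limit, in newer releases
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0),
    embed_model=embed_model,
    chunk_size=512,
)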
I would suggest you try it like this once
service_context = ServiceContext.from_defaults(chunk_size_limit=512, embed_model=embed_model)
Also, your embedding model is assigned to the embModelClass variable, but you're passing embed_model into the service context.
I tried this but I'm still getting the rate limit error.
Yes, that is a custom embedding model which mocks the expected behaviour, and it is working fine. The code throws the error when it tries to send a request to OpenAI.
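For reference, one way to plug an INSTRUCTOR model into LlamaIndex is through langchain's wrapper (just a sketch, assuming HuggingFaceInstructEmbeddings is available in your langchain version):
from langchain.embeddings import HuggingFaceInstructEmbeddings
from llama_index import LangchainEmbedding

embed_model = LangchainEmbedding(
    HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-xl")
)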
Can you try with GPT-3.5? Rate limit errors have been increasing way too much lately.
from langchain.chat_models import ChatOpenAI
llm_predictor = LLMPredictor(llm=ChatOpenAI(openai_api_key=OPENAI_API_KEY, temperature=0, max_tokens=1024, model_name="gpt-3.5-turbo"))
pass this into your service context.
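i.e. something like this, reusing your existing embed_model:
service_context = ServiceContext.from_defaults(chunk_size_limit=512, embed_model=embed_model, llm_predictor=llm_predictor)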
unfortunately, this is also throwing the same error.
Can you try making a sample request to OpenAI in a plain Python script? If that also doesn't work, then maybe it's something else.
import os
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")
# quick sanity check that the API key and the OpenAI endpoint work
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="",
    temperature=1,
    max_tokens=256,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)
this program is running without any error
lol. I guess we need to call @Logan M here 😅
Are you running in colab? I've seen that openai will severely rate limit calls from colab servers
No, I'm running a python file.
I have no idea then, I'm lost 😅
Yeah, just to be sure, I asked @VallalaDev to run a standalone OpenAI query to see if there was something wrong on OpenAI's side altogether, but it worked.
😅
Maybe you could downgrade the openai version 🤔
@WhiteFang_Jr
I changed this line
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="gpt-3.5-turbo", openai_api_key=<key>))
and I'm getting predictions.
thanks for the help guys.
So it worked by adding openai_api_key explicitly. Maybe it is not able to pick up the env API key in the new version?
Also @VallalaDev, if you want to use GPT-3.5 you should use ChatOpenAI.
We tried ChatOpenAI with gpt-3.5-turbo, but it didn't work.
Try passing the key variable in there too
That's the only change you did right?
Oh, I see it already has the key variable. Anyway, it worked for you 😅
I'll check with your version in evening.
I also changed the model_name from text-davinci-003 to gpt-3.5-turbo.