
Updated 4 months ago

You'll need to start with a fresh index

At a glance
You'll need to start with a fresh index if you switch embeddings: the dimensions of every embedding vector need to be the same 👍
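To see why the dimensions must match: a vector index scores a query by comparing the query's embedding against every stored embedding, typically with cosine similarity, and that comparison is only defined for vectors of the same length (for example, `text-embedding-ada-002` returns 1536-dimensional vectors). The sketch below is plain Python, not LlamaIndex code. `cosine_similarity` is a hypothetical helper, and the tiny 3- and 4-dimensional vectors stand in for embeddings from two different models.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        # Vectors from different embedding models usually differ in length,
        # so there is nothing meaningful to compare.
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Indexed with "model A" (3 dims) and queried with "model A": works.
stored = [1.0, 0.0, 0.0]
print(cosine_similarity(stored, [1.0, 0.0, 0.0]))

# Indexed with "model A" but queried with "model B" (4 dims): fails.
try:
    cosine_similarity(stored, [0.5, 0.5, 0.5, 0.5])
except ValueError as err:
    print(err)
```

The exact error a real vector store raises will differ, but the cause is the same. Even when two models happen to share a dimension, their vector spaces are unrelated, so rebuilding the index with the new model is still the safe move.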
hmm I'm generating a GPTVectorStoreIndex with from_documents() and then persisting it. When I re-run that, a new /storage directory is created with the new embeddings (afaik). Is there something else I need to do to ensure that the index is new?
When you initially create the index using from documents, are you using the modified embeddings there?
Whichever embeddings model you use to create the index, you should use for the query as well 🤔
mmm I don't think that's the case? The default model for embeddings is ADA-2 but it's GPT-3.5-turbo for the query
and generating embeddings with ada-2 and then querying with gpt-3.5-turbo works fine
I ran into the same issue generating embeddings with Curie
Right. Ada is the default for embeddings (mostly because it's fast, cheap, and works well)
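The two sides of this exchange are talking about different models. The embedding model turns both the documents (at index time) and the question (at query time) into vectors; the LLM (here gpt-3.5-turbo) only ever sees the retrieved text. The toy pipeline below illustrates that split in plain Python. `toy_embed`, `retrieve`, and `build_prompt` are hypothetical stand-ins, not LlamaIndex APIs, and the letter-count "embedding" exists purely for illustration.

```python
import math
from typing import Callable, List, Tuple

EmbedFn = Callable[[str], List[float]]


def toy_embed(text: str) -> List[float]:
    """Hypothetical embedding model: vowel counts, L2-normalised."""
    counts = [float(text.lower().count(ch)) for ch in "aeiou"]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def retrieve(query: str, docs: List[str], embed: EmbedFn, top_k: int = 1) -> List[str]:
    """Embed the docs and the query with the SAME model, then rank by dot product."""
    doc_vectors: List[Tuple[str, List[float]]] = [(d, embed(d)) for d in docs]
    q = embed(query)
    scored = sorted(
        doc_vectors,
        key=lambda dv: sum(x * y for x, y in zip(q, dv[1])),
        reverse=True,
    )
    return [doc for doc, _ in scored[:top_k]]


def build_prompt(question: str, context: List[str]) -> str:
    """Only plain text reaches the LLM; it never sees any vectors."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"


docs = ["aaaa banana", "ooo foo boo"]
context = retrieve("a banana", docs, toy_embed)
print(build_prompt("a banana", context))
```

So swapping the LLM needs no re-indexing, but swapping the embedding model does, because the stored document vectors were produced by the old model.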

Maybe I should make a quick example of what I mean to make this work haha one sec
this might also help: I've abstracted my service_context and return basically this to do both the embeddings creation and querying:
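```python
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, PromptHelper, ServiceContext
from llama_index.embeddings.openai import OpenAIEmbedding, OpenAIEmbeddingModelType

prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=1, model_name="gpt-3.5-turbo"))
# The embedding model must be the same when building the index and when loading/querying it
embed_model = OpenAIEmbedding(model=OpenAIEmbeddingModelType.ADA)

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper, embed_model=embed_model)
```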
(that doesn't work either, though: ADA-1 appears to have the same issue)
Right, so when you call from_documents(), you'll need to pass in that service context

index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)

Then, maybe you persist it

index.storage_context.persist()

When you load it again, you need to pass in the same service context

index = load_index_from_storage(storage_context, service_context=service_context)

As long as the service context is the same for both steps, it should be working 🤔
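One way to catch a mismatched service context early is to record which embedding model built the index next to the persisted files, and check it before loading. This is a hedged sketch of that idea, not a LlamaIndex feature: `save_embed_meta`, `check_embed_meta`, the `embed_model.json` file name, and the model names in the demo are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

META_FILE = "embed_model.json"  # hypothetical side file; LlamaIndex does not write this


def save_embed_meta(persist_dir: str, model_name: str, dim: int) -> None:
    """Record which embedding model (and vector size) built the index."""
    path = Path(persist_dir)
    path.mkdir(parents=True, exist_ok=True)
    (path / META_FILE).write_text(json.dumps({"model": model_name, "dim": dim}))


def check_embed_meta(persist_dir: str, model_name: str, dim: int) -> None:
    """Raise before loading if the stored index used a different embedding model."""
    meta_path = Path(persist_dir) / META_FILE
    if not meta_path.exists():
        raise FileNotFoundError(f"no {META_FILE} in {persist_dir}; rebuild the index")
    stored = json.loads(meta_path.read_text())
    if stored != {"model": model_name, "dim": dim}:
        raise ValueError(
            f"index was built with {stored['model']} ({stored['dim']} dims) but the "
            f"current service context uses {model_name} ({dim} dims); "
            "start with a fresh index"
        )


# Demo with made-up model names in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    save_embed_meta(tmp, "model-a", 1536)
    check_embed_meta(tmp, "model-a", 1536)  # same model: no error
    try:
        check_embed_meta(tmp, "model-b", 1024)
    except ValueError as err:
        print(err)
```

In the flow above, `save_embed_meta` would run right after `index.storage_context.persist()`, and `check_embed_meta` right before `load_index_from_storage(...)`.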
yep that's exactly what I'm doing!
I've definitely managed to change the embeddings fine before (using huggingface embeddings) so something weird is going on lol
Two things that might work? Lol

  1. Try completely removing the persist dir and re-running the index creation/query
  2. If that still doesn't work, try also passing in the service context to as_query_engine()
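Suggestion 1 amounts to: never load a persist dir that was written with a different embedding model. A minimal sketch, assuming the index lives in a local directory such as the thread's `/storage`. `rebuild_fresh` and its `build_and_persist` callback are hypothetical names; the callback is where your own `from_documents(..., service_context=service_context)` and persist calls would go.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def rebuild_fresh(persist_dir: str, build_and_persist: Callable[[str], None]) -> None:
    """Delete any existing index files, then rebuild the index from scratch.

    Wiping the directory first means no vectors written by a previous
    embedding model can be loaded alongside (or instead of) the new ones.
    """
    path = Path(persist_dir)
    if path.exists():
        shutil.rmtree(path)
    path.mkdir(parents=True)
    build_and_persist(str(path))


def fake_build(persist_dir: str) -> None:
    """Hypothetical stand-in for the real build step (from_documents + persist)."""
    (Path(persist_dir) / "vector_store.json").write_text("{}")


# Demo in a throwaway directory: a stale file from an old run disappears.
with tempfile.TemporaryDirectory() as root:
    storage = Path(root) / "storage"
    storage.mkdir()
    (storage / "old_vectors.json").write_text("stale")
    rebuild_fresh(str(storage), fake_build)
    print(sorted(p.name for p in storage.iterdir()))
```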
hmm you know what. I think I was messing up my load_index_from_storage call
I was passing service_context to my RetrieverQueryEngine.from_args but not load_index_from_storage
that might do it
Changing the defaults should definitely be easier 🙃 Kind of have to put the service context everywhere haha
yeah I guess so! Really appreciate all your help today, Logan! Have a good one!
You too! :dotsCATJAM: