Hi,
when indexing text, how can I pass the OpenAI key to it? It can't be read from an environment variable; it has to be passed in explicitly. The key varies from user to user, and we store it as an encrypted string.
My code:
Plain Text
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.embeddings import OpenAIEmbedding
from llama_index.node_parser import SimpleNodeParser

embed_model = OpenAIEmbedding(mode='similarity', embed_batch_size=2000)
service_context = ServiceContext.from_defaults(chunk_size=chunk_size, embed_model=embed_model,
                                               callback_manager=token_counter_callback_manager)
node_parser = SimpleNodeParser.from_defaults(chunk_size=chunk_size, chunk_overlap=20)
index = VectorStoreIndex(nodes, storage_context=storage_context, service_context=service_context)

Thank you!
You can't set os.environ['OPENAI_API_KEY'] in code?

Otherwise, I think you can do

Plain Text
from llama_index import ServiceContext
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding

llm = OpenAI(model="gpt-4", api_key="...")
embed_model = OpenAIEmbedding(mode='similarity', embed_batch_size=2000, api_key="...")
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model, ...)
@Logan M Okay, but why pass it twice in llm and embed_model?
they are different models with different API connections?
Otherwise you can do os.environ['OPENAI_API_KEY'] = "sk-..."
Now, I'm talking only about embedding, not about querying. I can't use the environment variable because the key varies.
You need to give the key to both. If you don't need the LLM, set llm=None
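For example, a minimal embedding-only setup could look roughly like this (a sketch assuming a llama_index version that still uses ServiceContext; the key string is a placeholder):

Plain Text
from llama_index import ServiceContext
from llama_index.embeddings import OpenAIEmbedding

# pass the per-user key directly and disable the default LLM
embed_model = OpenAIEmbedding(mode='similarity', embed_batch_size=2000, api_key="sk-...")
service_context = ServiceContext.from_defaults(llm=None, embed_model=embed_model)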
Yes, I know, but these two processes are separate. For indexing, I don't need to specify the LLM explicitly, right?
It's not technically needed (for a vector index), but you'll need it later if you want to query. Generally I just put everything in at the start so that as_query_engine() works.

If you don't set llm=None, you'll get an error about a missing key, because without it the service context will default to OpenAI.
Wait, I'm confused now. Do you mean this code will issue an error?

Plain Text
embed_model = OpenAIEmbedding(mode='similarity', embed_batch_size=2000, api_key="...")
service_context = ServiceContext.from_defaults(embed_model=embed_model)
Yes, because your OpenAI key isn't in the env, the LLM will default to OpenAI, error out, fall back to llama_cpp, and probably error out there as well if you don't have that installed
Try it, see what happens πŸ™‚
Ah, okay... Thank you! One more question though - if it's about embedding only, I still have to pass, say "gpt" not "text-data"?
hmm not sure what you mean by "gpt" not "text-data"
@Logan M sorry, a typo: it should be "text-ada"
I mean, when we create embeddings we use an embeddings model like "text-ada" and nothing else, so passing an LLM with a "gpt" model looks a bit weird to me when all I'm doing is creating embeddings.
The LLM and embeddings are two different models, and are used for two different purposes.

OpenAIEmbedding uses text-embedding-ada-002 and is only used for embeddings

OpenAI can be set to gpt-3.5-turbo, gpt-4, etc, and is only used to generate text
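As a rough illustration of the split (hypothetical strings; this assumes the legacy llama_index OpenAI/OpenAIEmbedding classes and their complete()/get_text_embedding() methods):

Plain Text
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding

embed_model = OpenAIEmbedding(api_key="sk-...")         # wraps text-embedding-ada-002, embeddings only
llm = OpenAI(model="gpt-3.5-turbo", api_key="sk-...")   # text generation only

vector = embed_model.get_text_embedding("hello world")  # list of floats
reply = llm.complete("Say hello")                       # generated text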
Yes, exactly, this is why it's confusing. My task right now is only to create embeddings, so why would I pass a "gpt" model as the llm? I don't understand.
Because the service context holds a lot of things, including both the LLM and the embed model.

If you don't pass in an llm, it defaults to OpenAI.

Just how it is
So, even if I don't use the gpt model right now, I should pass it?
it will make life easier πŸ™‚
Okay, let me try it. Thank you. About "easier", I'm not sure: as I said, I have two separate processes (creating embeddings and querying) and I create a service context instance every time, so this approach is a bit confusing.
it will make sense, just give it a try πŸ™‚
One more question. It doesn't matter here exactly which gpt model I pass as the LLM, if it's just for creating embeddings, right?
Yea pretty much
But then if you do index.as_query_engine(), it will use the llm from the service context
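Roughly like this (a sketch; the query string is made up, and it assumes the index was built with a service context that has a real llm):

Plain Text
# the query engine reuses the llm stored in the index's service context
query_engine = index.as_query_engine()
response = query_engine.query("What does the document say about pricing?")
print(response)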
Well, it will be a new instance anyway, so I'll pass the right model then
Okay, I tried not passing the llm, and it worked fine