You can't set os.environ['OPENAI_API_KEY'] in code?
Otherwise, I think you can do
from llama_index import ServiceContext
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding
llm = OpenAI(model="gpt-4", api_key="...")
embed_model = OpenAIEmbedding(mode="similarity", embed_batch_size=2000, api_key="...")
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model, ...)
@Logan M Okay, but why pass the key twice, to both llm and embed_model?
Because they are different models, each with its own API connection
Otherwise you can do os.environ['OPENAI_API_KEY'] = "sk-..."
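If the env-var route does work for your setup, setting it in-process before the first OpenAI client is created is enough. A minimal sketch (the key value is a placeholder, not a real key):

```python
import os

# Must run before any llama_index / openai call reads the environment.
# "sk-placeholder" stands in for a real key.
os.environ["OPENAI_API_KEY"] = "sk-placeholder"

print(os.environ["OPENAI_API_KEY"])
```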
Now, I'm talking only about embedding, not about querying. I can't use the environment because the key varies
You need to give the key to both. If you don't need the LLM, set llm=None
Yes, I know but these 2 processes are separated. For indexing, I don't need to specify LLM explicitly, right?
It's not technically needed (for a vector index), but you'll need it later if you want to query. Generally I just put everything in at the start so that as_query_engine() works
If you don't set llm=None, you'll get an error about a missing key, because without it the service context will default to OpenAI
Wait, I'm confused now. Do you mean this code will issue an error?
embed_model = OpenAIEmbedding(mode='similarity', embed_batch_size=2000, api_key="...")
service_context = ServiceContext.from_defaults(embed_model=embed_model)
Yes, because your OpenAI key isn't in the env, the LLM will default to OpenAI, error out, fall back to llama_cpp, and probably error out as well if you don't have that installed
Try it, see what happens 🙂
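Putting the advice above together, an embedding-only setup along these lines should avoid the LLM fallback entirely. This is a sketch against the legacy ServiceContext API; the data directory, batch size, and key are placeholders:

```python
from llama_index import ServiceContext, VectorStoreIndex, SimpleDirectoryReader
from llama_index.embeddings import OpenAIEmbedding

# The key is passed per-instance, so it can vary between runs
# without touching os.environ. "sk-placeholder" is not a real key.
embed_model = OpenAIEmbedding(embed_batch_size=2000, api_key="sk-placeholder")

# llm=None stops ServiceContext.from_defaults from constructing a default
# OpenAI LLM (which would look for OPENAI_API_KEY in the environment).
service_context = ServiceContext.from_defaults(llm=None, embed_model=embed_model)

# Build the index; only the embedding model is exercised here.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
```

This is configuration that needs the llama_index package and a valid key to actually run, so treat it as a template rather than a tested script.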
Ah, okay... Thank you! One more question though: if it's about embedding only, do I still have to pass, say, "gpt" and not "text-data"?
hmm not sure what you mean by "gpt" not "text-data"
@Logan M Sorry, a typo: it should be "text-ada"
I mean, when we create embeddings we use an embeddings model like "text-ada" and nothing else, so passing an LLM with a "gpt" model looks a bit weird to me when creating embeddings.
the LLM and embeddings are two different models, and are used for two different purposes
OpenAIEmbedding uses text-embedding-ada-002 and is only used for embeddings.
OpenAI can be set to gpt-3.5-turbo, gpt-4, etc., and is only used to generate text.
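The split described above can be sketched with the two legacy classes side by side (imports and model names as of that API; keys are placeholders):

```python
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding

# Text generation only: answers queries, never produces vectors.
llm = OpenAI(model="gpt-3.5-turbo", api_key="sk-placeholder")

# Embeddings only: turns text into vectors (text-embedding-ada-002
# under the hood), never generates answers.
embed_model = OpenAIEmbedding(api_key="sk-placeholder")
```

Since the two objects wrap different endpoints, each needs its own key even when both point at the same OpenAI account.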
Yes, exactly, and this is why it's confusing. My task right now is only to create embeddings, so I don't understand why I would pass a "gpt" model as the llm.
because the service context holds a lot of things, including both the LLM and embed model
If you don't pass in an llm, it defaults to OpenAI
Just how it is
So, even if I don't use the gpt model right now, I should pass it?
it will make life easier 🙂
Okay, let me try it. Thank you. About "easier", though, I'm not sure: as I said, my two processes (creating embeddings and querying) are separated, and I create a service context instance every time, so this approach is a bit confusing
it will make sense, just give it a try 🙂
One more question: it doesn't matter which exact gpt model I pass to the LLM if it's just for creating embeddings, right?
But then if you do index.as_query_engine(), it will use the llm from the service context
Well, it will be a new instance anyway, so I'll pass the right model then
Okay, I tried not to pass llm, and it worked fine
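For the second, separated querying process described in the chat, the service context built at query time is the one that needs the real LLM. Again a sketch on the legacy API; the persist directory, model choice, and key are placeholders:

```python
from llama_index import ServiceContext, StorageContext, load_index_from_storage
from llama_index.llms import OpenAI

# Query-time context: now the LLM matters, so pick the model you actually want.
llm = OpenAI(model="gpt-4", api_key="sk-placeholder")
service_context = ServiceContext.from_defaults(llm=llm)

# Reload the index that the embedding process persisted earlier.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context, service_context=service_context)

# as_query_engine() pulls the LLM from the service context, as noted above.
response = index.as_query_engine().query("What does the document say?")
```

Because indexing and querying each build their own service context, the "gpt" model passed during embedding never has to match the one used here.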