You can set the embed model and LLM in the service context to use Azure. Then you can do
VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)
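Roughly like this, as a minimal sketch (azure_llm and azure_embed_model are assumed to be AzureOpenAI / AzureOpenAIEmbedding instances configured for Azure as in the docs linked below, and vector_store is your existing store):
```
from llama_index import ServiceContext, VectorStoreIndex

# azure_llm / azure_embed_model: Azure-backed LLM and embedding model (see the docs link below)
service_context = ServiceContext.from_defaults(
    llm=azure_llm,
    embed_model=azure_embed_model,
)

# reuse the existing vector store, but with the Azure-backed service context
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store,
    service_context=service_context,
)
```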
Docs on setting up Azure:
https://docs.llamaindex.ai/en/stable/examples/customization/llms/AzureOpenAI.html
The embedding model is only called once during queries, but depending on the query engine you have set up, the LLM could be called up to 12 times per query, yeah
we set the response mode to be compact already
here's the code:
```
from llama_index import VectorStoreIndex, PromptTemplate, get_response_synthesizer
from llama_index.postprocessor import SentenceEmbeddingOptimizer

index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
print("loaded vector store index")

response_synthesizer = get_response_synthesizer(response_mode='compact')

custom_qa_prompt = some_text
qa_prompt_tmpl = PromptTemplate(custom_qa_prompt)

# trim low-similarity sentences from each retrieved node
node_postprocessors = [
    SentenceEmbeddingOptimizer(percentile_cutoff=0.5)
]

query_engine = index.as_query_engine(
    similarity_top_k=rag_nodes_top_k,
    response_synthesizer=response_synthesizer,
    node_postprocessors=node_postprocessors,
    verbose=True
)
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": qa_prompt_tmpl}
)
```
oh, the SentenceEmbeddingOptimizer will call the embed model for each node
it removes sentences that fall under a similarity score cutoff
it needs embeddings to do that
You can also pass the embed_model to it if you are using azure
SentenceEmbeddingOptimizer(percentile_cutoff=0.5, embed_model=embed_model)
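So in the snippet above it would look roughly like this (a sketch, assuming embed_model is your Azure embedding model and index / response_synthesizer / rag_nodes_top_k are the same objects as before):
```
from llama_index.postprocessor import SentenceEmbeddingOptimizer

# pass the Azure embed model explicitly so the optimizer doesn't fall back
# to the default OpenAI embeddings
node_postprocessors = [
    SentenceEmbeddingOptimizer(percentile_cutoff=0.5, embed_model=embed_model)
]

query_engine = index.as_query_engine(
    similarity_top_k=rag_nodes_top_k,
    response_synthesizer=response_synthesizer,
    node_postprocessors=node_postprocessors,
    verbose=True,
)
```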
so may I interpret that as: SentenceEmbeddingOptimizer helps trim down the size of each node before the "compact" step stuffs it in as context_str, which increases runtime?
Since the LLM is the majority of the runtime, it will probably end up faster in cases with a smaller top k. But it also reduces token cost
(but tbh, I never use it lol)
For the Azure OpenAI object, is api_version essentially the Model version on their portal?
We have set up the service context, but the app is still trying to call the chat_completion endpoint from OpenAI instead of Azure:
```
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.llms import LangChainLLM

service_context = ServiceContext.from_defaults(llm=LangChainLLM(rag_llm))
print("got service context")
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)
```
Model version is not api version
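For reference, a rough sketch of where each value goes (all values below are placeholders): api_version is the Azure REST API version string, while the model version shown in the portal belongs to the deployment you point deployment_name at.
```
from llama_index.llms import AzureOpenAI

llm = AzureOpenAI(
    model="gpt-35-turbo",                    # base model name
    deployment_name="my-gpt-35-deployment",  # your deployment (this is what has a model version in the portal)
    api_key="<azure-api-key>",
    azure_endpoint="https://<resource-name>.openai.azure.com/",
    api_version="2023-07-01-preview",        # REST API version, not the model version
)
```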
thanks, so now you can see I am using LangChainLLM() to wrap the whole LangChain LLM object, and there's still one call to the OpenAI endpoint
Embeddings will probably be using openai, unless you changed the embed model
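If you haven't changed it, something like this sketch should route embeddings to Azure as well (the embedding deployment values are placeholders; rag_llm is your existing Azure-backed LangChain LLM):
```
from llama_index import ServiceContext
from llama_index.llms import LangChainLLM
from llama_index.embeddings import AzureOpenAIEmbedding

# placeholder Azure embedding config -- fill in your own deployment details
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="my-embedding-deployment",
    api_key="<azure-api-key>",
    azure_endpoint="https://<resource-name>.openai.azure.com/",
    api_version="2023-07-01-preview",
)

service_context = ServiceContext.from_defaults(
    llm=LangChainLLM(rag_llm),
    embed_model=embed_model,
)
```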
yeah, but chat/completions is still there
does it have anything to do with LangChain? because this query engine is served as a LangChain tool
but the LangChain agent was set to use Azure endpoints anyway
hey @Logan M I think I have figured it out: service_context also has to be passed to the response_synthesizer. Perhaps I can help update the docs and create a PR?
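For anyone following along, the fix sketched out (using the same service_context as the one passed to the index):
```
from llama_index import get_response_synthesizer

# passing the service context here keeps the synthesizer on the Azure LLM
# instead of falling back to the default OpenAI LLM
response_synthesizer = get_response_synthesizer(
    response_mode='compact',
    service_context=service_context,
)
```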
@Logan M when I try from llama_index.embeddings import AzureOpenAIEmbedding, I get this error:
```
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[5], line 1
----> 1 from llama_index.embeddings import AzureOpenAIEmbedding

ImportError: cannot import name 'AzureOpenAIEmbedding' from 'llama_index.embeddings'
```
I think I am using the latest: Successfully installed llama-index-0.9.21
that import definitely exists. Maybe try from a fresh venv? from llama_index.embeddings import AzureOpenAIEmbedding
worked for me
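One quick way to sanity-check which install the kernel is actually importing (a stale kernel or a second environment is a common cause of that ImportError); this is just a debugging sketch:
```
import llama_index

# confirm the version and the location of the package actually being imported
print(llama_index.__version__)
print(llama_index.__file__)

from llama_index.embeddings import AzureOpenAIEmbedding  # should work on 0.9.21
```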