You can set the embed model and LLM in the service context to use Azure. Then you can do
VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)
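Roughly like this, as a minimal sketch (azure_llm and azure_embed_model are assumed to be AzureOpenAI / AzureOpenAIEmbedding instances configured for Azure as in the docs linked below, and vector_store is your existing store):
```
from llama_index import ServiceContext, VectorStoreIndex

# azure_llm / azure_embed_model: Azure-backed LLM and embedding model (see the docs link below)
service_context = ServiceContext.from_defaults(
    llm=azure_llm,
    embed_model=azure_embed_model,
)

# reuse the existing vector store, but with the Azure-backed service context
index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store,
    service_context=service_context,
)
```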
Docs on setting up Azure:
https://docs.llamaindex.ai/en/stable/examples/customization/llms/AzureOpenAI.html
The embedding model is only called once during queries, but depending on the query engine you have set up, the LLM could be called up to 12 times per query, yeah
we set the response mode to be compact already
here's the code:
```
from llama_index import VectorStoreIndex, PromptTemplate, get_response_synthesizer
from llama_index.postprocessor import SentenceEmbeddingOptimizer

index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
print("loaded vector store index")

response_synthesizer = get_response_synthesizer(response_mode='compact')

custom_qa_prompt = some_text
qa_prompt_tmpl = PromptTemplate(custom_qa_prompt)

# trim low-similarity sentences from each retrieved node
node_postprocessors = [
    SentenceEmbeddingOptimizer(percentile_cutoff=0.5)
]

query_engine = index.as_query_engine(
    similarity_top_k=rag_nodes_top_k,
    response_synthesizer=response_synthesizer,
    node_postprocessors=node_postprocessors,
    verbose=True
)
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": qa_prompt_tmpl}
)
```
oh, the SentenceEmbeddingOptimizer will call the embed model for each node
it removes sentences that fall under a similarity score cutoff
it needs embeddings to do that
You can also pass the embed_model to it if you are using azure
SentenceEmbeddingOptimizer(percentile_cutoff=0.5, embed_model=embed_model)
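So in the snippet above it would look roughly like this (a sketch, assuming embed_model is your Azure embedding model and index / response_synthesizer / rag_nodes_top_k are the same objects as before):
```
from llama_index.postprocessor import SentenceEmbeddingOptimizer

# pass the Azure embed model explicitly so the optimizer doesn't fall back
# to the default OpenAI embeddings
node_postprocessors = [
    SentenceEmbeddingOptimizer(percentile_cutoff=0.5, embed_model=embed_model)
]

query_engine = index.as_query_engine(
    similarity_top_k=rag_nodes_top_k,
    response_synthesizer=response_synthesizer,
    node_postprocessors=node_postprocessors,
    verbose=True,
)
```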
so may I interpret that as: SentenceEmbeddingOptimizer helps trim down the size of each node before the "compact" step stuffs it in as context_str, which increases runtime?
Since the LLM is the majority of the runtime, it will probably end up faster in cases with a smaller top k. But it also reduces token cost
(but tbh, I never use it lol)
For the Azure OpenAI object, is api_version essentially the Model version on their portal?
We have set up the service context, but the app is still trying to call the chat_completion endpoint from OpenAI instead of Azure:
```
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.llms import LangChainLLM

service_context = ServiceContext.from_defaults(llm=LangChainLLM(rag_llm))
print("got service context")
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)
```
Model version is not api version
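For reference, a rough sketch of where each value goes (all values below are placeholders): api_version is the Azure REST API version string, while the model version shown in the portal belongs to the deployment you point deployment_name at.
```
from llama_index.llms import AzureOpenAI

llm = AzureOpenAI(
    model="gpt-35-turbo",                    # base model name
    deployment_name="my-gpt-35-deployment",  # your deployment (this is what has a model version in the portal)
    api_key="<azure-api-key>",
    azure_endpoint="https://<resource-name>.openai.azure.com/",
    api_version="2023-07-01-preview",        # REST API version, not the model version
)
```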
thanks, so now you can see I am using LangChainLLM() to wrap the whole LangChain LLM object, and there's still one call to the OpenAI endpoint
Embeddings will probably be using openai, unless you changed the embed model
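If you haven't changed it, something like this sketch should route embeddings to Azure as well (the embedding deployment values are placeholders; rag_llm is your existing Azure-backed LangChain LLM):
```
from llama_index import ServiceContext
from llama_index.llms import LangChainLLM
from llama_index.embeddings import AzureOpenAIEmbedding

# placeholder Azure embedding config -- fill in your own deployment details
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-ada-002",
    deployment_name="my-embedding-deployment",
    api_key="<azure-api-key>",
    azure_endpoint="https://<resource-name>.openai.azure.com/",
    api_version="2023-07-01-preview",
)

service_context = ServiceContext.from_defaults(
    llm=LangChainLLM(rag_llm),
    embed_model=embed_model,
)
```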
yeah, but chat/completions is still there
does it have anything to do with LangChain? because this query engine is served as a LangChain tool
but the LangChain agent was set to use Azure endpoints anyway
hey @Logan M I think I have figured it out: service_context also has to be passed to the response_synthesizer. Perhaps I can help update the docs and create a PR?
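For anyone following along, the fix sketched out (using the same service_context as the one passed to the index):
```
from llama_index import get_response_synthesizer

# passing the service context here keeps the synthesizer on the Azure LLM
# instead of falling back to the default OpenAI LLM
response_synthesizer = get_response_synthesizer(
    response_mode='compact',
    service_context=service_context,
)
```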
@Logan M when I try from llama_index.embeddings import AzureOpenAIEmbedding, I get this error:
```
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[5], line 1
----> 1 from llama_index.embeddings import AzureOpenAIEmbedding

ImportError: cannot import name 'AzureOpenAIEmbedding' from 'llama_index.embeddings'
```
I think I am using the latest: Successfully installed llama-index-0.9.21
that import definitely exists. Maybe try from a fresh venv? from llama_index.embeddings import AzureOpenAIEmbedding
worked for me
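One quick way to sanity-check which install the kernel is actually importing (a stale kernel or a second environment is a common cause of that ImportError); this is just a debugging sketch:
```
import llama_index

# confirm the version and the location of the package actually being imported
print(llama_index.__version__)
print(llama_index.__file__)

from llama_index.embeddings import AzureOpenAIEmbedding  # should work on 0.9.21
```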