Thanks @Logan M , I invested a little time in this problem and have a minimal example that shows what I would consider a bug. However, as you pointed out, using the global service context solves the problem.
Briefly, the problem: when I instantiate the index with a service context and a desired LLM and then create an actual query engine, the LLM is only used when the engine is created directly via `index.as_query_engine()`. When customizing, e.g. the retriever, the passed LLM is ignored and the default is used instead. In the code sketch below, only variant A uses the desired LLM; B and C fall back to the default. I have tried a few other setups, but this shows the core problem.
```python
from llama_index import ServiceContext, VectorStoreIndex, get_response_synthesizer
from llama_index.query_engine import RetrieverQueryEngine

# some_desired_llm, some_embedding_model, some_vector_store are defined elsewhere
service_context = ServiceContext.from_defaults(
    llm=some_desired_llm,
    embed_model=some_embedding_model,
)
index = VectorStoreIndex.from_vector_store(
    some_vector_store,
    service_context=service_context,
)

# Variant A - directly from the index (uses the desired llm)
rag_query = index.as_query_engine()

# Variant B - more granular control (falls back to the default llm)
rag_query = RetrieverQueryEngine.from_args(
    retriever=index.as_retriever(),
    response_synthesizer=get_response_synthesizer(streaming=True),
)

# Variant C - same, even when passing the service context explicitly
rag_query = RetrieverQueryEngine.from_args(
    retriever=index.as_retriever(),
    response_synthesizer=get_response_synthesizer(streaming=True),
    service_context=service_context,
)

response = rag_query.query("What's in the doc?")
```
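For reference, here is a minimal sketch of the global-service-context workaround mentioned above, assuming the pre-0.10 `llama_index` API where `set_global_service_context` is exported at the top level. With this set once at startup, variants B and C pick up the desired LLM as well:

```python
from llama_index import ServiceContext, set_global_service_context

# some_desired_llm and some_embedding_model are placeholders defined elsewhere,
# as in the sketch above.
service_context = ServiceContext.from_defaults(
    llm=some_desired_llm,
    embed_model=some_embedding_model,
)

# Register the service context globally so that components created without an
# explicit service_context (e.g. via RetrieverQueryEngine.from_args) fall back
# to it instead of the library default.
set_global_service_context(service_context)
```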