make sure you're passing the correct service context with the correct model.
You can also try setting the service context globally
from llama_index import set_global_service_context
set_global_service_context(service_context)
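btw, the snippets below assume roughly these imports (legacy llama_index API; exact module paths can shift between versions, and I'm assuming strmlt is your streamlit alias):
import streamlit as strmlt  # assumption: strmlt is the streamlit module alias used in your code

from llama_index import (
    PromptHelper,
    ServiceContext,
    SimpleDirectoryReader,
    SummaryIndex,
    VectorStoreIndex,
    set_global_service_context,
)
from llama_index.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.chat_engine import CondenseQuestionChatEngine
from llama_index.embeddings import OpenAIEmbedding
from llama_index.llms import OpenAI
from llama_index.query_engine import RouterQueryEngine
from llama_index.tools import QueryEngineTool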
strmlt.session_state.documents = SimpleDirectoryReader(folder).load_data()
strmlt.session_state.llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo-1106", max_tokens=2048)
strmlt.session_state.embeddings = OpenAIEmbedding(model="text-embedding-ada-002")
strmlt.session_state.prompt_helper = PromptHelper(context_window=16384, num_output=2048)
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])
strmlt.session_state.service_context = ServiceContext.from_defaults(
    llm=strmlt.session_state.llm,
    embed_model=strmlt.session_state.embeddings,
    prompt_helper=strmlt.session_state.prompt_helper,
    callback_manager=callback_manager
)
set_global_service_context(service_context)
index1 = SummaryIndex(
    strmlt.session_state.documents,
    #service_context=strmlt.session_state.service_context,
    summary_text=strmlt.session_state.summary_text,
    response_mode="tree_summarize"
)
index2 = VectorStoreIndex(
    strmlt.session_state.documents,
    #service_context=strmlt.session_state.service_context,
)
list_query_engine = index1.as_query_engine(response_mode="tree_summarize")
vector_query_engine = index2.as_query_engine(similarity_top_k=4)
list_tool = QueryEngineTool.from_defaults(
    query_engine=list_query_engine,
    description="Utile pour les questions de synthèse liées à la source de données",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Utile pour retrouver un contexte spécifique lié à la source de données",
)
strmlt.session_state.query_engine = RouterQueryEngine.from_defaults(
    query_engine_tools=[
        list_tool,
        vector_tool,
    ],
    #service_context=strmlt.session_state.service_context,
)
strmlt.session_state.chat_history = []
strmlt.session_state.conversation = CondenseQuestionChatEngine.from_defaults(
    query_engine=strmlt.session_state.query_engine,
    chat_history=strmlt.session_state.chat_history,
    #service_context=strmlt.session_state.service_context
)
still the same error after setting the service context globally?
message: "This model's maximum context length is 8192 tokens, however you requested 10562 tokens (10562 in your prompt; 0 for the completion)."
here is what I have, again
weird, the only 8k model in that service context is the embedding model, but the text is chunked before being passed to the embedding model
is it chunked correctly?
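e.g. you can force smaller chunks when you build the service context -- a sketch assuming the legacy ServiceContext API, where chunk_size is forwarded to the default node parser:
# Sketch: cap chunk size so each node stays well under the embedding model's 8192-token limit
# (chunk_size=1024 is an arbitrary example value, not taken from your code)
strmlt.session_state.service_context = ServiceContext.from_defaults(
    llm=strmlt.session_state.llm,
    embed_model=strmlt.session_state.embeddings,
    prompt_helper=strmlt.session_state.prompt_helper,
    callback_manager=callback_manager,
    chunk_size=1024,
)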
in your code above, is the service_context variable defined beforehand?
otherwise I would get an error
yeah, you don't have the latest one, I understand why you say this
here is:
strmlt.session_state.documents = SimpleDirectoryReader(folder).load_data()
strmlt.session_state.llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo-16k", max_tokens=2048)
strmlt.session_state.embeddings = OpenAIEmbedding(model="text-embedding-ada-002")
strmlt.session_state.prompt_helper = PromptHelper(context_window=16384, num_output=2048)
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])
strmlt.session_state.service_context = ServiceContext.from_defaults(
    llm=strmlt.session_state.llm,
    embed_model=strmlt.session_state.embeddings,
    prompt_helper=strmlt.session_state.prompt_helper,
    callback_manager=callback_manager
)
set_global_service_context(strmlt.session_state.service_context)
index1 = SummaryIndex(
    strmlt.session_state.documents,
    #service_context=strmlt.session_state.service_context,
    summary_text=strmlt.session_state.summary_text,
    response_mode="tree_summarize"
)
index2 = VectorStoreIndex(
    strmlt.session_state.documents,
    #service_context=strmlt.session_state.service_context,
)
list_query_engine = index1.as_query_engine(response_mode="tree_summarize")
vector_query_engine = index2.as_query_engine(similarity_top_k=4)
list_tool = QueryEngineTool.from_defaults(
    query_engine=list_query_engine,
    description="Utile pour les questions de synthèse liées à la source de données",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Utile pour retrouver un contexte spécifique lié à la source de données",
)
strmlt.session_state.query_engine = RouterQueryEngine.from_defaults(
    query_engine_tools=[
        list_tool,
        vector_tool,
    ],
    #service_context=strmlt.session_state.service_context,
)
strmlt.session_state.chat_history = []
strmlt.session_state.conversation = CondenseQuestionChatEngine.from_defaults(
    query_engine=strmlt.session_state.query_engine,
    chat_history=strmlt.session_state.chat_history,
    #service_context=strmlt.session_state.service_context
)
I fixed that while writing this
do you have any other ideas?
I looked into your code again, couldn't find anything else though
I'll look into it again and let you know if I find a solution, if it's not solved by then
I found out the problem is linked with a query engine or a chat engine
strmlt.session_state.query_engine_builder = QASummaryQueryEngineBuilder(
    service_context=strmlt.session_state.service_context
)
strmlt.session_state.query_engine = strmlt.session_state.query_engine_builder.build_from_documents(
    strmlt.session_state.documents
)
strmlt.session_state.chat_history = []
strmlt.session_state.conversation = CondenseQuestionChatEngine.from_defaults(
    query_engine=strmlt.session_state.query_engine,
    chat_history=strmlt.session_state.chat_history,
    verbose=True
)
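(with the builder imported from the composability module, if I remember the legacy path right:)
# Assumed import for the builder above -- legacy llama_index; the path may differ in newer versions
from llama_index.composability import QASummaryQueryEngineBuilder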
but I prefer using the router
I'll let you tell me where the issue comes from and how to fix it
I think the main issue is you aren't using from_documents()
-- so your documents aren't getting parsed into nodes, causing issues for the embedding model (which has an 8192-token context limit)
Should be
index1 = SummaryIndex.from_documents(
    strmlt.session_state.documents,
    #service_context=strmlt.session_state.service_context,
    summary_text=strmlt.session_state.summary_text,
    response_mode="tree_summarize"
)
index2 = VectorStoreIndex.from_documents(
    strmlt.session_state.documents,
    #service_context=strmlt.session_state.service_context,
)
Ok, but why is it working with this?
because this one uses from_documents -- which chunks the documents into nodes
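roughly what from_documents() does for you is chunk first and then build the index from nodes -- a sketch with the legacy node parser API (chunk sizes are just example values):
# Sketch: explicit chunking, roughly equivalent to what from_documents() does internally
from llama_index.node_parser import SimpleNodeParser

parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
nodes = parser.get_nodes_from_documents(strmlt.session_state.documents)

# passing pre-chunked nodes to the bare constructor is then fine
index2 = VectorStoreIndex(nodes)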
also, if you have enough time
could you tell me how to change every query template of the engines used here?
like the condense engine prompt, the router query prompt
LOL what do you want to change? Add a system prompt? There might be an easier way.
change the actual English prompts
to have a better chance of getting a French result
because sometimes, due to the prompting, the answer is in English
or maybe there is another way
You've tried setting a system prompt for this though? Tbh I think all you really need to change is the prompt in the user-facing chat engine
I've just set a summary prompt
oh wait, it's a condense chat engine, even worse :PSadge:
I changed every prompt used in the API
that was used in my code
for changing the template?
you think I should not use it?
Try this:
service_context = ServiceContext.from_defaults(..., system_prompt="Répondez toujours en français, ma vie en dépend.")
set_global_service_context(service_context)
...
chat_engine = CondenseQuestionChatEngine.from_defaults(..., system_prompt="Répondez toujours en français, ma vie en dépend.")
(I google translated that lol)
you think it will be enough? the prompts I've modified locally were big prompts
It might be enough tbh 🙏 -- modifying all the prompts will be a much more annoying task haha
So I suggest this route first haha
will try this, thanks for your time and your amazing skills in this field
For the Condense Question chat engine, system_prompt is not supported
when passing system_prompt="Répondez toujours en français, ma vie en dépend."
So I don't know why, but an error happens when trying to apply the system prompt to the condense chat engine
oh I should have scrolled down further in the code lol -- yeah, not supported, but that's ok, leave it out for now
yeah, it seems to work without it, just with the system prompt in the service context. Is it the same as changing all the QA etc. prompts, in terms of quality (of the answers/retrieval)?
So every time you call a query engine, it is using that system prompt. So rather than modifying the entire prompt, it's like adding extra instructions to the LLM.
Technically, it is modifying every QA prompt 👍 Should be good enough I think
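and if you ever do want finer-grained control later, you can override individual templates explicitly -- a sketch, where the French prompt texts are just my illustrations, not the library defaults:
# Sketch: explicit per-template overrides instead of (or on top of) the global system prompt
from llama_index.prompts import PromptTemplate

custom_condense_prompt = PromptTemplate(
    "Compte tenu de la conversation et de la question de suivi, "
    "reformule la question de suivi en une question autonome, en français.\n"
    "Historique : {chat_history}\nQuestion de suivi : {question}\nQuestion autonome : "
)
custom_qa_prompt = PromptTemplate(
    "Contexte :\n{context_str}\n"
    "Réponds à la question suivante en français : {query_str}\n"
)

# override the QA template on the vector query engine, and the condense template on the chat engine
vector_query_engine = index2.as_query_engine(
    similarity_top_k=4,
    text_qa_template=custom_qa_prompt,
)
strmlt.session_state.conversation = CondenseQuestionChatEngine.from_defaults(
    query_engine=strmlt.session_state.query_engine,
    chat_history=strmlt.session_state.chat_history,
    condense_question_prompt=custom_condense_prompt,
)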
Ok, thanks for these precise details!
Right now I'm using the latest OpenAI GPT-3.5 Turbo version, it seems pretty fast compared with the previous one