make sure you're passing the correct service context with the correct model.

You can also try setting the service context globally
Plain Text
from llama_index import set_global_service_context

set_global_service_context(service_context)
Ok, trying this rn
currently doing this:
Plain Text
        strmlt.session_state.documents = SimpleDirectoryReader(folder).load_data()
        strmlt.session_state.llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo-1106", max_tokens=2048)
        strmlt.session_state.embeddings = OpenAIEmbedding(model="text-embedding-ada-002")
        strmlt.session_state.prompt_helper = PromptHelper(context_window=16384, num_output=2048)
        llama_debug = LlamaDebugHandler(print_trace_on_end=True)
        callback_manager = CallbackManager([llama_debug])
        strmlt.session_state.service_context = ServiceContext.from_defaults(
            llm=strmlt.session_state.llm,
            embed_model=strmlt.session_state.embeddings,
            prompt_helper=strmlt.session_state.prompt_helper,
            callback_manager=callback_manager
        )
        set_global_service_context(service_context)
        index1 = SummaryIndex(
            strmlt.session_state.documents,
            #service_context=strmlt.session_state.service_context,
            summary_text=strmlt.session_state.summary_text,
            response_mode="tree_summarize"
        )

        index2 = VectorStoreIndex(
            strmlt.session_state.documents,
            #service_context=strmlt.session_state.service_context,
        )

        list_query_engine = index1.as_query_engine(response_mode="tree_summarize")
        vector_query_engine = index2.as_query_engine(similarity_top_k=4)

        list_tool = QueryEngineTool.from_defaults(
            query_engine=list_query_engine,
            description="Utile pour les questions de synthèse liées à la source de données",
        )
        vector_tool = QueryEngineTool.from_defaults(
            query_engine=vector_query_engine,
            description="Utile pour retrouver un contexte spécifique lié à la source de données",
        )

        strmlt.session_state.query_engine = RouterQueryEngine.from_defaults(
            query_engine_tools=[
                list_tool,
                vector_tool,
            ],
            #service_context=strmlt.session_state.service_context,
        )

        strmlt.session_state.chat_history = []
        strmlt.session_state.conversation = CondenseQuestionChatEngine.from_defaults(
            query_engine=strmlt.session_state.query_engine,
            chat_history=strmlt.session_state.chat_history,
            #service_context=strmlt.session_state.service_context
        )
still the same error after setting the service context globally?
message: "This model's maximum context length is 8192 tokens, however you requested 10562 tokens (10562 in your prompt; 0 for the completion)."
here is what I have, again
weird, the only 8k model in that service context is the embedding model, but the text is chunked before being passed to the embedding model
is it chunked correctly?
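(btw, chunk size is normally controlled on the service context and applied when documents are parsed into nodes; a rough sketch assuming the legacy ServiceContext API used in this thread, with chunk_size just an example value:)
Plain Text
# smaller chunks keep each node well under the embedding model's 8192-token limit;
# the chunking only happens when documents are parsed into nodes
service_context = ServiceContext.from_defaults(
    llm=strmlt.session_state.llm,
    embed_model=strmlt.session_state.embeddings,
    chunk_size=1024,
)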
in your code above, is the service_context variable defined beforehand?
(screenshot of the code attached)
otherwise I would get an error
yeah, you don't have the latest version, I understand why you say this
here is:
Plain Text
        strmlt.session_state.documents = SimpleDirectoryReader(folder).load_data()
        strmlt.session_state.llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo-16k", max_tokens=2048)
        strmlt.session_state.embeddings = OpenAIEmbedding(model="text-embedding-ada-002")
        strmlt.session_state.prompt_helper = PromptHelper(context_window=16384, num_output=2048)
        llama_debug = LlamaDebugHandler(print_trace_on_end=True)
        callback_manager = CallbackManager([llama_debug])
        strmlt.session_state.service_context = ServiceContext.from_defaults(
            llm=strmlt.session_state.llm,
            embed_model=strmlt.session_state.embeddings,
            prompt_helper=strmlt.session_state.prompt_helper,
            callback_manager=callback_manager
        )
        set_global_service_context(strmlt.session_state.service_context)
        index1 = SummaryIndex(
            strmlt.session_state.documents,
            #service_context=strmlt.session_state.service_context,
            summary_text=strmlt.session_state.summary_text,
            response_mode="tree_summarize"
        )

        index2 = VectorStoreIndex(
            strmlt.session_state.documents,
            #service_context=strmlt.session_state.service_context,
        )

        list_query_engine = index1.as_query_engine(response_mode="tree_summarize")
        vector_query_engine = index2.as_query_engine(similarity_top_k=4)

        list_tool = QueryEngineTool.from_defaults(
            query_engine=list_query_engine,
            description="Utile pour les questions de synthèse liées à la source de données",
        )
        vector_tool = QueryEngineTool.from_defaults(
            query_engine=vector_query_engine,
            description="Utile pour retrouver un contexte spécifique lié à la source de données",
        )

        strmlt.session_state.query_engine = RouterQueryEngine.from_defaults(
            query_engine_tools=[
                list_tool,
                vector_tool,
            ],
            #service_context=strmlt.session_state.service_context,
        )

        strmlt.session_state.chat_history = []
        strmlt.session_state.conversation = CondenseQuestionChatEngine.from_defaults(
            query_engine=strmlt.session_state.query_engine,
            chat_history=strmlt.session_state.chat_history,
            #service_context=strmlt.session_state.service_context
        )
while writing this out I fixed that
so it's not our issue
do you have any other ideas?
looked into your code again, couldn't find anything else tho
I'll look into it again and let you know if I find a solution, if it isn't solved by then
I found out the problem is linked to a query engine or a chat engine
indeed, when using this:
Plain Text
strmlt.session_state.query_engine_builder = QASummaryQueryEngineBuilder(service_context=strmlt.session_state.service_context)
strmlt.session_state.query_engine = strmlt.session_state.query_engine_builder.build_from_documents(strmlt.session_state.documents)
strmlt.session_state.chat_history = []
strmlt.session_state.conversation = CondenseQuestionChatEngine.from_defaults(
    query_engine=strmlt.session_state.query_engine,
    chat_history=strmlt.session_state.chat_history,
    verbose=True
)
it works well
but I prefer using the router
I'll let you tell me where the issue comes from and how to fix it
I think the main issue is that you aren't using from_documents() -- so your documents aren't getting parsed into nodes, causing issues for the embedding model (which has an 8192-token context limit)
Should be

Plain Text
        index1 = SummaryIndex.from_documents(
            strmlt.session_state.documents,
            #service_context=strmlt.session_state.service_context,
            summary_text=strmlt.session_state.summary_text,
            response_mode="tree_summarize"
        )

        index2 = VectorStoreIndex.from_documents(
            strmlt.session_state.documents,
            #service_context=strmlt.session_state.service_context,
        )
Ok, but why is it working with this
and not this
it's really weird
because this one uses from_documents -- which chunks the documents into nodes
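if you ever want that chunking step to be explicit, it can be done with a node parser (a rough sketch of what from_documents() does under the hood, assuming the legacy llama_index node parser API; chunk_size is just an example value):
Plain Text
from llama_index.node_parser import SimpleNodeParser

# split the raw documents into nodes, each well under the embedding model's 8192-token limit
parser = SimpleNodeParser.from_defaults(chunk_size=1024)
nodes = parser.get_nodes_from_documents(strmlt.session_state.documents)

# the bare constructors expect nodes, not raw documents
index2 = VectorStoreIndex(nodes)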
thanks for the answer
also, if you have enough time
could you tell me how to change every query template of the engines used here?
like the condense engine's query prompt, the router's query prompt
LOL what do you want to change? Add a system prompt? There might be an easier way.
change the actual English prompts
to get a French result more reliably
because sometimes, due to the prompting, the answer comes back in English
or maybe there is another way
You've tried setting a system prompt for this though? Tbh I think all you really need to change is the prompt in the user-facing chat engine
I've just set a summary prompt
oh wait, it's the condense chat engine, even worse :PSadge:
and on my local machine
I changed every prompt used in the API
that was used in my code
it seemed to work well
for changing the template?
you think I should not use it?
Try this:

Plain Text
service_context = ServiceContext.from_defaults(..., system_prompt="Répondez toujours en français, ma vie en dépend.")
set_global_service_context(service_context)
...
chat_engine = CondenseQuestionChatEngine.from_defaults(..., system_prompt="Répondez toujours en français, ma vie en dépend.")
(I google translated that lol)
you think it will be enough? the prompts I've modified locally were big prompts
It might be enough tbh 🙏 -- modifying all the prompts will be a much more annoying task haha
So I suggest this route first haha
understandable lol
will try this, thanks for your time and your amazing skills in this field
For the Condense Question chat engine, system_prompt is not supported
when putting this system_prompt="Répondez toujours en français, ma vie en dépend."
it says that
So idk why, but an error happens when trying to apply the sys prompt to the condense chat engine
oh I should have scrolled down further in the code lol -- yea, not supported, but that's ok, leave it out for now
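if you do want the condensed question itself in French, the condense question template can be passed in directly (a rough sketch assuming the legacy PromptTemplate API; the French wording here is just an example):
Plain Text
from llama_index.prompts import PromptTemplate

# hypothetical French condense-question template; {chat_history} and {question}
# are the variables the default template expects
condense_prompt = PromptTemplate(
    "Étant donné la conversation suivante et un message de suivi, "
    "reformule le message de suivi en une question autonome, en français.\n"
    "Historique de la conversation :\n{chat_history}\n"
    "Message de suivi : {question}\n"
    "Question autonome : "
)

chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=strmlt.session_state.query_engine,
    condense_question_prompt=condense_prompt,
    chat_history=strmlt.session_state.chat_history,
)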
yeah, it seems to work with just the system prompt in the service context. Is it the same as changing all the QA, etc. prompts in terms of quality (of the answers/retrieval)?
So every time you call a query engine, it is using that system prompt. So rather than modifying the entire prompt, it's like adding extra instructions to the LLM.

Technically, it is modifying every QA prompt 👍 Should be good enough I think
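And if the system prompt ever isn't enough, the QA prompt itself can be swapped per query engine (a rough sketch assuming the legacy PromptTemplate API; the French wording is just an example):
Plain Text
from llama_index.prompts import PromptTemplate

# hypothetical French QA template; {context_str} and {query_str} are the
# variables the default text_qa_template expects
french_qa_prompt = PromptTemplate(
    "Informations de contexte ci-dessous.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "En te basant uniquement sur ce contexte, réponds en français à la question : {query_str}\n"
)

vector_query_engine = index2.as_query_engine(
    similarity_top_k=4,
    text_qa_template=french_qa_prompt,
)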
Ok, thanks for these precise details!
Right now I'm using the latest OpenAI GPT-3.5 Turbo version, it seems pretty fast compared with the previous version