make sure you're passing the correct service context with the correct model.
You can also try setting the service context globally
from llama_index import set_global_service_context
set_global_service_context(service_context)
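btw, the snippets below assume roughly these imports (legacy llama_index API; exact module paths can shift between versions, and I'm assuming strmlt is your streamlit alias):
import streamlit as strmlt  # assumption: strmlt is the streamlit module alias used in your code

from llama_index import (
    PromptHelper,
    ServiceContext,
    SimpleDirectoryReader,
    SummaryIndex,
    VectorStoreIndex,
    set_global_service_context,
)
from llama_index.callbacks import CallbackManager, LlamaDebugHandler
from llama_index.chat_engine import CondenseQuestionChatEngine
from llama_index.embeddings import OpenAIEmbedding
from llama_index.llms import OpenAI
from llama_index.query_engine import RouterQueryEngine
from llama_index.tools import QueryEngineTool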
strmlt.session_state.documents = SimpleDirectoryReader(folder).load_data()
strmlt.session_state.llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo-1106", max_tokens=2048)
strmlt.session_state.embeddings = OpenAIEmbedding(model="text-embedding-ada-002")
strmlt.session_state.prompt_helper = PromptHelper(context_window=16384, num_output=2048)
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])
strmlt.session_state.service_context = ServiceContext.from_defaults(
    llm=strmlt.session_state.llm,
    embed_model=strmlt.session_state.embeddings,
    prompt_helper=strmlt.session_state.prompt_helper,
    callback_manager=callback_manager
)
set_global_service_context(service_context)
index1 = SummaryIndex(
    strmlt.session_state.documents,
    #service_context=strmlt.session_state.service_context,
    summary_text=strmlt.session_state.summary_text,
    response_mode="tree_summarize"
)
index2 = VectorStoreIndex(
    strmlt.session_state.documents,
    #service_context=strmlt.session_state.service_context,
)
list_query_engine = index1.as_query_engine(response_mode="tree_summarize")
vector_query_engine = index2.as_query_engine(similarity_top_k=4)
list_tool = QueryEngineTool.from_defaults(
    query_engine=list_query_engine,
    description="Utile pour les questions de synthèse liées à la source de données",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Utile pour retrouver un contexte spécifique lié à la source de données",
)
strmlt.session_state.query_engine = RouterQueryEngine.from_defaults(
    query_engine_tools=[
        list_tool,
        vector_tool,
    ],
    #service_context=strmlt.session_state.service_context,
)
strmlt.session_state.chat_history = []
strmlt.session_state.conversation = CondenseQuestionChatEngine.from_defaults(
    query_engine=strmlt.session_state.query_engine,
    chat_history=strmlt.session_state.chat_history,
    #service_context=strmlt.session_state.service_context
)
still the same error after setting the service context globally?
message: "This model's maximum context length is 8192 tokens, however you requested 10562 tokens (10562 in your prompt; 0 for the completion)."
here is what I have, again
weird, the only 8k model in that service context is the embedding model, but the text is chunked before being passed to the embedding model
is it chunked correctly?
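e.g. you can force smaller chunks when you build the service context -- a sketch assuming the legacy ServiceContext API, where chunk_size is forwarded to the default node parser:
# Sketch: cap chunk size so each node stays well under the embedding model's 8192-token limit
# (chunk_size=1024 is an arbitrary example value, not taken from your code)
strmlt.session_state.service_context = ServiceContext.from_defaults(
    llm=strmlt.session_state.llm,
    embed_model=strmlt.session_state.embeddings,
    prompt_helper=strmlt.session_state.prompt_helper,
    callback_manager=callback_manager,
    chunk_size=1024,
)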
in your code above, is the service_context variable defined beforehand?
otherwise I would get an error
yeah, you don't have the latest one, I understand why you say this
here is:
strmlt.session_state.documents = SimpleDirectoryReader(folder).load_data()
strmlt.session_state.llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo-16k", max_tokens=2048)
strmlt.session_state.embeddings = OpenAIEmbedding(model="text-embedding-ada-002")
strmlt.session_state.prompt_helper = PromptHelper(context_window=16384, num_output=2048)
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])
strmlt.session_state.service_context = ServiceContext.from_defaults(
    llm=strmlt.session_state.llm,
    embed_model=strmlt.session_state.embeddings,
    prompt_helper=strmlt.session_state.prompt_helper,
    callback_manager=callback_manager
)
set_global_service_context(strmlt.session_state.service_context)
index1 = SummaryIndex(
    strmlt.session_state.documents,
    #service_context=strmlt.session_state.service_context,
    summary_text=strmlt.session_state.summary_text,
    response_mode="tree_summarize"
)
index2 = VectorStoreIndex(
    strmlt.session_state.documents,
    #service_context=strmlt.session_state.service_context,
)
list_query_engine = index1.as_query_engine(response_mode="tree_summarize")
vector_query_engine = index2.as_query_engine(similarity_top_k=4)
list_tool = QueryEngineTool.from_defaults(
    query_engine=list_query_engine,
    description="Utile pour les questions de synthèse liées à la source de données",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Utile pour retrouver un contexte spécifique lié à la source de données",
)
strmlt.session_state.query_engine = RouterQueryEngine.from_defaults(
    query_engine_tools=[
        list_tool,
        vector_tool,
    ],
    #service_context=strmlt.session_state.service_context,
)
strmlt.session_state.chat_history = []
strmlt.session_state.conversation = CondenseQuestionChatEngine.from_defaults(
    query_engine=strmlt.session_state.query_engine,
    chat_history=strmlt.session_state.chat_history,
    #service_context=strmlt.session_state.service_context
)
I fixed that while writing this
do you have any other ideas?
I looked into your code again, couldn't find anything else though
I'll look into it again and let you know if I find a solution, if it's not solved by then
I found out the problem is linked with a query engine or a chat engine
strmlt.session_state.query_engine_builder = QASummaryQueryEngineBuilder(
    service_context=strmlt.session_state.service_context
)
strmlt.session_state.query_engine = strmlt.session_state.query_engine_builder.build_from_documents(
    strmlt.session_state.documents
)
strmlt.session_state.chat_history = []
strmlt.session_state.conversation = CondenseQuestionChatEngine.from_defaults(
    query_engine=strmlt.session_state.query_engine,
    chat_history=strmlt.session_state.chat_history,
    verbose=True
)
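(with the builder imported from the composability module, if I remember the legacy path right:)
# Assumed import for the builder above -- legacy llama_index; the path may differ in newer versions
from llama_index.composability import QASummaryQueryEngineBuilder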
but I prefer using the router
I'll let you tell me where the issue comes from and how to fix it
I think the main issue is you aren't using from_documents()
-- so your documents aren't getting parsed into nodes, causing issues for the embedding model (which has an 8192-token context limit)
Should be
index1 = SummaryIndex.from_documents(
    strmlt.session_state.documents,
    #service_context=strmlt.session_state.service_context,
    summary_text=strmlt.session_state.summary_text,
    response_mode="tree_summarize"
)
index2 = VectorStoreIndex.from_documents(
    strmlt.session_state.documents,
    #service_context=strmlt.session_state.service_context,
)
Ok, but why is it working with this?
because this one uses from_documents -- which chunks the documents into nodes
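roughly what from_documents() does for you is chunk first and then build the index from nodes -- a sketch with the legacy node parser API (chunk sizes are just example values):
# Sketch: explicit chunking, roughly equivalent to what from_documents() does internally
from llama_index.node_parser import SimpleNodeParser

parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
nodes = parser.get_nodes_from_documents(strmlt.session_state.documents)

# passing pre-chunked nodes to the bare constructor is then fine
index2 = VectorStoreIndex(nodes)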
also, if you have enough time
could you tell me how to change every query template of the engines used here?
like the condense engine prompt, the router query prompt
LOL what do you want to change? Add a system prompt? There might be an easier way.
change the actual English prompts
to have a better chance of getting a French result
because sometimes, due to the prompting, the answer is in English
or maybe there is another way
You've tried setting a system prompt for this though? Tbh I think all you really need to change is the prompt in the user-facing chat engine
I've just set a summary prompt
oh wait, it's a condense chat engine, even worse :PSadge:
I changed every prompt used in the API
that was used in my code
for changing the template?
you think I should not use it?
Try this:
service_context = ServiceContext.from_defaults(..., system_prompt="Répondez toujours en français, ma vie en dépend.")
set_global_service_context(service_context)
...
chat_engine = CondenseQuestionChatEngine.from_defaults(..., system_prompt="Répondez toujours en français, ma vie en dépend.")
(I google translated that lol)
you think it will be enough? the prompts I've modified locally were big prompts
It might be enough tbh 🙏 -- modifying all the prompts will be a much more annoying task haha
So I suggest this route first haha
will try this, thanks for your time and your amazing skills in this field
For the Condense Question chat engine, system_prompt is not supported
when passing system_prompt="Répondez toujours en français, ma vie en dépend."
So I don't know why, but an error happens when trying to apply the system prompt to the condense chat engine
oh I should have scrolled down further in the code lol -- yeah, not supported, but that's ok, leave it out for now
yeah, it seems to work without it, just with the system prompt in the service context. Is it the same as changing all the QA etc. prompts, in terms of quality (of the answers/retrieval)?
So every time you call a query engine, it is using that system prompt. So rather than modifying the entire prompt, it's like adding extra instructions to the LLM.
Technically, it is modifying every QA prompt 👍 Should be good enough I think
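and if you ever do want finer-grained control later, you can override individual templates explicitly -- a sketch, where the French prompt texts are just my illustrations, not the library defaults:
# Sketch: explicit per-template overrides instead of (or on top of) the global system prompt
from llama_index.prompts import PromptTemplate

custom_condense_prompt = PromptTemplate(
    "Compte tenu de la conversation et de la question de suivi, "
    "reformule la question de suivi en une question autonome, en français.\n"
    "Historique : {chat_history}\nQuestion de suivi : {question}\nQuestion autonome : "
)
custom_qa_prompt = PromptTemplate(
    "Contexte :\n{context_str}\n"
    "Réponds à la question suivante en français : {query_str}\n"
)

# override the QA template on the vector query engine, and the condense template on the chat engine
vector_query_engine = index2.as_query_engine(
    similarity_top_k=4,
    text_qa_template=custom_qa_prompt,
)
strmlt.session_state.conversation = CondenseQuestionChatEngine.from_defaults(
    query_engine=strmlt.session_state.query_engine,
    chat_history=strmlt.session_state.chat_history,
    condense_question_prompt=custom_condense_prompt,
)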
Ok, thanks for these precise details!
Right now I'm using the latest OpenAI GPT-3.5 Turbo version, it seems pretty fast compared with the previous one