Find answers from the community

Updated 2 years ago

You can do it directly now

At a glance
You can do it directly now

chat_engine = index.as_chat_engine()
default value is CondenseQuestion mode only
V
W
L
28 comments
can't make it work either, i get returned a 500 error
query_text = data.get("Prompt") query_engine = index.as_chat_engine(similarity_top_k=3, text_qa_template=qa_template) response = query_engine.chat(query_text)
I mght need to give him a chat history and pass it in parameter ?
File "c:\Projets\IA Chat Local\Sources\AzureOpenAI\app.py", line 91, in get_json response = query_engine.chat(query_text)
No, It prepares the chat history by itself. If not already present
Can you share the entire error
This seems more like OpenAI error
Try passing the service context in the chat engine once
Plain Text
llm_predictor = LLMPredictor(llm=ChatOpenAI(openai_api_key="YOUR_API_KEY",temperature=0, max_tokens=1024, model_name="gpt-3.5-turbo"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)

Try with this once
just passed the service context in query_engine = index.as_chat_engine(similarity_top_k=3, text_qa_template=qa_template, service_context=service_context)
and that maked it work
However it seems a little shaky with a chat history
Just a note, you can also set a global service context so that you don't have to worry about passing it in everywhere

Plain Text
from llama_index import set_global_service_context

set_global_service_context(service_context)

<continue with program>
You passing it ?
nope only via the setting chat_engine
but when i ask him questions like "what the question i just asked you before" i tells me that he doesnt know
He should know no ?
Hmm, I think that's a sympotom of how the condense engine works
We probably need a more simple implementation that doesn't re-phrase the query every time
Every input it uses the chat history to re-write the user query, and uses that to search
the react-agent is probably more along the lines of what you want
but thats basically just langchain
Yes it will not be able to tell you these questions
Actually the way condense mode works is

There are two llm calls.
First one forms the questions based on user query which is asked to our indexes.

So if you ask what was the last question, it will pick the last question from the chat history but that question will be used in the second llm call thus you'll get the response based on the last question ans not the actual question
What bothered me with the react engine is (i guess) that he have the possibility to chose beteween the index and his own knowledge. Since I want it to be restrained to only my vector index, I think this solution can't be exploited...
At the end I just want to track the chat history and put it in my json file. It's just a chatbot.. If he can query over last questions it's good but the main thing i'm working on rn is to keep track of that chat history then giving my engine the template and the chat history (which i don't know how to implement it yet)
Found out how to format my output in json
need to implement the history now !
Add a reply
Sign up and join the conversation on Discord