Can't make it work either, I get a 500 error returned
query_text = data.get("Prompt")
query_engine = index.as_chat_engine(similarity_top_k=3, text_qa_template=qa_template)
response = query_engine.chat(query_text)
I might need to give it a chat history and pass it as a parameter?
File "c:\Projets\IA Chat Local\Sources\AzureOpenAI\app.py", line 91, in get_json
response = query_engine.chat(query_text)
No, it prepares the chat history by itself if one isn't already present
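For example, something like this — just a rough sketch, assuming a recent llama_index where the chat engine exposes chat_history and reset():

chat_engine = index.as_chat_engine(similarity_top_k=3, text_qa_template=qa_template)

# first turn: nothing to remember yet
first = chat_engine.chat("What topics do the documents cover?")
# second turn: the engine reuses the history it stored from the first call
second = chat_engine.chat("Can you expand on the first one?")

print(chat_engine.chat_history)  # the messages exchanged so far
chat_engine.reset()              # wipes the stored history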
Can you share the entire error?
This seems more like an OpenAI error
Try passing the service context in the chat engine once
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, ServiceContext
llm_predictor = LLMPredictor(llm=ChatOpenAI(openai_api_key="YOUR_API_KEY", temperature=0, max_tokens=1024, model_name="gpt-3.5-turbo"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)
Try with this once
I just passed the service context: query_engine = index.as_chat_engine(similarity_top_k=3, text_qa_template=qa_template, service_context=service_context)
However, it seems a little shaky with the chat history
Just a note, you can also set a global service context so that you don't have to worry about passing it in everywhere
from llama_index import set_global_service_context
set_global_service_context(service_context)
<continue with program>
Nope, it only works by setting it on the chat_engine
But when I ask it questions like "what was the question I just asked you before?", it tells me that it doesn't know
Hmm, I think that's a symptom of how the condense engine works
We probably need a simpler implementation that doesn't re-phrase the query every time
On every input it uses the chat history to re-write the user query, and uses that to search
The react agent is probably more along the lines of what you want
but that's basically just langchain
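For reference, switching to it is roughly this — a sketch, assuming your llama_index version accepts chat_mode="react" in as_chat_engine:

# ReAct mode wraps the index as a tool and lets the LLM decide when to call it
chat_engine = index.as_chat_engine(chat_mode="react", verbose=True)
response = chat_engine.chat(query_text)  # verbose=True prints the agent's tool calls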
Yes, it won't be able to answer those kinds of questions
Actually, the way condense mode works is:
There are two LLM calls.
The first one forms a standalone question from the user query and the chat history, and that question is what gets asked against our indexes.
So if you ask "what was the last question?", it will pick the last question out of the chat history, but that question is then used in the second LLM call, so you get a response based on that last question and not on your actual question
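Roughly like this — a sketch of the condense_question mode, assuming verbose=True makes the re-written question visible in the logs:

chat_engine = index.as_chat_engine(chat_mode="condense_question", similarity_top_k=3, text_qa_template=qa_template, verbose=True)

chat_engine.chat("What does the document say about X?")
# LLM call 1: condenses the follow-up plus the chat history into a standalone question
# LLM call 2: that standalone question is run against the index and answered
chat_engine.chat("And what about Y?")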
What bothered me with the react engine is (I guess) that it can choose between the index and its own knowledge. Since I want it restricted to only my vector index, I don't think that solution can be used...
In the end I just want to track the chat history and put it in my JSON file. It's just a chatbot... If it can answer questions about the previous questions that's good, but the main thing I'm working on right now is keeping track of that chat history, then giving my engine the template and the chat history (which I don't know how to implement yet)
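Something like this might work for passing it in — a rough sketch, assuming a llama_index version where chat() accepts a chat_history of ChatMessage objects (the example messages are made up):

from llama_index.llms import ChatMessage, MessageRole

# history you tracked yourself, e.g. rebuilt from your JSON file
chat_history = [
    ChatMessage(role=MessageRole.USER, content="Example earlier question"),
    ChatMessage(role=MessageRole.ASSISTANT, content="Example earlier answer"),
]

chat_engine = index.as_chat_engine(similarity_top_k=3, text_qa_template=qa_template, service_context=service_context)
response = chat_engine.chat(query_text, chat_history=chat_history)  # explicit history overrides the stored one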
Found out how to format my output in JSON
Need to implement the history now!
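If it helps, dumping and reloading the history could look like this — just a sketch, assuming the ChatMessage-based history from recent llama_index versions; the file name is arbitrary:

import json
from llama_index.llms import ChatMessage

def save_history(chat_engine, path="chat_history.json"):
    # serialize the engine's stored messages as plain role/content pairs
    messages = [{"role": m.role.value, "content": m.content} for m in chat_engine.chat_history]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(messages, f, ensure_ascii=False, indent=2)

def load_history(path="chat_history.json"):
    # rebuild ChatMessage objects to pass back into chat(..., chat_history=...)
    with open(path, encoding="utf-8") as f:
        return [ChatMessage(role=m["role"], content=m["content"]) for m in json.load(f)]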