Hi, I'm new to this and struggling with a couple of things. Any help would be much appreciated!
The bot usually answers the question well the first time. But the second time I get an answer like "In addition to the previously mentioned " + some other text, so I'm guessing it has some kind of memory even though I have exited the method (the server is still running, though)? If so, what keeps track of the previous question? I'm also guessing that this will cause odd behavior, since multiple users will use the same endpoint.
Sometimes it answers in English instead of the preferred language, Swedish. I have tried putting an instruction before the question, but maybe that is the wrong way to do it?
It can take up to 40 seconds to get a response. Is there any optimization that can be done?
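# Imports assumed for the legacy llama_index / langchain releases this code targets
import os
import openai
from langchain.chat_models import AzureChatOpenAI
from llama_index import GPTSimpleVectorIndex, LLMPredictor, PromptHelper, ServiceContext

deployment_name = "gpt-35-turbo"
# `req` is the incoming HTTP request of the surrounding handler (not shown).
# Prefix in English: "You are a chatbot that only answers in correct Swedish. Here is the question: "
question = "Du är en chatbot som endast svarar på korrekt svenska. Här är frågan: " + req.params.get('question')
llm = AzureChatOpenAI(
    deployment_name=deployment_name, temperature=0.01,
    openai_api_base=openai.api_base, openai_api_key=openai.api_key,
    openai_api_type=openai.api_type, openai_api_version=openai.api_version,
)
llm_predictor = LLMPredictor(llm=llm)
max_input_size = 4096      # maximum prompt size in tokens
num_output = 512           # tokens reserved for the answer
chunk_size_limit = 600     # maximum size of each text chunk
max_chunk_overlap = 20     # overlap for each token fragment
prompt_helper = PromptHelper(max_input_size=max_input_size, num_output=num_output, max_chunk_overlap=max_chunk_overlap, chunk_size_limit=chunk_size_limit)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
)
# The index is loaded from disk on every request, then queried.
index = GPTSimpleVectorIndex.load_from_disk(os.path.abspath(os.path.join('/index.json')), service_context=service_context)
response = index.query(question)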
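To narrow down the 40-second delay, one thing that could be checked is how much of it is spent loading the index versus querying it, since the code above reloads the index on every request. Below is a minimal sketch of that idea using only the standard library. The names `timed` and `get_index` are hypothetical, and the `time.sleep` call is only a stand-in for the real `load_from_disk` step, so the numbers it prints say nothing about the actual service.

```python
import time
from contextlib import contextmanager
from functools import lru_cache


@contextmanager
def timed(label, sink):
    """Store the wall-clock duration of the wrapped block in sink[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[label] = time.perf_counter() - start


@lru_cache(maxsize=1)
def get_index():
    """Hypothetical loader: the body runs once per server process, then is cached."""
    time.sleep(0.05)  # placeholder for the expensive load-from-disk step
    return {"loaded": True}


timings = {}
with timed("first_call", timings):
    get_index()   # pays the load cost
with timed("second_call", timings):
    get_index()   # served from the cache
print(timings)
```

If the load step turns out to dominate, keeping the loaded index in a cached function like this means later requests in the same process skip it. If the query step dominates, the time is more likely spent in the calls to the model.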