
Hi, I'm new to this and struggling with a couple of things. Any help would be greatly appreciated!

The bot answers the question fine the first time. But the second time I get an answer like this: "In addition to the previously mentioned " + some other text. So I'm guessing it has some kind of memory, even though I exit the method (the server is still running though)? If so, what keeps track of the previous question? I'm also guessing that will be a problem for me at this stage, since multiple users will use the same endpoint.

Sometimes it answers the question in English instead of the preferred language, Swedish. I have tried to put an instruction before the question, but maybe that is the incorrect way to do it?

It can take up to 40 seconds to get a response. Is there any optimization that can be done?

Plain Text
import openai
from langchain.chat_models import AzureChatOpenAI
from llama_index import GPTSimpleVectorIndex, LLMPredictor, PromptHelper, ServiceContext

deployment_name = "gpt-35-turbo"
# Swedish: "You are a chatbot that only answers in correct Swedish. Here is the question:"
# `req` comes from the surrounding request handler (e.g. an Azure Functions HttpRequest)
question = "Du är en chatbot som endast svarar på korrekt svenska. Här är frågan: " + req.params.get('question')

llm = AzureChatOpenAI(
    deployment_name=deployment_name,
    temperature=0.01,
    openai_api_base=openai.api_base,
    openai_api_key=openai.api_key,
    openai_api_type=openai.api_type,
    openai_api_version=openai.api_version,
)
llm_predictor = LLMPredictor(llm=llm)

max_input_size = 4096   # max tokens the model accepts per request
num_output = 512        # tokens reserved for the model's answer
chunk_size_limit = 600  # max tokens per retrieved text chunk
max_chunk_overlap = 20  # overlap for each token fragment
prompt_helper = PromptHelper(max_input_size=max_input_size, num_output=num_output, max_chunk_overlap=max_chunk_overlap, chunk_size_limit=chunk_size_limit)

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
)

index = GPTSimpleVectorIndex.load_from_disk('/index.json', service_context=service_context)

response = index.query(question)
4 comments
Actually, it is not tracking any memory!

But there is a specific process in llama index called answer refinement: if all the text retrieved to answer a question does not fit into a single LLM call, it refines the answer across many calls.
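
Conceptually, the refinement loop looks something like this (just a sketch to show the shape of it, not the actual llama index code; `llm` stands in for a hypothetical function that takes a prompt string and returns the model's answer):

Plain Text
# Conceptual sketch only -- not the llama_index implementation.
def refine_answer(llm, query, chunks):
    # First call: answer the query using only the first retrieved chunk.
    answer = llm(f"Context: {chunks[0]}\nQuestion: {query}\nAnswer:")
    # Each extra chunk triggers another call that updates the answer.
    # This is where phrases like "In addition to the previously
    # mentioned..." can leak out of the refine template.
    for chunk in chunks[1:]:
        answer = llm(
            f"Original question: {query}\n"
            f"Existing answer: {answer}\n"
            f"New context: {chunk}\n"
            "Update the existing answer using the new context if needed:"
        )
    return answer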

The mentions of previous answers are sort of leaks from the prompt templates. Especially recently with gpt-3.5, this process can be difficult.

I've actually been working on a new template, let me find that, you can try it out 😅
As for speed, it's mostly dependent on how many LLM calls are made, and how busy openai servers are 🫠
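If you want to cut down the number of calls, one thing to try (assuming the same legacy llama_index query API as in your snippet) is retrieving fewer chunks and compacting the context:

Plain Text
# Fewer retrieved chunks + "compact" mode = fewer LLM calls per query,
# which is usually the biggest factor in response time.
response = index.query(
    question,
    similarity_top_k=1,        # only the single best-matching chunk
    response_mode="compact",   # pack as much context into each LLM call as possible
)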
Here's the hopefully better template!

Plain Text
from langchain.prompts.chat import (
    AIMessagePromptTemplate,
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)

from llama_index.prompts.prompts import RefinePrompt

# Refine Prompt
CHAT_REFINE_PROMPT_TMPL_MSGS = [
    HumanMessagePromptTemplate.from_template("{query_str}"),
    AIMessagePromptTemplate.from_template("{existing_answer}"),
    HumanMessagePromptTemplate.from_template(
        "I have more context below which can be used "
        "(only if needed) to update your previous answer.\n"
        "------------\n"
        "{context_msg}\n"
        "------------\n"
        "Given the new context, update the previous answer to better "
        "answer my previous query."
        "If the previous answer remains the same, repeat it verbatim. "
        "Never reference the new context or my previous query directly.",
    ),
]


CHAT_REFINE_PROMPT_LC = ChatPromptTemplate.from_messages(CHAT_REFINE_PROMPT_TMPL_MSGS)
CHAT_REFINE_PROMPT = RefinePrompt.from_langchain_prompt(CHAT_REFINE_PROMPT_LC)
...
index.query("my query", similarity_top_k=3, refine_template=CHAT_REFINE_PROMPT)
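And for the Swedish answers from the original question: instead of prepending the instruction to the question string, you could also bake it into the question-answering template, so every LLM call sees it (a sketch against the same legacy API; the exact template wording is just an example):

Plain Text
from llama_index.prompts.prompts import QuestionAnswerPrompt

# QA template with the language instruction built in, so every
# LLM call (not just the first) is told to answer in Swedish.
QA_PROMPT_TMPL = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information, answer the question in Swedish only.\n"
    "Question: {query_str}\n"
)
QA_PROMPT = QuestionAnswerPrompt(QA_PROMPT_TMPL)

response = index.query(
    question,
    text_qa_template=QA_PROMPT,
    refine_template=CHAT_REFINE_PROMPT,
)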
Thanks a lot Logan for the answer and info! Will test it out 🙂