This is the context_prompt:
DEFAULT_CONTEXT_PROMPT_TEMPLATE = """
The following is a friendly conversation between a user and an AI assistant.
The assistant is talkative and provides lots of specific details from its context.
If the assistant does not know the answer to a question, it truthfully says it
does not know.
Here are the relevant documents for the context:
{context_str}
Instruction: Based on the above documents, provide a detailed answer for the user question below.
Answer "don't know" if not present in the document.
"""
You can modify this and pass it in under the context_prompt kwarg.
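To make that concrete, here is a minimal sketch of overriding the template. The wording of CUSTOM_CONTEXT_PROMPT and the helper name build_chat_engine are made up for illustration; the only parts taken from this thread are the {context_str} placeholder and the context_prompt kwarg on as_chat_engine.

```python
# Hypothetical replacement template. Keep the {context_str} placeholder so the
# retrieved documents still get injected into the prompt.
CUSTOM_CONTEXT_PROMPT = (
    "You are a support assistant.\n"
    "Here are the relevant documents for the context:\n"
    "{context_str}\n"
    "Instruction: Answer only from the documents above. "
    "Answer \"don't know\" if the answer is not present.\n"
)


def build_chat_engine(index, llm, **kwargs):
    """Illustrative helper: forwards the custom template via the context_prompt kwarg."""
    return index.as_chat_engine(
        chat_mode="condense_plus_context",
        llm=llm,
        context_prompt=CUSTOM_CONTEXT_PROMPT,
        **kwargs,
    )


# Preview how the placeholder gets filled with retrieved text.
preview = CUSTOM_CONTEXT_PROMPT.format(context_str="London is the capital of England.")
```

Other kwargs such as memory or node_postprocessors can be passed through **kwargs unchanged.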
This is my code:
query_engine = index.as_chat_engine(
    chat_mode='condense_plus_context',
    similarity_top_k=similarity_top_k,
    llm=llm_engine,
    system_prompt=prepared_system_prompt,
    memory=chatmemory,
    node_postprocessors=[
        CustomPostprocessor(
            context_limit, query_text + prepared_system_prompt, include_threshold
        )
    ],
)
My system_prompt already serves as my context prompt. Should I just pass the same value as context_prompt to override the default context prompt?
Okay, I finally figured it out. But now I see a new problem: for some reason the index (or whatever computes it) gets the node scores quite wrong. It may give a score of 0.3 to a very relevant node and 0.25 to a completely irrelevant one. Is there any way to debug or control this? Thank you!!
Nope -- this is completely up to the embedding model you are using.
And tbh, the "score" is also relative to the model you are using. For some models, 0.3 might be a good score.
I know that for OpenAI embeddings, anything below roughly 0.77 is typically not relevant.
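A rule of thumb like that can be applied as a simple cutoff on the retrieved nodes. Below is a plain-Python sketch, not LlamaIndex code: the function name, the (text, score) tuple shape, and the example scores are all invented for illustration, and the 0.77 default is just the rough figure mentioned above, so it would need tuning per model. I believe LlamaIndex's SimilarityPostprocessor with its similarity_cutoff option covers the same idea, but check the docs for your version.

```python
from typing import List, Tuple

# Assumed cutoff based on the rough 0.77 figure from the thread; tune per model.
SIMILARITY_CUTOFF = 0.77


def filter_by_score(
    scored_nodes: List[Tuple[str, float]], cutoff: float = SIMILARITY_CUTOFF
) -> List[Tuple[str, float]]:
    """Keep only (text, similarity) pairs whose similarity is at or above the cutoff."""
    return [(text, score) for text, score in scored_nodes if score >= cutoff]


# Made-up scores, for illustration only.
nodes = [("London is the capital of England.", 0.86), ("Shopify pricing page", 0.70)]
kept = filter_by_score(nodes)
```

This only makes sense when the score is a similarity (higher is better); for a distance the comparison would have to be flipped.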
This is how I index the data:
embed_model = OpenAIEmbedding(mode='similarity', embed_batch_size=2000,
                              api_key=user_settings_data.item.get('openai_key'))
In my case, I saw a node with a score of 0.26 that was 100% irrelevant and one with 0.3 that was very relevant. It orders them correctly, but the absolute numbers look way off. I always use the same model for embedding (I guess it's "ada"), but the results are now pretty unstable.
Here is the example:
The question: what is the capital of England
The most relevant node has this text "(em inglês) Construa sua loja com as APIs mais poderosas da Shopify Plus Uma solução de comércio para marcas digitais em expansão. Todos os produtos Conheça todos os produtos e recursos da Shopify Preços Recursos Ajuda e atendimento Ajuda e atendimento Conte com nosso atendimento ao cliente Blog da Shopify Dicas de estratégia comercial Tópicos mais acessados O que é a Shopify? Como nossa plataforma de comércio funciona? Shopify Editions Novos e inovadores produtos Shopify Histórias de fundadores e fundadoras Aprenda com lojistas de sucesso Branding Crie sua marca do zero Marketing Crie um plano de marketing SEO para e-commerce Melhore seu posicionamento nas buscas Estratégia para redes sociais Transforme seguidores em clientes Crescimento de negócios Amplie seu negócio Ferramentas essenciais Gerador de nomes para empresas Criador de logos Banco de imagens Modelo de plano de negócios"
Its score is 0.25.
How on earth could it have this score??
UPD. I'm trying to change the chat_mode to "context" from "condense_plus_context", let's see.
Are you using a vector db? If so, which one?
Yes, it's Postgres.
After changing the chat_mode the score became 0.27 😦
Postgres is likely returning distance, not score (so lower is better)
Yes, exactly, so an irrelevant node should not get a score as good as 0.26. It was working fine before. I only noticed this today, and it feels terrible.
Interestingly enough, if I calculate it manually, I get a cosine similarity score (not distance) that feels similar:
>>> from llama_index.core.base.embeddings.base import similarity
>>> from llama_index.embeddings.openai import OpenAIEmbedding
>>> embed_model = OpenAIEmbedding()
>>> text1 = "what is the capital of England?"
>>> text2 = """(em inglês) Construa sua loja com as APIs mais poderosas da Shopify Plus Uma solução de comércio para marcas digitais em expansão. Todos os produtos Conheça todos os produtos e recursos da Shopify Preços Recursos Ajuda e atendimento Ajuda e atendimento Conte com nosso atendimento ao cliente Blog da Shopify Dicas de estratégia comercial Tópicos mais acessados O que é a Shopify? Como nossa plataforma de comércio funciona? Shopify Editions Novos e inovadores produtos Shopify Histórias de fundadores e fundadoras Aprenda com lojistas de sucesso Branding Crie sua marca do zero Marketing Crie um plano de marketing SEO para e-commerce Melhore seu posicionamento nas buscas Estratégia para redes sociais Transforme seguidores em clientes Crescimento de negócios Amplie seu negócio Ferramentas essenciais Gerador de nomes para empresas Criador de logos Banco de imagens Modelo de plano de negócios"""
>>> embed1 = embed_model.get_query_embedding(text1)
>>> embed2 = embed_model.get_text_embedding(text2)
>>> similarity(embed1, embed2)
0.7057110096512037
>>>
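For anyone following along without the library, cosine similarity itself is easy to compute by hand. This is a plain-Python sketch with toy two-dimensional vectors, not the library's implementation; real embeddings have far more dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Vectors pointing the same way score close to 1; perpendicular ones score 0.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

The result depends only on the angle between the vectors, not their length, which is part of why the absolute numbers are hard to interpret across models.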
Vectors aren't perfect. This is where things like reranking can be important
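To illustrate the reranking idea: retrieve candidates by vector score first, then re-score them with a second, independent scorer and keep the best ones. The sketch below uses plain word overlap (Jaccard) as a toy stand-in; real rerankers are usually dedicated models, and the function names and sample texts here are only for illustration.

```python
import re
from typing import List, Set, Tuple


def _tokens(text: str) -> Set[str]:
    """Lowercased word tokens of the text."""
    return set(re.findall(r"\w+", text.lower()))


def toy_rerank(query: str, candidates: List[str], top_n: int = 3) -> List[Tuple[str, float]]:
    """Re-score retrieved candidates with a second scorer and keep the best top_n.

    The scorer is word overlap (Jaccard), a stand-in for a real reranking model.
    """
    q = _tokens(query)
    scored = []
    for text in candidates:
        t = _tokens(text)
        union = q | t
        score = len(q & t) / len(union) if union else 0.0
        scored.append((text, score))
    # Highest overlap first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


retrieved = [
    "Construa sua loja com as APIs mais poderosas da Shopify Plus",
    "London is the capital of England",
]
reranked = toy_rerank("what is the capital of England", retrieved)
```

With this toy scorer the Shopify text drops to the bottom because it shares no words with the question, even if its vector score was close to the relevant node's.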
Thanks for investigating. So, do I understand correctly that your test shows a higher level of similarity between these two texts?
0.7 -- not exactly high for OpenAI, but not low either.
I'm sorry, I'm confused. For Postgres, the lower the score, the more relevant. And in your test it is... ?
The test above uses cosine similarity -- the higher, the better.
So for this specific piece of text, the relevancy is calculated wrong whichever approach is used, right? Do you have any guess as to why?
The relevancy is not calculated wrong
We can't change what the embeddings say
My best guess is the multilingual aspect might be causing some issues
So, do you think a relevancy of 0.7 is not bad for this case? But why? The text is not relevant to the question at all.
It's a pretty neutral score from OpenAI
like, it could be relevant, or it could be not
I think the multilingual stuff is probably causing the issue though. If both were English (or both Spanish), it might be different
Ah, okay, this is interesting. I will dig into it more. Thanks for your help!!