Hi

Hi,
I'm trying to do QA on documents.
Those documents correspond to different countries, and each of them is divided into chapters (the same chapters for each country).
I followed https://gpt-index.readthedocs.io/en/latest/guides/building_a_chatbot.html and successfully converted it to my use case.
I'm therefore using
  • a dictionary of GPTSimpleVectorIndex (one per country)
  • a GPTListIndex of each of those GPTSimpleVectorIndex
And Langchain is deciding which tool to use (a GPTSimpleVectorIndex for a specific country, and the GPTListIndex if I want to compare countries), roughly as in the sketch below.
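For context, this is only a sketch of how that setup is wired together, loosely following the chatbot guide linked above and assuming a 0.4/0.5-era llama_index and its langchain helpers; the country names, data paths, summaries and tool descriptions are illustrative, and the exact constructors and composability calls differ between versions.
Plain Text
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from llama_index import GPTSimpleVectorIndex, GPTListIndex, SimpleDirectoryReader
from llama_index.langchain_helpers.agents import (
    IndexToolConfig, LlamaToolkit, create_llama_chat_agent,
)

countries = ["Belgium", "France", "Germany"]

# one vector index per country
index_set = {}
for country in countries:
    docs = SimpleDirectoryReader(f"data/{country}").load_data()
    index_set[country] = GPTSimpleVectorIndex(docs)

# a list index over the per-country vector indices, used for comparisons
for country in countries:
    index_set[country].set_text(f"Information about {country}")
list_index = GPTListIndex([index_set[c] for c in countries])

# one langchain tool per country (the list index / graph tool is left out here)
index_configs = [
    IndexToolConfig(
        index=index_set[country],
        name=f"Vector Index {country}",
        description=f"useful for answering questions about {country}",
        index_query_kwargs={"similarity_top_k": 3},
        tool_kwargs={"return_direct": True},
    )
    for country in countries
]

toolkit = LlamaToolkit(index_configs=index_configs)
memory = ConversationBufferMemory(memory_key="chat_history")
agent_chain = create_llama_chat_agent(
    toolkit, OpenAI(temperature=0), memory=memory, verbose=True
)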
I still have some problems. Among them: even after selecting the right tool (country), sometimes the right context is not selected within that country and the answer is wrong.

Would it be possible to compose the indices differently to get better results?
For example, having:
  • a GPTSimpleVectorIndex for each section/chapter (of each country)
  • a GPTSimpleKeywordTableIndex per country, made of the GPTSimpleVectorIndex used for the sections
  • a GPTListIndex made of the GPTSimpleKeywordTableIndex(es) and putting together all the countries
And then correctly defining the toolchain with those different indices.
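A rough sketch of that composition, assuming the 0.4-era composability style where sub-indices can be nested inside a parent index and summarized with set_text (later versions go through ComposableGraph instead); the directory layout, names and summaries are illustrative:
Plain Text
from llama_index import (
    GPTSimpleVectorIndex,
    GPTSimpleKeywordTableIndex,
    GPTListIndex,
    SimpleDirectoryReader,
)

countries = ["Belgium", "France", "Germany"]
sections = ["History", "Government and politics", "Geography"]

country_indices = []
for country in countries:
    # one vector index per section of this country
    section_indices = []
    for section in sections:
        docs = SimpleDirectoryReader(f"data/{country}/{section}").load_data()
        section_index = GPTSimpleVectorIndex(docs)
        # summary used by the parent index to route queries to this section
        section_index.set_text(f"{section} of {country}")
        section_indices.append(section_index)

    # one keyword table index per country, over that country's section indices
    country_index = GPTSimpleKeywordTableIndex(section_indices)
    country_index.set_text(f"Information about {country}, organized by section")
    country_indices.append(country_index)

# a list index over all the per-country keyword table indices, for comparisons
root_index = GPTListIndex(country_indices)

The toolchain would then expose each per-country keyword table index as its own tool, plus the root list index as the tool for compare/contrast questions.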

Thank you
21 comments
Hi @iraadit ! This sounds like a really interesting use case, would love to help you with this.

When you say "even after selecting the right tool (country), sometimes the right context will not be selected in this country and answer will be false," what types of questions are you asking? Are they questions about a specific country, or comparisons between different countries?
Thank you for your answer
When I'm asking the question for a specific country, I generally get a correct answer
However, when comparing, it seems the right context is sometimes not provided (and it seems it can even hallucinate or generate a wrong answer from the wrong context)
As an example, documents could be separated as such:
  • Belgium
===> History
===> Government and politics
===> Geography
===> ...
  • France
===> History
===> Government and politics
===> Geography
===> ...
  • Germany
===> History
===> Government and politics
===> Geography
===> ...
  • ...
I would like to make sure that when asking a question such as:
  • Compare the size of the countries => it could infer that for each of the countries it should look into "Geography" and no other section (through DecomposeQueryTransform)
  • When were the countries created, present it as a table => it could infer that it has to look into "History" and no other section (through DecomposeQueryTransform)
  • ...
For now, I have effectively separated each of the sections for each of the countries, but all the sections are together in a single GPTSimpleVectorIndex, and the embeddings don't always match.
It can even happen that for some countries there is no information for a specific section.
It should then retrieve that section, see that it is empty, and answer that it doesn't know the answer from the provided context.
For now, if a section is empty, it will retrieve another section that is closer to the embedding, and possibly answer something wrong.
Depending on how the question is asked, it is also possible that the embedding of another section is closer to that of the question.
It would be nice to be able to "set_text" for each section, to explain what the section is about, so that the right section would be selected without even considering the embeddings of its content.
thanks @iraadit ! This is super helpful information, and a super important use case. Just reading your comments I have a few thoughts:
  • For compare/contrast queries I was about to suggest taking a look at using DecomposeQueryTransform but seems like you're already doing that 🙂 Just making sure you've also taken a look at the corresponding notebook: https://github.com/jerryjliu/llama_index/blob/main/examples/composable_indices/city_analysis/City_Analysis-Decompose.ipynb
  • In general I think a list index over subindexes + decompose query transform makes sense for compare/contrast queries. It's interesting that you weren't able to find the right results.
  • Regarding your last point on being able to "set_text" for each section, you could try feeding each section as its own Document object into the corresponding vector index for that country. Within a Document object you can not only set the text, but also an extra_info dictionary of metadata. By default this metadata is injected into each text chunk derived from the document, so you can use this to inject global context about the section! (See the sketch after this list.)
  • The flow you described: "Compare the size of the countries => it could infer that for each of the countries it should look into "Geography" and no other section (through DecomposeQueryTransform)" sounds right to me. In practice is this not happening?
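A minimal sketch of that suggestion, assuming the 0.4/0.5-era Document constructor with an extra_info dict (the variable names and metadata values are illustrative):
Plain Text
from llama_index import Document, GPTSimpleVectorIndex

# one Document per section, carrying metadata that describes the section
section_docs = [
    Document(
        belgium_geography_text,  # raw text of the Geography section
        extra_info={"country": "Belgium", "section": "Geography"},
    ),
    Document(
        belgium_history_text,  # raw text of the History section
        extra_info={"country": "Belgium", "section": "History"},
    ),
]

# the extra_info is injected into every chunk derived from each Document,
# so each chunk carries its country/section context into embedding and prompting
belgium_index = GPTSimpleVectorIndex(section_docs)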
Hi @jerryjliu0, thank you for your detailed answer.
  • I use DecomposeQueryTransform as presented in the notebook example
  • For now, I only have 5 countries for my tests. However, if I want to compare 3 specific countries, it will use the Graph Index and make calls for countries for which the answer is not desired. The final response (generally, because I have already seen it fail) does not include information about those other countries. But let's say I add all the countries of the world: I don't want to make LLM calls for all of them when I just want a comparison between a subset of them. Would a GPTSimpleKeywordTableIndex (or another index type) therefore make more sense?
  • I have been able to add a "section" metadata, thanks!
  • No, in practice it would sometimes select other contexts than the one containing the answer. I found a way to alleviate the problem by changing "similarity_top_k" from 1 to 3, effectively providing more context. I have examples of questions that were answered correctly with similarity_top_k at 3, but wrongly at 1. The inconvenience is that the call then becomes WAY slower, going from 25 sec to 6 min 10 sec.
Plain Text
# define query configs for graph 
# "similarity_top_k": 3, # or 1
query_configs = [
    {
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 3,
            "verbose": True
            # "include_summary": True
        },
        "query_transform": decompose_transform
    },
    {
        "index_struct_type": "list",
        "query_mode": "default",
        "query_kwargs": {
            "response_mode": "tree_summarize",
            "verbose": True
        }
    },
]
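For reference, a sketch of how this config is assumed to fit together with the decompose transform and the composed graph, following the City_Analysis-Decompose notebook linked above (import paths and constructors may differ between llama_index versions; graph stands for the composed index over the per-country indices):
Plain Text
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor
from llama_index.indices.query.query_transform.base import DecomposeQueryTransform

# transform that rewrites a compare/contrast question into per-index sub-questions
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0))
decompose_transform = DecomposeQueryTransform(llm_predictor, verbose=True)

# query the composed graph with the configs above
response = graph.query(
    "Compare the size of the countries",
    query_configs=query_configs,
)
print(response)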
got it. regarding increasing similarity_top_k, try the following additional "query_kwargs":
Plain Text
use_async=True,
response_mode="tree_summarize"
I got this error: TypeError: type object got multiple values for keyword argument 'use_async'
Plain Text
# define query configs for graph 
# "similarity_top_k": 3, # or 1
query_configs = [
    {
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 3,
            "verbose": True,
            "use_async": True,
            "response_mode": "tree_summarize"
            # "include_summary": True
        },
        "query_transform": decompose_transform
    },
    {
        "index_struct_type": "list",
        "query_mode": "default",
        "query_kwargs": {
            "response_mode": "tree_summarize",
            "verbose": True
        }
    },
]
I tried
Plain Text
query_configs = [
    {
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 3,
            "verbose": True,
            "use_async": True,
            "response_mode": "tree_summarize"
            # "include_summary": True
        },
        "query_transform": decompose_transform
    },
    {
        "index_struct_type": "list",
        "query_mode": "default",
        "query_kwargs": {
            "response_mode": "tree_summarize",
            "verbose": True
        }
    },
]
As well as
Plain Text
query_configs = [
    {
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 3,
            "verbose": True,
            "use_async": True,
            "response_mode": "tree_summarize"
            # "include_summary": True
        },
        "query_transform": decompose_transform
    },
    {
        "index_struct_type": "list",
        "query_mode": "default",
        "query_kwargs": {
            "response_mode": "tree_summarize",
            "use_async": True,
            "verbose": True
        }
    },
]


Both of them fail with the error message shown above.
Another connected problem I have:
When using llm=OpenAI(temperature=0) and create_llama_chat_agent, the tool selection will sometimes not work correctly.

Indeed, for a question such as "Compare/contrast the conditions applicable to Online Privacy in Italy and Spain.", I will get:
Plain Text
Thought: Do I need to use a tool? Yes
Action: Vector Index Italy
Action Input: Online Privacy

But then no Action on Spain and the final answer is only regarding Italy

I also sometimes have the following behavior instead:
For a question such as "Compare/contrast the age for child's consent in Germany and in Belgium.", I will get:
Plain Text
Thought: Do I need to use a tool? Yes
Action: Vector Index Germany and Vector Index Belgium
Action Input: age for child's consent
Observation: Vector Index Germany and Vector Index Belgium is not a valid tool, try another one.

And no tool will be used, because it doesn't get that it should use Vector Index Germany and then Vector Index Belgium; it thinks "Vector Index Germany and Vector Index Belgium" is a single tool that it can't find.
For the same questions, when using llm=ChatOpenAI(temperature=0, model_name="gpt-4", max_tokens=512) and chat-conversational-react-description, it will give:

For a question such as "Compare/contrast the age for child's consent in Germany and in Belgium.", I will get:
Plain Text
{
    "action": "Graph Index",
    "action_input": "Compare/contrast the age for child's consent in Germany and in Belgium"
}
> Current query: Compare/contrast the age for child's consent in Germany and in Belgium?
> New query: What is the age for child's consent in Belgium?
...
> New query: What is the age for child's consent in Germany?
...
> New query: What is the age for child's consent in France?
...
> New query: What is the age for child's consent in Italy?
...
> New query: What is the age for child's consent in Spain?
...


While it would make more sense if it only checked Belgium and Germany, as asked.
Hi @jerryjliu0
I now have a mix between a regression and something that does its job better:
I drew inspiration from this notebook: https://github.com/jerryjliu/llama_index/blob/main/examples/composable_indices/city_analysis/City_Analysis-Decompose.ipynb and now use only LlamaIndex, without Langchain.
However, even though the answer is now only grounded in the context, it means I don't have real chat behavior anymore, among other changes:
  • to handle chat with history, I was going through Langchain (indices created with LlamaIndex injected into Langchain); but if it gave good results, it's because when the info was not in the context (documents), it supplemented with what it already knew (ChatGPT knowledge). For example, for the question about ages of consent, the only country for which the answer is in the document context is Belgium, with 13 years; for the other countries, it used its prior knowledge (in normal ChatGPT mode, so not safe from hallucination; one time it would give a correct answer, the next time an incorrect one). My goal is to only have correct answers, always grounded in the context, and to say that there is not enough information in the context when the info is not in it.
  • again, I only go through LlamaIndex for the moment: if information is not in the document(s), it says so: "I don't know how to answer this question with the context provided"
    It now really only answers from the provided context, presented in a chat window, but it doesn't behave like a chat (I can't ask "what was the previous question?", or "what about Spain?", or change language to French and then switch back to English). Also, as it uses keywords, I will get no answer if I say "what about the Italian online privacy?" instead of "what about the online privacy in Italy?"
I tried to change the prompts provided to Langchain to force it to be grounded in / limited to the context, but without success; Langchain would answer with ChatGPT knowledge or hallucinate instead of saying it doesn't know.
I would like to go back to the same behavior I had with Langchain, but making sure it draws answers only from the provided context.
Would you have any idea of how to do so?
I'm conscious this goes beyond LlamaIndex code alone, but I think it would be a really interesting use case for plugging LlamaIndex and Langchain together.
Once I get it working, I would gladly submit a pull request with a corresponding Jupyter Notebook.
@iraadit i think i see what you're saying. one thing you could try is to define an extra tool for the langchain agent, and set the description to "use this tool for any question not captured by the other tools" - and have that tool always return "invalid response"
which would force the agent to use that tool instead of hallucinating
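A minimal sketch of that fallback tool as a plain langchain Tool, added next to the tools built from the LlamaIndex toolkit (the tool name, wording and the use of initialize_agent are assumptions, not the only way to wire it):
Plain Text
from langchain.agents import Tool, initialize_agent

# catch-all tool: the description steers the agent here for anything the
# index tools don't cover, and the fixed return value blocks free-form answers
fallback_tool = Tool(
    name="Fallback",
    func=lambda query: "Invalid response: this cannot be answered from the provided documents.",
    description="use this tool for any question not captured by the other tools",
)

tools = toolkit.get_tools() + [fallback_tool]
agent_chain = initialize_agent(
    tools,
    llm,
    agent="conversational-react-description",
    memory=memory,
    verbose=True,
)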
how to set a custom response for the IndexToolConfig?
do you mean a response with its own format? feel free to subclass LlamaIndexTool and add your own stuff, or add a PR to the core repo!
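If subclassing feels heavy, another option is to skip the tool config for that index and wrap the query call in a plain langchain Tool, post-processing the response text yourself. A sketch under that assumption (format_response, belgium_index and the description are illustrative):
Plain Text
from langchain.agents import Tool

def format_response(raw: str) -> str:
    # hypothetical custom formatting of the index answer
    return f"Answer (grounded only in the indexed documents): {raw}"

def query_belgium(query: str) -> str:
    # belgium_index is assumed to be a GPTSimpleVectorIndex built earlier
    response = belgium_index.query(query, similarity_top_k=3)
    return format_response(str(response))

belgium_tool = Tool(
    name="Vector Index Belgium",
    func=query_belgium,
    description="useful for answering questions about Belgium",
)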