Initial vs refined response

Hey there, I can see when I do index.query that there is an 'initial response' and a 'refined response' - the initial response is actually what I need, how can I parse that out instead of the refined? And what are the levers that define what 'refined' is, and how can I optimize that? Thanks!
What kind of index are you using?

For List and Vector indexes, llama index will find X number of nodes.

If the text from those nodes doesn't fit in the token limit, it sends an initial query to the LLM using enough context to fill the prompt, and then the answer is refined one or more times using the context text that didn't fit in the first prompt

Basically, this ensures the LLM eventually sees the text that answers the query

For a vector index, you can use the similarity_top_k option in the query to control how many nodes you look at (see the sketch after this message)

For the list index, by default it will look at every node in the index (usually this is best for summarizing)
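
Here's a rough sketch of what that looks like (this assumes the older GPTSimpleVectorIndex API, and the filename and query string are just placeholders):

Python
from llama_index import GPTSimpleVectorIndex

# Load a previously saved index (filename is just an example)
index = GPTSimpleVectorIndex.load_from_disk("index.json")

# Retrieve the 3 closest nodes; if their combined text overflows the
# prompt, the answer gets refined with the leftover text in extra LLM calls
response = index.query("What does the doc say about X?", similarity_top_k=3)
print(response)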
Thanks for the reply! I’m using SimpleVectorIndex.
For context, the use case is we have a table of question/answer columns, and in the event that a specific question is asked I want that exact answer (but also allow room for a generalized answer should it not be an exact match).

Each ‘answer’ is well under the 4000 token limit for Davinci.

Would limiting the k value do it here?

New to Llama so appreciate all the help
Interesting!

By default, the vector index has the top_k set to 1

I'm not sure how you are creating the index, but it would make sense to me if each "document" was a q/a pair or something similar.

Then, the vector index could find the best-matching QA pair based on embeddings and generate a response 🤔
Interesting, is there any sample code on how to make each document a q/a pairing? Is the default behavior to fit as much into one document as tokens will allow?

If that is the default behavior it makes sense as right now it's adding additional context that doesn't need to be there.

Btw this specific work is part of a grant for the Ethereum Foundation, so once we get things up and running it would be cool to add this to the Llama use cases / projects
Yea, by default llama index just crams as much as it can into a single node when there is only one input document.

We can change that by creating more documents though! My strategy would be something like this, where text1, text2, etc. come from some manual splitting of the QA pairs:

Python
from llama_index import Document, GPTSimpleVectorIndex

# ... manually split the QA pairs into text1, text2, ...

# One document per QA pair, so retrieval can match a single pair
text_list = [text1, text2, ...]
documents = [Document(t) for t in text_list]

index = GPTSimpleVectorIndex(documents)
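
Then a query with the default top_k should pull back just the closest QA pair (the query string here is made up):

Python
# With the default similarity_top_k=1, only the single best-matching
# QA pair is placed in the prompt, so no refine step is needed
response = index.query("How do I claim the grant?", similarity_top_k=1)
print(response)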


Another option would be splitting your table beforehand into multiple text files (one for each qa pair) and then using the SimpleDirectoryReader, but that might create too many files
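
For reference, that alternative would look roughly like this (the directory name is hypothetical, with one text file per QA pair):

Python
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# One .txt file per QA pair inside qa_pairs/
documents = SimpleDirectoryReader("qa_pairs").load_data()
index = GPTSimpleVectorIndex(documents)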
Cool! Thank you, I’ll try that tomorrow