Hi, thanks for creating this @jerryjliu0, amazing project.
One question: I'm having trouble getting good answers to more specific questions. The data I'm indexing is a chat thread in which good restaurants in Barcelona are discussed. However, when I query 'What are good restaurants in Barcelona?', I don't get great results.
I'm using a GPTSimpleVectorIndex and splitting the data into chunks of 1000 tokens.
What are some techniques to solve this? One idea I had was to also build a keyword-based index, query both indexes, then compare the results somehow and return the better one. Thoughts?
I think that, by design, gpt-index will not summarize over your documents. It will just find the best-matching chunk(s) to use as context, and there is a limit on how many of them it retrieves.
So you'd need some extra design steps to have summaries created. Can anyone confirm?
@kaisen you mentioned each chunk has 1000 tokens. Have you tried increasing similarity_top_k in the index.query call? I would play around with slightly smaller chunks (256-512 tokens) and a higher similarity_top_k (5-10).
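To make the chunk size / similarity_top_k trade-off concrete, here is a rough, library-free sketch of top-k retrieval over chunks. It is not how gpt-index is implemented: the word-count "embedding", the helper names (tokenize, chunk, cosine, top_k_chunks) and the sample thread are all made up for illustration, and real chunk sizes would be in the hundreds of tokens rather than 16. The point is only that with smaller chunks the relevant sentence is not diluted by unrelated chat, and a larger top_k lets more chunks reach the prompt.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; a stand-in for a real tokenizer (assumption).
    return re.findall(r"[a-z0-9']+", text.lower())


def chunk(tokens, size):
    # Split the token list into consecutive chunks of at most `size` tokens.
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def cosine(a, b):
    # Cosine similarity between two word-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k_chunks(text, query, chunk_size, top_k):
    # Score every chunk against the query and keep the top_k best matches.
    query_vec = Counter(tokenize(query))
    chunks = chunk(tokenize(text), chunk_size)
    ranked = sorted(
        chunks, key=lambda c: cosine(Counter(c), query_vec), reverse=True
    )
    return [" ".join(c) for c in ranked[:top_k]]


# Hypothetical chat thread: one relevant sentence buried in unrelated chatter.
thread = (
    "anyone free on friday? " * 20
    + "the best restaurants in barcelona are tickets and disfrutar. "
    + "see you at the airport. " * 20
)

hits = top_k_chunks(
    thread, "What are good restaurants in Barcelona?", chunk_size=16, top_k=2
)
for hit in hits:
    print(hit)
```

In gpt-index itself the second knob is the similarity_top_k argument to index.query mentioned above; the chunk size is set when the index is built.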