So, the default top k is 2. What this means is that after your document is chunked and embedded (the default chunk size is 1024), only the 2 most similar chunks are retrieved for a query.
So, a few options. You can increase the top k:
index.as_query_engine(similarity_top_k=5)
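To illustrate what similarity_top_k controls, here's a toy sketch of top-k retrieval: rank embedded chunks by cosine similarity to the query and keep the k closest. The vectors here are made up for the example; in practice they come from an embedding model.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, chunks, k=2):
    # chunks: list of (text, embedding) pairs.
    # Sort by similarity to the query, keep the k best.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy embeddings for three chunks.
chunks = [
    ("chunk A", [1.0, 0.0]),
    ("chunk B", [0.9, 0.1]),
    ("chunk C", [0.0, 1.0]),
]
print(top_k([1.0, 0.0], chunks, k=2))  # the two chunks closest to the query
```

With k=2, chunk C is dropped even if it's relevant, which is exactly the failure mode raising top k addresses.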
You could also use a keyword index to ensure all the relevant chunks are always retrieved.
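For contrast, a keyword index skips similarity ranking entirely: it maps extracted keywords to chunks and returns every chunk that shares a keyword with the query. A toy sketch (the keyword extraction here is just lowercased word splitting, not what the library actually does):

```python
from collections import defaultdict

def build_keyword_index(chunks):
    # Map each lowercased word to the set of chunk ids containing it.
    index = defaultdict(set)
    for i, text in enumerate(chunks):
        for word in text.lower().split():
            index[word].add(i)
    return index

def keyword_retrieve(index, chunks, query):
    # Return every chunk sharing at least one keyword with the query,
    # in document order -- no top-k cutoff.
    hits = set()
    for word in query.lower().split():
        hits |= index.get(word, set())
    return [chunks[i] for i in sorted(hits)]

chunks = [
    "Revenue grew 20% in Q3",
    "Headcount stayed flat in Q3",
    "The office moved to Berlin",
]
idx = build_keyword_index(chunks)
print(keyword_retrieve(idx, chunks, "revenue in Q3"))
```

Because there's no cutoff, every matching chunk comes back, at the cost of noisier matches than embedding similarity.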
Most promising, I think, is a newer feature. Some of your questions seem very SQL-oriented. You could create a small database (maybe using sqlite) and use our new feature that combines text-to-SQL and semantic search.
Notebook:
https://gpt-index.readthedocs.io/en/latest/examples/query_engine/SQLAutoVectorQueryEngine.html
Youtube:
https://youtu.be/ZIvcVJGtCrY
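To give a feel for the structured half of that setup, here's a minimal sqlite sketch. The table name, columns, and data are made up for illustration; in the combined engine, the text-to-SQL side generates a query like the one below from your natural-language question, while semantic search handles the unstructured parts.

```python
import sqlite3

# Hypothetical schema -- stand-in for whatever structured data your
# questions are really about.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, country TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [("Tokyo", "Japan", 13_960_000),
     ("Berlin", "Germany", 3_645_000),
     ("Toronto", "Canada", 2_930_000)],
)

# A question like "Which city has the highest population?" would be
# translated by the text-to-SQL engine into something like:
row = conn.execute(
    "SELECT name FROM cities ORDER BY population DESC LIMIT 1"
).fetchone()
print(row[0])
```

Aggregations and filters like this are exactly the kind of question that's awkward for pure semantic search over chunks, which is why the routed approach is appealing here.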