Presumably a common scenario:
I have a collection of documents - be it pdfs, work emails, etc, and want to make a QA system based on this data. What's the current level of capabilities in that regard?
Can LLM + gptindex answer questions that can't be done with top-10 similarity search?
Stuff like "summarize recent results of all papers on attention in transformers", or "which colleagues asked me for help more frequently?"
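For context, the baseline being compared against is plain top-k similarity search: embed the query, embed each chunk, and return the nearest chunks. A minimal self-contained sketch, using a toy bag-of-words "embedding" in place of a real embedding model (all names here are hypothetical, not any library's API):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: str, chunks: list[str], k: int = 10) -> list[str]:
    # Rank every chunk by similarity to the query and keep the k best.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Paper A trains a transformer with attention on 8 GPUs for 3 days.",
    "Meeting notes: lunch schedule for next week.",
    "Paper B studies attention heads in transformers.",
]
print(top_k("attention in transformers", chunks, k=2))
```

This works for "find the passage most like my query", but an aggregate question like "summarize all papers on X" needs every relevant chunk, not just the k nearest, which is the gap discussed below.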
6 comments
I think "all papers" is still a challenge, or at least it takes some 💸 to run all the data through an LLM
What about more concrete questions that still may require referencing many tens of document extracts?

Like "For each paper that describes training on GPUs, list the GPU number and training time".

I tried existing apps that use top-k embeddings-based retrieval, and they don't do well for such questions.
For these kinds of questions you need to pass each node through an LLM. The List index could be a good fit, but I think it's very expensive
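The "pass each node through an LLM" pattern can be sketched as a map-then-combine loop. This is a hypothetical illustration with a stub `llm` function standing in for a real model call, not the actual List index implementation:

```python
def llm(prompt: str) -> str:
    # Stub for a real chat/completion API call. To mimic an extraction
    # task, it just returns context lines that mention "GPUs".
    return "\n".join(line for line in prompt.splitlines() if "GPUs" in line)

def query_every_node(question: str, nodes: list[str]) -> str:
    # List-index-style querying: every node is sent to the LLM, so cost
    # grows linearly with corpus size -- hence the expense noted above.
    partial_answers = []
    for node in nodes:
        out = llm(f"Question: {question}\nContext:\n{node}")
        if out:
            partial_answers.append(out)
    # A real system would make one more LLM call to merge the partials.
    return "\n".join(partial_answers)

nodes = [
    "Paper A: trained on 8 GPUs for 3 days.",
    "Meeting notes: lunch schedule.",
    "Paper B: trained on 64 GPUs for 12 hours.",
]
print(query_every_node("List GPU count and training time per paper.", nodes))
```

The per-node loop is what makes this thorough and also what makes it expensive: one LLM call per node, every query.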
I imagine an LLM-interactive process like:

  • "LLM, suggest some keywords that should occur in paragraph answering this question: ..."
  • simple keyword search for these keywords, return top-10 matches
  • "LLM, here's the question ... and context ... . Suggest more keywords to look up, or say that you have enough context to answer."
  • if enough, answer the question, otherwise loop again
Is there something along these lines already?
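The loop described above can be sketched as follows. Everything here is a hypothetical illustration: the LLM calls are stubbed out with hard-coded logic, and none of the function names refer to an existing library:

```python
def suggest_keywords(question: str, context: list[str]) -> list[str]:
    # Stub for: "LLM, suggest keywords to look up, or say you have enough."
    # A real call would prompt a model with the question and gathered context.
    if not context:
        return ["GPUs"]           # first round: keywords from the question
    if len(context) < 2:
        return ["training time"]  # ask for a missing detail
    return []                     # empty list == "enough context to answer"

def keyword_search(keywords: list[str], chunks: list[str], k: int = 10) -> list[str]:
    # Simple keyword match, returning at most k hits.
    hits = [c for c in chunks if any(kw.lower() in c.lower() for kw in keywords)]
    return hits[:k]

def answer(question: str, context: list[str]) -> str:
    # Stub for the final LLM answer call over the gathered context.
    return " | ".join(context)

def interactive_qa(question: str, chunks: list[str], max_rounds: int = 5) -> str:
    context: list[str] = []
    for _ in range(max_rounds):
        keywords = suggest_keywords(question, context)
        if not keywords:
            break  # the LLM says it has enough context
        for hit in keyword_search(keywords, chunks):
            if hit not in context:
                context.append(hit)
    return answer(question, context)

chunks = [
    "Paper A: trained on 8 GPUs.",
    "Paper A: training time was 3 days.",
    "Meeting notes: lunch schedule.",
]
print(interactive_qa("For each paper, list GPU count and training time.", chunks))
```

The `max_rounds` cap matters: without it, a model that keeps asking for more keywords would loop forever, and each round adds LLM cost.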
It's a bit of a challenge to know what counts as enough: it may be the top 10 chunks for some queries and only 3 for others. The only way to be sure is to run all the context through the LLM with something like a ListIndex.
But "suggest more keywords to look up" can be done, and it sounds like an interesting strategy.