Presumably a common scenario:
I have a collection of documents - be it pdfs, work emails, etc, and want to make a QA system based on this data. What's the current level of capabilities in that regard?
Can LLM + gptindex answer questions that can't be done with top-10 similarity search?
Stuff like "summarize recent results of all papers on attention in transformers", or "which colleagues asked me for help more frequently?"
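For context, the baseline being compared against is plain top-k similarity search: embed the query, embed each chunk, and return the nearest chunks. A minimal self-contained sketch, using a toy bag-of-words "embedding" in place of a real embedding model (all names here are hypothetical, not any library's API):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: str, chunks: list[str], k: int = 10) -> list[str]:
    # Rank every chunk by similarity to the query and keep the k best.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "Paper A trains a transformer with attention on 8 GPUs for 3 days.",
    "Meeting notes: lunch schedule for next week.",
    "Paper B studies attention heads in transformers.",
]
print(top_k("attention in transformers", chunks, k=2))
```

This works for "find the passage most like my query", but an aggregate question like "summarize all papers on X" needs every relevant chunk, not just the k nearest, which is the gap discussed below.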
6 comments
I think "all papers" is still a challenge, or at least it takes some 💸 to run all the data through an LLM
What about more concrete questions that still may require referencing many tens of document extracts?

Like "For each paper that describes training on GPUs, list the GPU number and training time".

I tried existing apps that use top-k embeddings-based retrieval, and they don't do well for such questions.
For these kinds of questions you need to pass each node through an LLM. The List index could be a good fit, but I think it's very expensive
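The "pass each node through an LLM" pattern can be sketched as a map-then-combine loop. This is a hypothetical illustration with a stub `llm` function standing in for a real model call, not the actual List index implementation:

```python
def llm(prompt: str) -> str:
    # Stub for a real chat/completion API call. To mimic an extraction
    # task, it just returns context lines that mention "GPUs".
    return "\n".join(line for line in prompt.splitlines() if "GPUs" in line)

def query_every_node(question: str, nodes: list[str]) -> str:
    # List-index-style querying: every node is sent to the LLM, so cost
    # grows linearly with corpus size -- hence the expense noted above.
    partial_answers = []
    for node in nodes:
        out = llm(f"Question: {question}\nContext:\n{node}")
        if out:
            partial_answers.append(out)
    # A real system would make one more LLM call to merge the partials.
    return "\n".join(partial_answers)

nodes = [
    "Paper A: trained on 8 GPUs for 3 days.",
    "Meeting notes: lunch schedule.",
    "Paper B: trained on 64 GPUs for 12 hours.",
]
print(query_every_node("List GPU count and training time per paper.", nodes))
```

The per-node loop is what makes this thorough and also what makes it expensive: one LLM call per node, every query.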
I imagine an LLM-interactive process like:

  • "LLM, suggest some keywords that should occur in paragraph answering this question: ..."
  • simple keyword search for these keywords, return top-10 matches
  • "LLM, here's the question ... and context ... . Suggest more keywords to look up, or say that you have enough context to answer."
  • if enough, answer the question, otherwise loop again
Is there something along these lines already?
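The loop described above can be sketched as follows. Everything here is a hypothetical illustration: the LLM calls are stubbed out with hard-coded logic, and none of the function names refer to an existing library:

```python
def suggest_keywords(question: str, context: list[str]) -> list[str]:
    # Stub for: "LLM, suggest keywords to look up, or say you have enough."
    # A real call would prompt a model with the question and gathered context.
    if not context:
        return ["GPUs"]           # first round: keywords from the question
    if len(context) < 2:
        return ["training time"]  # ask for a missing detail
    return []                     # empty list == "enough context to answer"

def keyword_search(keywords: list[str], chunks: list[str], k: int = 10) -> list[str]:
    # Simple keyword match, returning at most k hits.
    hits = [c for c in chunks if any(kw.lower() in c.lower() for kw in keywords)]
    return hits[:k]

def answer(question: str, context: list[str]) -> str:
    # Stub for the final LLM answer call over the gathered context.
    return " | ".join(context)

def interactive_qa(question: str, chunks: list[str], max_rounds: int = 5) -> str:
    context: list[str] = []
    for _ in range(max_rounds):
        keywords = suggest_keywords(question, context)
        if not keywords:
            break  # the LLM says it has enough context
        for hit in keyword_search(keywords, chunks):
            if hit not in context:
                context.append(hit)
    return answer(question, context)

chunks = [
    "Paper A: trained on 8 GPUs.",
    "Paper A: training time was 3 days.",
    "Meeting notes: lunch schedule.",
]
print(interactive_qa("For each paper, list GPU count and training time.", chunks))
```

The `max_rounds` cap matters: without it, a model that keeps asking for more keywords would loop forever, and each round adds LLM cost.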
It's a bit of a challenge to know what counts as enough: it may be the top 10 chunks for some queries and only 3 for others. The only way to be sure is to run all the context through the LLM with something like a ListIndex.
But "suggest more keywords to look up" can be done, and it sounds like an interesting strategy.