
Updated 5 months ago

Q&A on particular document as well as all documents


Yeah, that's what I was planning on doing—but how significant a performance/monetary cost would that incur?

4 comments

I think the short answer is that it wouldn't be too different from just building a single index over all the documents. You can think of it more as "partitioning" the overall index into sub-parts.

The more nuanced answer is that it depends on what type of index you'd be using and how many documents you have.

I could give more intuition if you shared an example of your use case. I'd also recommend just playing around with a small number of documents to get a feel for it 🙂
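To make the "partitioning" idea concrete, here's a rough toy sketch in plain Python. It is not the library's API: the chunk names, the 2-d vectors, and the helpers `query_one` / `query_all` are all made up for illustration. The point is just that the same chunks get stored either way, so a per-paper query searches one partition and an all-papers query searches the union.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, chunks, k):
    """Return the k chunks most similar to the query.

    Each chunk is a (doc_id, chunk_name, vector) tuple.
    """
    return sorted(chunks, key=lambda c: cosine(query_vec, c[2]), reverse=True)[:k]

# Hypothetical toy data: one partition (sub-index) per paper.
partitions = {
    "paper_a": [("paper_a", "a1", [1.0, 0.0]), ("paper_a", "a2", [0.8, 0.2])],
    "paper_b": [("paper_b", "b1", [0.0, 1.0]), ("paper_b", "b2", [0.9, 0.1])],
}

def query_one(doc_id, query_vec, k=1):
    """Q&A over a single paper: search only that paper's partition."""
    return top_k(query_vec, partitions[doc_id], k)

def query_all(query_vec, k=1):
    """Q&A over all papers: search the union of every partition."""
    all_chunks = [c for chunks in partitions.values() for c in chunks]
    return top_k(query_vec, all_chunks, k)

if __name__ == "__main__":
    print(query_one("paper_b", [1.0, 0.0]))
    print(query_all([1.0, 0.0], k=2))
```

In this toy version the number of stored chunks (and so the number of embeddings you'd pay for) is identical whether you keep one big index or one per paper; only the search scope changes.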

Sandkoan

For q&a over multiple papers as well as over individual papers.

Sandkoan

Not sure whether it's better to do [GPTListIndex(paper) for paper in papers] and then compose them into another ListIndex, or instead to build a list of vector indices and compose those into a ListIndex.

disiok

imo there's no hard-and-fast rule here.

I'd recommend starting with a GPTSimpleVectorIndex for each paper, then tuning similarity_top_k to see whether it works well for the kind of queries you are making.

You can achieve something similar by using a GPTListIndex for each paper and querying it in embedding mode.

Generally speaking, embedding-based queries are cheaper (since they make fewer LLM completion calls), but they are better suited to retrieval-heavy questions than to summarization questions.
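As a back-of-the-envelope illustration of the cost difference, here's a small sketch. It rests on a simplifying assumption (not something confirmed in this thread): a full scan over a list index makes roughly one completion call per chunk, while an embedding-based query makes roughly one per retrieved chunk. The chunk counts and function names are hypothetical.

```python
def completion_calls_full_scan(num_chunks):
    """Assumed cost model: one LLM completion call per chunk in the index."""
    return num_chunks

def completion_calls_top_k(num_chunks, similarity_top_k):
    """Assumed cost model: one LLM completion call per retrieved chunk."""
    return min(num_chunks, similarity_top_k)

# Hypothetical chunk counts per paper.
papers = {"paper_a": 40, "paper_b": 25, "paper_c": 60}
similarity_top_k = 2

# Total completion calls for one query run against every paper.
full_scan_total = sum(completion_calls_full_scan(n) for n in papers.values())
top_k_total = sum(
    completion_calls_top_k(n, similarity_top_k) for n in papers.values()
)

if __name__ == "__main__":
    print("full scan:", full_scan_total)
    print("top-k:", top_k_total)
```

Under that model the embedding route scales with similarity_top_k rather than with paper length, which is also why it fits retrieval-style questions better than summarization: a summary generally needs to see every chunk.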
