
Thoughts

Based on your seemingly endless wealth of knowledge, can you let me know your thoughts on this:

  1. Relatively large store of PDF/TXT documents (for this example 200)
  2. Need to retain detail as much as possible in the answer.
  3. 50/50 split on whether answers will be based on an individual document or need to be synthesized across multiple documents.
With the above considered, what is the best approach (in your opinion) available today to achieve this? I am currently using a graph to collate multiple simple vector indexes and then querying that. Still very early into my exploration of this framework, so keen to get your thoughts.
Can the documents be grouped in any way beforehand, or could they be about anything?
@Logan M They certainly can be. Here is a similar example that may be relatable but not exactly my use case.

Pretend we are indexing legislation & precedents. The legislation would be querying only 1 document at a time for an answer specifically relating to that law/act. Eg. How large can my shed be before I need council consent?

The second set would be precedents, which if queried would need to check multiple documents (these can be further refined into sectors, e.g. property, civil, etc.) and then return an answer that could relate to multiple sources.
I think you're on the right track with the graph approach! You can set up some pretty complex graphs that would allow this.

Have you seen the latest guide on graphs? It covers some pretty complex stuff as well, you might find it useful! https://gpt-index.readthedocs.io/en/latest/guides/tutorials/graph.html
@Logan M I appreciate your input! I did have a brief look over that yesterday but am aiming to give it a bit more time tomorrow. Overall I just wanted to make sure I haven't made a glaring error in logic from the start and gone chasing down the wrong rabbit hole.