
Updated 2 months ago

Thoughts

Based on your seemingly endless wealth of knowledge, can you let me know your thoughts on this:

  1. Relatively large store of PDF/TXT documents (200 for this example).
  2. Need to retain as much detail as possible in the answer.
  3. 50/50 split on whether answers will be based on an individual document or need to be synthesized across multiple documents.
With the above considered, what is the best approach (in your opinion) available today to achieve this? I am currently using a graph to collate multiple simple vector indexes and then querying that. I'm still very early in my exploration of this framework, so I'm keen to get your thoughts.
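To make the setup concrete, here is a minimal sketch of the "graph over per-document vector indexes" idea. It is plain Python and does not use the framework's API: the `DocIndex` and `Graph` classes, the word-count `embed` function (a toy stand-in for real embeddings), and the sample documents are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class DocIndex:
    """Hypothetical per-document index: one vector per chunk."""

    def __init__(self, name, chunks):
        self.name = name
        self.chunks = [(chunk, embed(chunk)) for chunk in chunks]
        # Whole-document vector, used by the graph to pick documents.
        self.summary = embed(" ".join(chunks))

    def query(self, question, top_k=1):
        q = embed(question)
        ranked = sorted(self.chunks, key=lambda c: cosine(q, c[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


class Graph:
    """Hypothetical root node: routes a question to the best child indexes."""

    def __init__(self, indexes):
        self.indexes = indexes

    def query(self, question, top_docs=1, top_k=1):
        q = embed(question)
        ranked = sorted(self.indexes, key=lambda i: cosine(q, i.summary), reverse=True)
        # Return (document name, chunk) pairs so every answer keeps its source.
        return [(idx.name, chunk)
                for idx in ranked[:top_docs]
                for chunk in idx.query(question, top_k)]


# Made-up sample documents, for illustration only.
graph = Graph([
    DocIndex("building_act", [
        "A shed under 10 square metres does not need council consent.",
        "Fences over 2 metres need consent.",
    ]),
    DocIndex("tenancy_act", [
        "A landlord must give 90 days notice to end a tenancy.",
        "Bond is limited to four weeks rent.",
    ]),
])

result = graph.query("How large can my shed be before I need council consent?")
print(result)
```

Raising `top_docs` covers the "synthesize across multiple documents" case, while `top_docs=1` covers the single-document case. Raising `top_k` passes more chunks per document to the answer step, which is one way to retain more detail.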
4 comments
Can the documents be grouped in any way beforehand, or could they be about anything?
@Logan M They certainly can be. Here is a similar example that may be relatable, though it is not exactly my use case.

Pretend we are indexing legislation & precedents. A legislation query would only need 1 document at a time, for an answer relating specifically to that law/act. E.g. How large can my shed be before I need council consent?

The second set would be precedents, which, when queried, would need to check multiple documents (these can be further refined into sectors, e.g. property, civil, etc.) and then return an answer that could draw on multiple sources.
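A rough sketch of that split, assuming each document is tagged with a kind and a sector up front. The `Doc` class, the `retrieve` function, the word-overlap score, and all document names and texts below are invented for illustration; a real setup would use embeddings rather than shared-word counts.

```python
import re
from dataclasses import dataclass


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(question, text):
    """Toy relevance score: the number of words shared with the question."""
    return len(tokens(question) & tokens(text))


@dataclass
class Doc:
    name: str
    kind: str    # "legislation" or "precedent"
    sector: str  # e.g. "property", "civil"
    text: str


def retrieve(docs, question, kind, sector=None, max_sources=3):
    """Legislation: the single best document. Precedents: several, within a sector."""
    pool = [d for d in docs
            if d.kind == kind and (sector is None or d.sector == sector)]
    ranked = sorted(pool, key=lambda d: overlap(question, d.text), reverse=True)
    if kind == "legislation":
        return ranked[:1]
    # Keep only precedents that share at least one word with the question.
    return [d for d in ranked[:max_sources] if overlap(question, d.text) > 0]


# Made-up placeholder documents.
docs = [
    Doc("Building Act", "legislation", "property",
        "A shed under 10 square metres does not need council consent."),
    Doc("Tenancy Act", "legislation", "property",
        "A landlord must give 90 days notice to end a tenancy."),
    Doc("Smith v Council", "precedent", "property",
        "The court held the shed breached the boundary setback."),
    Doc("Jones v Lee", "precedent", "property",
        "The boundary fence dispute was settled by shared cost."),
    Doc("Roe v Bank", "precedent", "civil",
        "The bank owed a duty of care to the customer."),
]

law = retrieve(docs, "How large can my shed be before I need council consent?",
               kind="legislation")
cases = retrieve(docs, "What have courts said about boundary disputes?",
                 kind="precedent", sector="property")
print([d.name for d in law])
print([d.name for d in cases])
```

Filtering by sector before ranking keeps unrelated precedents out of the synthesis step, and returning whole `Doc` objects lets the final answer cite each source it drew on.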
I think you're on the right track with the graph approach! You can set up some pretty complex graphs that would allow this.

Have you seen the latest guide on graphs? It covers some pretty complex stuff as well, you might find it useful! https://gpt-index.readthedocs.io/en/latest/guides/tutorials/graph.html
@Logan M I appreciate your input! I did have a brief look over that yesterday but am aiming to give it a bit more time tomorrow. Overall, I just wanted to make sure I hadn't made a glaring error in logic from the start and ended up chasing it down the wrong rabbit hole.