Updated 2 years ago

Comparing documents

How would one do meta-document comparisons?
E.g., a user supplies document1 (an initial document) and then document2 (a heavily reformatted revision of document1). Could a query then run with both documents in context to report what changed, what information was retained, etc.?
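As a baseline for the kind of answer being asked for, here is a minimal sketch that uses Python's standard `difflib` instead of an LLM or an index. It is not the library's method, only a plain text diff. The function names `normalize` and `compare_documents` are made up for illustration, and the sentence splitting is deliberately naive. Whitespace and case are normalized first so that reformatting alone does not register as a change.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    """Split text into sentences, collapsing whitespace and case.

    Naive splitting on ., ! and ? followed by whitespace; good enough
    for a sketch, not for real documents with abbreviations.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(s.split()).lower() for s in sentences if s.strip()]


def compare_documents(doc1: str, doc2: str) -> dict[str, list[str]]:
    """Report which sentences were retained, removed, or added."""
    a, b = normalize(doc1), normalize(doc2)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    report: dict[str, list[str]] = {"retained": [], "removed": [], "added": []}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            report["retained"].extend(a[i1:i2])
        else:
            # "replace", "delete" and "insert" all reduce to these two lists.
            report["removed"].extend(a[i1:i2])
            report["added"].extend(b[j1:j2])
    return report
```

A diff like this only catches sentence-level edits. Paraphrased or reordered content is where querying both documents through an LLM becomes useful.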
9 comments
@uPnP @iraadit you might be interested in the query decomposer.

Basically, it splits the query into two different ones

https://gpt-index.readthedocs.io/en/latest/how_to/query_transformations.html#single-step-query-decomposition
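To make the idea concrete, here is a rough, library-free sketch of single-step decomposition. It does not use the actual llama-index API. `decompose_single_step` and `template_rewrite` are hypothetical names, and the string template stands in for the LLM call that would normally rewrite the question for each document.

```python
from typing import Callable


def decompose_single_step(
    query: str,
    index_summaries: dict[str, str],
    rewrite: Callable[[str, str], str],
) -> dict[str, str]:
    """Produce one rewritten sub-query per document index.

    `index_summaries` maps an index name to a short description of
    what that index contains.
    """
    return {
        name: rewrite(query, summary)
        for name, summary in index_summaries.items()
    }


def template_rewrite(query: str, summary: str) -> str:
    """Stand-in for an LLM: scope the question to a single source."""
    return f"Considering only {summary}: {query}"


sub_queries = decompose_single_step(
    "What changed between the two versions?",
    {"document1": "the initial document", "document2": "the revised document"},
    template_rewrite,
)
```

Each sub-query would then be answered against its own document, and the two answers combined to respond to the original question.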
I was looking into langchain agents as well. They do have some overlap, such as working on ReAct principles. Here is how I understand it (please correct me if I'm seeing this wrong):
You could compose related documents into a single composable graph, and use query transformations to tie together ideas from the separate documents.
As a running example for TensorFlow, these documents could be the v1 TensorFlow docs, the v2 TensorFlow docs, and meta docs (such as changelogs, deprecations, roadmaps). We could combine them into any arbitrary index (List/Tree/Table/etc.) and use query transformations to allow meta-document awareness (such as querying library API migrations).
If we wanted a knowledge graph of PyTorch/SKLearn/(any other ML library), we could build it with the same pattern.
We could then, in theory, pool all the graphs into another composable graph that can pull information across ML library documentation (via query transformation again).
If I understand correctly, the rule of thumb is to use an agent when you want to expose this general ML index graph as a tool that the agent can mix and match with other tools (python repl, serp, llm-math, etc.).
This is basically the guiding concept for the chatbot tutorial in the docs.
I would like to confirm, however, whether there are any rules of thumb for when to use single-step vs multi-step decomposition.
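The nesting described above can be sketched as a toy data structure. This is a stand-in, not the library's composable graph. `GraphNode` and `route` are hypothetical names, and keyword overlap against each node's summary replaces the LLM or embedding-based routing a real setup would use.

```python
from dataclasses import dataclass, field


@dataclass
class GraphNode:
    name: str
    summary: str  # short description used to route queries
    children: list["GraphNode"] = field(default_factory=list)


def _overlap(query: str, summary: str) -> int:
    """Count the words shared between the query and a node summary."""
    return len(set(query.lower().split()) & set(summary.lower().split()))


def route(node: GraphNode, query: str) -> list[str]:
    """Walk from the root to a leaf, picking the best-matching child each time."""
    path = [node.name]
    while node.children:
        node = max(node.children, key=lambda child: _overlap(query, child.summary))
        path.append(node.name)
    return path


# Per-library graph: v1 docs, v2 docs, and meta docs.
tensorflow = GraphNode("tensorflow", "tensorflow docs", [
    GraphNode("tf_v1", "tensorflow v1 api sessions"),
    GraphNode("tf_v2", "tensorflow v2 api eager"),
    GraphNode("tf_meta", "tensorflow changelog deprecations roadmap migration"),
])
pytorch = GraphNode("pytorch", "pytorch docs", [
    GraphNode("torch_docs", "pytorch api tensors autograd"),
])
# Pooled graph across libraries.
ml_root = GraphNode("ml", "all ml libraries", [tensorflow, pytorch])
```

Routing picks a single branch here, whereas a query transformation could fan a question out to several branches (say `tf_v1` and `tf_v2`) and merge the answers.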
Yea I think you have a good understanding of this! 💪
In my opinion, once multi step queries are out I would use those in most cases.

One disadvantage is that they might slow down response times, but to me it seems like a multi-step query can act like a single-step one when it needs to. If you want to limit the process to a single step, then use single step.

(Although personally it's hard to think of actual scenarios where you want this haha)
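Here is a rough sketch of that trade-off, under the assumption that a multi-step query is a loop of sub-questions with a step cap. `multi_step_query`, `next_question` and `answer` are hypothetical names, not the library's API. In practice both callables would wrap LLM or index calls, which is where the extra latency comes from.

```python
from typing import Callable, Optional

History = list[tuple[str, str]]  # (sub-question, answer) pairs


def multi_step_query(
    query: str,
    next_question: Callable[[str, History], Optional[str]],
    answer: Callable[[str], str],
    max_steps: int = 3,
) -> History:
    """Ask follow-up sub-questions until none are left or the cap is hit.

    With max_steps=1 this behaves like single-step decomposition.
    """
    history: History = []
    for _ in range(max_steps):
        sub_question = next_question(query, history)
        if sub_question is None:  # nothing more to ask: stop early
            break
        history.append((sub_question, answer(sub_question)))
    return history


# Stub planner and answerer, standing in for LLM and index calls.
planned = [
    "What does document1 say about X?",
    "What does document2 say about X?",
]


def stub_next_question(query: str, history: History) -> Optional[str]:
    return planned[len(history)] if len(history) < len(planned) else None


def stub_answer(sub_question: str) -> str:
    return "answer to: " + sub_question
```

Because the loop stops as soon as there is nothing left to ask, a simple query costs one step either way. The cap only matters for queries that would otherwise keep expanding.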
Yep. I can similarly imagine using file-based routing to generate knowledge base graphs, with multi-step queries as the default configuration. Imagine auto-generating the graph from this setup:
[Attachment: image.png]
Possibly even a knowledge hub that has its own package manager, so you could simply run "llama-index add pytorch[docs,source,books]"
crazy stuff
🧠 hell yea! 💪

Definitely make a PR if you think of a process for this. Something like "GPTFileStructureIndex" or "GPTAutoIndex" lol

I think this is definitely something the llama team is working towards as well 💪