How would one do meta-document comparisons? E.g., the user supplies document1 (an initial document) and then supplies document2 (a revision of document1, heavily reformatted), and then runs a query with both documents in context to find out what changed, what information was retained, etc.?
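For what it's worth, the most direct version of this (when both documents fit in one prompt) doesn't even need an index. A minimal sketch, using the pre-1.0 `openai` client API; the file names and prompt wording are just illustrative:

```python
# Minimal "both documents in one context" sketch. Assumes both versions fit in
# the model's context window; otherwise you'd chunk/index them first.
import openai  # pre-1.0 client; reads OPENAI_API_KEY from the environment

with open("document1.txt") as f:   # illustrative file names
    doc1 = f.read()
with open("document2.txt") as f:
    doc2 = f.read()

prompt = (
    "Below are two versions of a document.\n\n"
    "=== VERSION 1 (initial) ===\n" + doc1 + "\n\n"
    "=== VERSION 2 (revision, heavily reformatted) ===\n" + doc2 + "\n\n"
    "List the substantive changes between the two versions and the information "
    "that was retained, ignoring pure formatting differences."
)

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(response["choices"][0]["message"]["content"])
```

Once the documents are too large for a single prompt, that's where building an index per version and composing them (as discussed below) comes in.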
I was looking into LangChain agents as well. They do have some overlap, e.g. both build on ReAct principles. The way I understand it (so please correct me if I'm seeing this wrong): you could compose related documents into a single composable graph, and use query transformations to connect ideas across the separate documents. In a running example for TensorFlow, these documents could be the v1 TensorFlow docs, the v2 TensorFlow docs, and meta docs (changelogs, deprecations, roadmaps). We could combine them into any index type (List/Tree/Keyword Table/etc.), and use query transformations to get meta-document awareness, such as querying library API migrations.

If we wanted a knowledge graph for PyTorch, scikit-learn, or any other ML library, we could follow the same pattern. In theory, we could then pool all of those graphs into another composable graph that pulls information across the ML libraries' documentation (via query transformations again). If I understand correctly, the rule of thumb for using an agent would be when you want to expose this general ML index graph as a tool that the agent can mix and match with other tools (Python REPL, SERP, llm-math, etc.).
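Roughly what I have in mind, as a sketch. This assumes the composable-graph API from the LlamaIndex version I'm on plus LangChain's zero-shot ReAct agent; exact class/method names may differ in other releases, and the paths, summaries, and tool description are made up:

```python
# Sketch: per-source indices -> one composable graph -> exposed as an agent tool.
from llama_index import GPTSimpleVectorIndex, GPTListIndex, SimpleDirectoryReader
from llama_index.indices.composability import ComposableGraph
from langchain.agents import Tool, initialize_agent
from langchain.llms import OpenAI

# One index per documentation set (directory paths are illustrative).
tf_v1 = GPTSimpleVectorIndex(SimpleDirectoryReader("docs/tensorflow/v1").load_data())
tf_v2 = GPTSimpleVectorIndex(SimpleDirectoryReader("docs/tensorflow/v2").load_data())
tf_meta = GPTSimpleVectorIndex(SimpleDirectoryReader("docs/tensorflow/meta").load_data())

# Compose the TensorFlow indices into one graph; the summaries guide routing.
tf_graph = ComposableGraph.build_from_indices(
    GPTListIndex,
    [tf_v1, tf_v2, tf_meta],
    index_summaries=[
        "TensorFlow v1 API documentation.",
        "TensorFlow v2 API documentation.",
        "TensorFlow changelogs, deprecations, and roadmaps.",
    ],
)

# Expose the graph as one tool among many for a ReAct-style agent.
tools = [
    Tool(
        name="TensorFlow Docs",
        func=lambda q: str(tf_graph.query(q)),
        description="Answers questions about TensorFlow v1/v2 APIs, deprecations, and migrations.",
    ),
    # ...plus a Python REPL, search, llm-math, etc.
]
agent = initialize_agent(tools, OpenAI(temperature=0), agent="zero-shot-react-description")
print(agent.run("How do I migrate tf.Session-based code from TensorFlow v1 to v2?"))
```

The same per-library graphs could then themselves be composed one level higher to get the cross-library graph described above.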
This is basically the guiding concept for the chatbot tutorial in the docs. I would like to confirm, however: are there any rules of thumb for when to use single- vs. multi-step query decomposition?
In my opinion, once multi-step queries are out, I would use those in most cases.
One disadvantage is that they might slow down response times, but it seems to me that multi-step can act like single-step when it needs to. If you want to limit the process to a single step, then use single-step.
(Although personally it's hard to think of actual scenarios where you'd want that, haha.)
Yeap. I can similarly imagine using file-based routing to generate knowledge-base graphs, with multi-step queries as the default configuration. Imagine auto-generating the graph from this setup.
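Something like this, purely as a sketch (same caveat about LlamaIndex API names possibly differing between releases; the `knowledge_base/` directory layout is hypothetical):

```python
# Hypothetical "file-based routing": one sub-index per library folder on disk,
# auto-composed into a top-level graph over all of them.
from pathlib import Path
from llama_index import GPTSimpleVectorIndex, GPTListIndex, SimpleDirectoryReader
from llama_index.indices.composability import ComposableGraph

# e.g. knowledge_base/tensorflow, knowledge_base/pytorch, knowledge_base/sklearn, ...
DOCS_ROOT = Path("knowledge_base")

indices, summaries = [], []
for library_dir in sorted(p for p in DOCS_ROOT.iterdir() if p.is_dir()):
    docs = SimpleDirectoryReader(str(library_dir), recursive=True).load_data()
    indices.append(GPTSimpleVectorIndex(docs))
    summaries.append(f"Documentation for the {library_dir.name} library.")

# Top-level graph spanning every library folder found on disk.
ml_graph = ComposableGraph.build_from_indices(
    GPTListIndex, indices, index_summaries=summaries
)
print(ml_graph.query("Which of these libraries provide a high-level Keras-like training API?"))
```

Adding a new library to the knowledge base would then just be a matter of dropping its docs into a new folder and rebuilding.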