
@Logan M I have this use case for extracting differences between 400-page legal contracts. The client asks for the most similar contract to one they specify, and the goal is to find differences at a clause level, since the contracts have broadly similar structure in sections. To first fetch the most similar documents, we would have to construct some structure around the chunked documents and establish a similarity comparison method - we can't just compare individual chunks, we would have to compare sets of chunked documents between 2 contract files, arranged in some structure hierarchically I suppose? Not sure how to do that comparison across large documents in one go. Have you encountered this before, and do you have any ideas from the documentation that you suggest I explore straight away?
Commenting to follow along, interesting problem.
Yea certainly an interesting problem!

My gut reaction, you would need some kind of normalization step -- get the contract into some kind of expected structure, so that comparisons can be done piece by piece
https://docs.llamaindex.ai/en/stable/examples/query_engine/pydantic_query_engine.html


We recently added pydantic outputs to query engines (thanks @bmax ❀️ )

So if you can think of some structure to normalize a contract to, this could work quite well. It could fill out the structure as it iterates over the contract
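For illustration, a minimal sketch of what that normalization could look like, assuming the pre-0.10 `llama_index` import layout used in the linked docs; the `ContractOutline`/`Clause` models and the file name are hypothetical placeholders, while `output_cls` on `as_query_engine` is the parameter the pydantic query engine example uses.

```python
from typing import List

from pydantic import BaseModel
from llama_index import SimpleDirectoryReader, VectorStoreIndex


# Hypothetical target structure to normalize every contract into.
class Clause(BaseModel):
    heading: str
    text: str


class ContractOutline(BaseModel):
    title: str
    parties: List[str]
    clauses: List[Clause]


# Index one contract's chunks.
documents = SimpleDirectoryReader(input_files=["contract_a.pdf"]).load_data()
index = VectorStoreIndex.from_documents(documents)

# Ask the query engine to fill out the pydantic structure from the retrieved text.
query_engine = index.as_query_engine(
    output_cls=ContractOutline,
    response_mode="compact",
)
outline = query_engine.query(
    "Extract the title, the parties, and the list of clauses from this contract."
)
print(outline)
```

In practice a single 400-page contract probably won't fit in one pass, so the same call would likely be run section by section and the partial outlines merged.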
@jerryjliu0 if you do have some insights, would appreciate that too
Thanks a lot!

Right, I guess the main challenges come down to -
  1. Getting contracts broken down into chunks that fit into a structure
  2. Comparison of entire contracts with structure constructed in 1 for similarity
  3. Having the structure decomposable into granular pieces (sections and clauses) so differences can be surfaced at the clause level
I guess the task boils down to 2 levels of similarity comparisons:
  1. Document Level:
    • Using the query, we filter to find the corresponding contract document the user is referring to (metadata filters here?)
    • Once we find the user's candidate contract, we filter other "contract" documents that are similar to the candidate contract, and choose top 1
  2. Clause level (items within the 2 contracts):
    • Find similar sections, clauses and outline how the 2 contracts are different across clauses and sections
Just trying to wrap my head around how I should load these 400 page documents to enable these similarity comparisons on both levels
To then of course use RAG for querying similarities on Level 2
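As a rough sketch of both levels, following the metadata-filter idea above (not a confirmed recipe): tag each contract's chunks with an identifier when loading, use a filtered retriever to stay inside one contract at a time, and then, clause by clause, pull the nearest matches from the other contract to diff with an LLM. The `contract_id` key and file names are made up, the imports follow the pre-0.10 `llama_index` layout, and metadata filtering only works with vector stores that support it.

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters

# Load each 400-page contract separately and tag its chunks so they can be
# told apart later ("contract_id" is a hypothetical metadata key).
documents = []
for path in ["contract_a.pdf", "contract_b.pdf"]:
    docs = SimpleDirectoryReader(input_files=[path]).load_data()
    for doc in docs:
        doc.metadata["contract_id"] = path
    documents.extend(docs)

index = VectorStoreIndex.from_documents(documents)

# Restrict retrieval to contract B only.
retriever_b = index.as_retriever(
    similarity_top_k=3,
    filters=MetadataFilters(
        filters=[ExactMatchFilter(key="contract_id", value="contract_b.pdf")]
    ),
)

# For a clause taken from contract A, fetch the closest candidates in
# contract B; each pair can then be handed to an LLM to describe differences.
clause_from_a = "4.2 Termination for convenience ..."
matches_in_b = retriever_b.retrieve(clause_from_a)
```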
@here do you guys think knowledge graphs might be useful to explore here?
I think not quite knowledge graphs, but there might be something in the node relationships to exploit here πŸ€”

Just a quick refresher: nodes allow you to set parent, children, next, and prev relationships
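For reference, here is how those relationships can be set by hand when building section/clause nodes; the node IDs and text below are illustrative only, and the import path matches the pre-0.10 `llama_index.schema` module.

```python
from llama_index.schema import NodeRelationship, RelatedNodeInfo, TextNode

# A section node that knows its clause children (IDs are made up).
section = TextNode(text="Section 4: Termination", id_="a-section-4")
section.relationships[NodeRelationship.CHILD] = [
    RelatedNodeInfo(node_id="a-clause-4-1"),
    RelatedNodeInfo(node_id="a-clause-4-2"),
]

# A clause node that points back to its parent section and to its neighbours,
# so a retrieved clause can be walked up to its section or across to adjacent clauses.
clause = TextNode(text="4.2 Termination for convenience ...", id_="a-clause-4-2")
clause.relationships[NodeRelationship.PARENT] = RelatedNodeInfo(node_id="a-section-4")
clause.relationships[NodeRelationship.PREVIOUS] = RelatedNodeInfo(node_id="a-clause-4-1")
clause.relationships[NodeRelationship.NEXT] = RelatedNodeInfo(node_id="a-clause-4-3")
```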
Seems like this is something where I would take 2-5 contracts and build a POC from there πŸ˜… No need to worry about 400+ yet
Well one contract is 400 pages long, that's really where this problem starts
But yeah I get the spirit of the message - I'll try hacking and see where I get stuck
@Logan M hello, as you suggested I have created a FastAPI app for my query engine, but it fails to handle multiple requests at the same time and I get weird responses. Please help!
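One common pattern worth trying, assuming the symptom comes from a single synchronous query call blocking concurrent requests (not confirmed from the message above): build the index and engine once at startup and use the async `aquery` method inside an async FastAPI endpoint. The paths and route name here are placeholders.

```python
from fastapi import FastAPI
from llama_index import SimpleDirectoryReader, VectorStoreIndex

app = FastAPI()

# Build the index and query engine once at startup, not per request.
documents = SimpleDirectoryReader("contracts/").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()


@app.get("/query")
async def query(q: str) -> dict:
    # aquery is the async counterpart of query, so one slow request
    # does not block the event loop for the others.
    response = await query_engine.aquery(q)
    return {"answer": str(response)}
```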