I am just starting my journey, so I'm still trying to understand this. My goal is to feed GPT compliance frameworks, then the specific policies for each company, so that customers can query against both. After that comes real-time integration with customer data. A customer could then ask what the framework says about patching, next what their own policies say, and finally how many of their systems are compliant.
I have similar objectives. I want to load in some contracts, ask questions, and compare against an idea. I will have a play later this week. My first approach will be to load a contract into a single index, ask some questions and check the answers. I'm not sure yet whether I need to split the contract into clauses.
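If splitting into clauses does turn out to be needed, here is a minimal sketch of one way to do it in plain Python. It assumes each clause starts on its own line with a number such as "1." or "2.1", which real contracts may not follow. The function name `split_into_clauses` and the sample text are made up for illustration and are not part of any indexing library.

```python
import re

# Assumption: a clause begins at the start of a line with a number
# like "1." or "2.1" followed by whitespace. Adjust for your contracts.
CLAUSE_START = re.compile(r"^(?=\d+(?:\.\d+)*\.?\s)", flags=re.MULTILINE)


def split_into_clauses(contract_text: str) -> list[str]:
    """Split contract text into clauses at numbered line starts."""
    # Splitting on a zero-width match keeps the clause number with its text.
    parts = CLAUSE_START.split(contract_text)
    # Drop empty pieces (for example the one before the first clause).
    return [part.strip() for part in parts if part.strip()]


# Hypothetical sample contract, for illustration only.
sample = (
    "1. Term\n"
    "This agreement lasts one year.\n"
    "2. Payment\n"
    "Fees are due in 30 days.\n"
    "2.1 Late fees\n"
    "Interest accrues monthly.\n"
)

clauses = split_into_clauses(sample)
for clause in clauses:
    print(repr(clause))
```

Each clause could then be indexed as its own document, so an answer can point back to the clause it came from. Whether that beats a single index is something to check against your own questions.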
Worth noting that I have done something similar with GPT-3, which works if the document is small.
Worth fiddling with the chunk size, which is quite large by default, especially if the answer needs a few bits from different parts of the documentation. I generally find that fishing questions, e.g. getting a specific answer from a doc, are more accurate than global questions like “what is this document about”, unless the document has a summary.
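To make the chunk size point concrete, here is a rough sketch of word-based chunking with overlap, written in plain Python instead of any particular library's splitter. The function name `chunk_words` and the parameters `chunk_size` and `overlap` are my own, counted in words, so they won't map one-to-one onto a library's token-based settings.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of up to chunk_size words, sharing overlap words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Tiny illustration: ten "words", windows of four, one word shared.
demo_text = " ".join(str(i) for i in range(10))
small_chunks = chunk_words(demo_text, chunk_size=4, overlap=1)
print(small_chunks)  # ['0 1 2 3', '3 4 5 6', '6 7 8 9']

# With a large chunk size, everything lands in a single chunk.
big_chunks = chunk_words(demo_text, chunk_size=50, overlap=0)
print(len(big_chunks))  # 1
```

Smaller chunks mean each retrieved piece is more focused, which tends to suit fishing questions. The trade-off is that an answer spread across several parts of the document needs more chunks retrieved to cover it.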