Hey guys, my MVP deals with legal cases before a human rights board.
I have two cases in my 'cases/' directory: the first is 12 pages long and the second is 7. I noticed that the simple loader here:
reader = SimpleDirectoryReader(
    input_dir="cases/"
)
documents = reader.load_data()
is loading the PDFs into a list of Document objects, but it's loading each page as its own Document object. I turn each Document object into a node and put all 19 nodes into my vector store.
Unfortunately, GPT-4 is mixing up facts from the two cases and giving wrong answers.
I think I'd get better results if each case were its own Document object, and subsequently its own Node. Does one of the default loaders have the ability to load an entire PDF as one Document object?
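In case no loader does this directly, one possible workaround is to stitch the page-level objects back together per file after loading. The sketch below is only an illustration: `Page` is a hypothetical stand-in for the real Document class, `merge_pages_by_file` is a made-up helper name, and it assumes each page carries a `file_name` entry in its metadata, which may not hold for every loader or version.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for a page-level Document (not the library class)."""
    text: str
    metadata: dict = field(default_factory=dict)


def merge_pages_by_file(pages, key="file_name"):
    """Join the text of pages that share metadata[key], keeping input order."""
    grouped = defaultdict(list)
    for page in pages:
        # Assumption: the loader recorded the source file under `key`.
        grouped[page.metadata.get(key, "unknown")].append(page.text)
    # One merged string per source file, pages separated by a blank line.
    return {name: "\n\n".join(texts) for name, texts in grouped.items()}


# Made-up sample pages standing in for what the loader returns.
pages = [
    Page("case A, page 1", {"file_name": "case_a.pdf"}),
    Page("case A, page 2", {"file_name": "case_a.pdf"}),
    Page("case B, page 1", {"file_name": "case_b.pdf"}),
]
merged = merge_pages_by_file(pages)
print(len(merged))  # one entry per source file
```

Each merged string could then be wrapped in a single Document per case before building nodes, so that text from different cases never shares a node.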
I swear I watched a tutorial on this but I've been looking for it and can't find it for the life of me. Send halp please ❤️