@yourbuddyconner I'm bummed I missed the chat with @Clayton yesterday. What I'm really struggling with is understanding what GPT Index offers compared to just using the GPT-3 API. More specifically, I would love to have a better understanding of how the different indexes work.
I'm trying to use GPT Index to index a bunch of construction project docs. Each document contains a mix of: 1) free-form text, 2) tables containing info like cost breakdowns, 3) hierarchical lists containing info like the steps involved in each phase of the project, and 4) biographies of the personnel leading the project. The bios are laid out like resumes and there can be multiple bios per document.
The documents can be anywhere from 50 to 350 pages.
At the moment, I'm just trying to understand what GPT Index is capable of and how to best use the different indexes. I don't mind manually breaking the documents into logical chunks if that will yield better results. We'll improve the ingestion process later.
So... my top-level questions are: 1) How best to index just a few of these docs as a proof of concept? 2) Since I'm willing to manually chunk the documents, should I use different index types for different chunks? For example, Vector Index for text, Table Index for the biographies. 3) How best to index hierarchical lists? 4) How best to maintain context across chunks?
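To make question 4 concrete, here is a rough sketch in plain Python of what I mean by keeping context across manually made chunks. It does not use GPT Index at all. The `Chunk` class, the `kind` labels, and the `chunk_with_overlap` helper are names I made up for illustration, and the word-window sizes are arbitrary. The idea is only that neighbouring chunks share some overlapping words and every chunk carries the original doc name with it.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_name: str  # original document name, kept with every chunk
    kind: str      # hypothetical label, e.g. "text", "table", "list", "bio"
    text: str


def chunk_with_overlap(doc_name, kind, text, size=200, overlap=50):
    """Split text into windows of `size` words.

    Each window after the first starts with the last `overlap` words
    of the previous window, so some context is shared between neighbours.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        chunks.append(Chunk(doc_name, kind, " ".join(window)))
        # Stop once this window has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks
```

Whether GPT Index already does something like this internally, or stores the doc name for me, is exactly what I'm hoping someone can confirm.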
I was looking for more visibility into the inner workings of the library. What is the query that is sent? What metadata from the original documents gets saved (I would like to have the original doc name included)? Some documentation about the JSON structure of the index would also help with reusing the index in other applications. These are the questions I have after my first session with the tool. It does work great though, like magic.
You would be doing an immense favor to humanity if you could provide a step-by-step video or illustrated document showing exactly what everything looks like and where everything is.
I read his tweets about converting the PDFs over and over. They are an excellent example of how people who already know what they are doing leave out crucial steps that the average noob cannot figure out, because they have no frame of reference and may never have tried to do something like this before. I suspect that his method will not work for the PDFs I have to work with. But I will never know.