Updated 2 years ago

@yourbuddyconner I'm bummed I missed the chat with @Clayton yesterday. What I'm really struggling with is just understanding what GPT Index offers compared to just using the GPT3 API. More specifically I would love to just have a better understanding of how the different indexes work.

I'm trying to use GPT Index to index a bunch of construction project docs. Each document contains a mix of:
1) Free-form text
2) Tables containing info like cost breakdowns
3) Hierarchical lists containing info like the steps involved in each phase of the project
4) Biographies of the personnel leading the project. The bios are laid out like resumes and there can be multiple bios per document.

The documents can be anywhere from 50 to 350 pages.

At the moment, I'm just trying to understand what GPT Index is capable of and how to best use the different indexes. I don't mind manually breaking the documents into logical chunks if that will yield better results. We'll improve the ingestion process later.

So... my top level questions are:
1) How best to index just a few of these docs as a proof of concept?
2) Since I'm willing to manually chunk the documents, should I use different index types for different chunks? For example, a Vector Index for the text and a Table Index for the biographies?
3) How best to index hierarchical lists?
4) How best to maintain context across chunks?
19 comments
Yeah sorry about that, I tried to DM you but it didn't go through and it was an impromptu thing
No problem. I got pulled away from my desk.
These are great questions, let's maybe link up tomorrow or Friday?
Friday works. I'm free all day.
Sick, will follow up in DMs, sent you a friend request
I'm in PST timezone.
Awesome, me too, accept my FR and will send a scheduling link
I was looking for more visibility into the inner workings of the library. What is the query that is sent? What metadata of the original documents gets saved (I would like to have the original doc name included)? Some documentation on the JSON structure of the index would also help with reusing the index in other applications. These are the questions I have after my first session with the tool 😄 It does work great though, like magic.
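On keeping the original doc name with each chunk: one library-independent option is to carry it in your own record and save that as JSON next to the index. The snippet below is only a minimal sketch of that idea. `ChunkRecord`, `to_json`, and `from_json` are hypothetical names made up for illustration, and this is not the JSON format that gpt_index itself writes.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ChunkRecord:
    """Hypothetical record: one chunk plus the document it came from."""
    doc_name: str   # original file name, kept so answers can cite a source
    chunk_id: int   # position of the chunk within its document
    text: str       # the chunk text that would be handed to an index


def to_json(records: List[ChunkRecord]) -> str:
    # Plain dicts serialise cleanly and can be read from other applications.
    return json.dumps([asdict(r) for r in records], indent=2)


def from_json(payload: str) -> List[ChunkRecord]:
    # Rebuild the records from the saved JSON string.
    return [ChunkRecord(**item) for item in json.loads(payload)]


records = [
    ChunkRecord("project_a.pdf", 0, "Cost breakdown for phase 1"),
    ChunkRecord("project_a.pdf", 1, "Bio of the project manager"),
]
restored = from_json(to_json(records))
```

Because the side file is plain JSON keyed by document name and chunk position, another application can read it without knowing anything about the index internals.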
I'm meeting with @yourbuddyconner tomorrow at 11:30 am PST. Do you want to join?
It's a tough timezone for me. Let me know what was discussed 😄
Wow! Interesting and very relevant. Thanks.
Great connecting @zgott

We discussed:
  • State of the union
  • Getting a simple Q/A PoC working
  • Introducing structure to semi-structured documents by converting to Markdown
  • Problem space around advancing from a PoC to something more production-ready
  • Use-case of elasticsearch and other search engines for document and chunk retrieval
  • How excited we are that the gpt_index community is bursting at the seams right now
  • "AI expansion syndrome" results in a lot of manic episodes and sleepless nights (in a good way, lol)
Will be throwing together a thread on Twitter about this, specifically:
  • tinkering with gpt_index for fun and profit
  • Selecting the right gpt_index data structure for your PoC
  • How to navigate the docs and the various classes and abstractions
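For anyone wondering what "introducing structure by converting to Markdown" could look like in practice, here is a rough sketch under stated assumptions, not the method discussed on the call. It assumes the document has already been converted to Markdown with `#` headings, and it splits on those headings so that every chunk carries its heading trail as context. `Chunk` and `split_markdown` are hypothetical names, and nothing here calls gpt_index.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches ATX-style Markdown headings such as "## Phase 1".
HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    heading_path: List[str]  # e.g. ["Project A", "Phase 1"]
    body: str

    def as_text(self) -> str:
        # Prefix the body with its heading trail so the chunk still
        # makes sense when it is retrieved on its own.
        return " > ".join(self.heading_path) + "\n" + self.body


def split_markdown(text: str) -> List[Chunk]:
    chunks: List[Chunk] = []
    path: List[str] = []
    lines: List[str] = []

    def flush() -> None:
        # Emit the lines collected under the current heading, if any.
        body = "\n".join(lines).strip()
        if body:
            chunks.append(Chunk(list(path), body))
        lines.clear()

    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or deeper level, then push this one.
            del path[level - 1:]
            path.append(match.group(2).strip())
        else:
            lines.append(line)
    flush()
    return chunks


doc = """# Project A
Overview text.
## Phase 1
- Step 1
  - Step 1a
## Team
### Jane Doe
Project manager.
"""
chunks = split_markdown(doc)
```

Each `chunk.as_text()` string could then be indexed as its own document. Nested lists stay intact inside their section, and each bio under its own heading becomes a separate chunk, which speaks to the hierarchical-list and context-across-chunks questions in the original post.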
Thanks again for your time today.
You would be doing an immense favor to Humanity if you could provide a step by step video or illustrated document showing exactly what stuff looks like and where stuff is.
I read his tweets about converting the PDFs over and over. They are an excellent example of how people who already know what they are doing leave out crucial details that the average noob cannot figure out, because a noob has no frame of reference and may never have tried anything like this before. I suspect that his method will not work for the PDFs I have to work with, but I will never know.
Always start with state of the union 😄
Agreed, this library does a lot of amazing things, but it's hard to grasp how it works and what the details are.