i guess it has something to do with your indexing structure
which then depends on how you want to use it
beyond that, ask Logan, but idk if the technique is different with multiple documents
I assume that's the case too, but I'm just using the GPTVectorStoreIndex, which from those articles should maintain articles as nodes. It would seem there must be an issue with the top_k that is returned not having the relevant info in it.
Those articles are pretty rudimentary and don't explain much beyond doc->node and general aggregation of docs by similarity score.
Logan? Can you tag them here?
maybe try a keyword index? that wouldn't work for other prompts, but it could if you need something specific
i guess so, i don't do it cuz they must get soooo many pings
he's one of the guys working on the project and he's helping us soooooo much
I was thinking I was just missing something silly. It sounds like that might not be the case.
These PDFs have tables with this info in them if that elucidates something
I'm kind of a beginner here too so I might not know the solution
that's what i've understood so far
i'm juggling a lot of code trying to understand how everything works, since there isn't much documentation or many examples online for what i'm trying to create
@Orion Pax dates are a little tricky for embedding retrieval. Are you using default options right now? You could try increasing the top k
index.as_query_engine(similarity_top_k=3)
Usually separating these documents into "groups" helps too. A group could be a single document, or a collection of documents on a topic. Then you can create an index for each group and use a router query engine or a graph to send your query to the correct documents.
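To make the "groups + router" idea concrete, here is a tiny pure-Python sketch. It is not LlamaIndex code: the group names, descriptions, and chunks are made up, and the word-overlap `route` function is only a stand-in for whatever selector a real router query engine uses.

```python
# Toy stand-in for "one index per group + a router"; not LlamaIndex code.
# Group names, descriptions, and chunks are hypothetical.
GROUPS = {
    "acme_10k_2022": {
        "description": "acme annual report 2022 revenue growth",
        "chunks": ["Acme revenue grew 10% in fiscal 2022."],
    },
    "globex_10k_2022": {
        "description": "globex annual report 2022 revenue growth",
        "chunks": ["Globex revenue fell 3% in fiscal 2022."],
    },
}


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace("?", " ").replace(".", " ").split())


def route(query):
    """Pick the group whose description shares the most words with the query."""
    q = tokenize(query)
    return max(GROUPS, key=lambda name: len(q & tokenize(GROUPS[name]["description"])))


def answer_context(query):
    """Return the chosen group and only that group's chunks."""
    group = route(query)
    return group, GROUPS[group]["chunks"]
```

The point is just that the query only ever sees chunks from the group it was routed to, so a similar-looking passage from another company's filing can't crowd out the right one.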
Had to increase it to 4 and that worked
Although, it gave 11% instead of 10
Which is interesting. I wonder if one of the docs rounded up
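For anyone wondering why going from 3 to 4 could matter, here is a rough sketch of top-k retrieval by cosine similarity. The node names and 3-dimensional vectors are invented for illustration (real embeddings are much larger), and this is not how the index is implemented internally, only the general idea.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec, nodes, k):
    """Return the ids of the k nodes most similar to the query."""
    ranked = sorted(nodes, key=lambda n: cosine(query_vec, n["vec"]), reverse=True)
    return [n["id"] for n in ranked[:k]]


# Hypothetical nodes: the one holding the answer ("revenue_table")
# happens to rank fourth for this query.
NODES = [
    {"id": "intro", "vec": [0.9, 0.1, 0.0]},
    {"id": "risk_factors", "vec": [0.8, 0.2, 0.1]},
    {"id": "outlook", "vec": [0.7, 0.3, 0.1]},
    {"id": "revenue_table", "vec": [0.5, 0.5, 0.2]},
    {"id": "signatures", "vec": [0.0, 0.1, 0.9]},
]
QUERY = [1.0, 0.0, 0.0]

print(top_k(QUERY, NODES, 3))
print(top_k(QUERY, NODES, 4))
```

With `k=3` the table node is cut off and the LLM never sees it; with `k=4` it makes it into the context.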
Do you have an example or explanation of keyword index tools and how to incorporate them?
I'm definitely interested in that idea because of how riddled these SEC files are with financial terms. I suspect it'll make the queries MUCH better
Yea you can use a keyword index just like any other index. Even with your base setup right now you could try swapping the vector index for a keyword index
There are two keyword indexes: one that basically uses simple string parsing to find keywords (fast, but sometimes doesn't find keywords), and a smarter version that asks the LLM to identify keywords (slower + token usage when building the index, maybe better search results)
GPTKeywordTableIndex is the smarter one, GPTSimpleKeywordTableIndex is the faster one
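As a rough picture of what a keyword table does, here is a minimal pure-Python sketch of the "simple string parsing" flavor. It is an assumption-laden toy, not the library's implementation: the stopword list, node texts, and function names are all made up. The LLM-based flavor would swap `extract_keywords` for a call that asks the model for keywords, which is not shown here.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "of", "in", "and", "was", "for", "to"}


def extract_keywords(text):
    """Plain string parsing: lowercase words and numbers, minus stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def build_keyword_table(nodes):
    """Map each keyword to the set of node ids whose text contains it."""
    table = defaultdict(set)
    for node_id, text in nodes.items():
        for kw in extract_keywords(text):
            table[kw].add(node_id)
    return table


def lookup(table, query):
    """Rank node ids by how many query keywords they match."""
    hits = defaultdict(int)
    for kw in extract_keywords(query):
        for node_id in table.get(kw, ()):
            hits[node_id] += 1
    return sorted(hits, key=lambda n: (-hits[n], n))


# Made-up node texts for illustration.
KW_NODES = {
    "n1": "Net revenue increased 10% in fiscal 2022.",
    "n2": "The board approved a dividend of $0.25 per share.",
}
TABLE = build_keyword_table(KW_NODES)

print(lookup(TABLE, "dividend per share"))
```

Exact-term matching like this is why a keyword index can do well on filings full of specific financial terms, and also why it can miss a query that is phrased differently from the text.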