Consider I have this kind of composite

Suppose I have a composite of indices like this one. Is it possible to give priority to certain docs, or to use only one doc?

Attachment

Screen_Shot_2023-04-09_at_12.21.11_AM.png


Can I use index_structure_id to use only a certain doc?

I believe index structure id is the id used for the index structure of your file(s). It’s not limited to just one file.

One possible workaround would be to use response.source_nodes and then filter by a specific doc id.
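As a rough illustration of the source_nodes workaround, here is a minimal sketch. The SourceNode class below is a hypothetical stand-in for whatever objects response.source_nodes holds in your version of the library (it assumes each one exposes a doc_id and its text), so check the real attribute names before relying on it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SourceNode:
    """Hypothetical stand-in for one entry of response.source_nodes."""
    doc_id: Optional[str]
    source_text: str


def filter_by_doc_id(source_nodes: List[SourceNode], wanted_doc_id: str) -> List[SourceNode]:
    """Keep only the source nodes that came from the wanted document."""
    return [node for node in source_nodes if node.doc_id == wanted_doc_id]


# Made-up nodes standing in for a real response.source_nodes list.
nodes = [
    SourceNode(doc_id="reviews", source_text="Great seafood, slow service."),
    SourceNode(doc_id="facts", source_text="Opening hours: 9 to 12."),
    SourceNode(doc_id="reviews", source_text="Cozy place."),
]

facts_only = filter_by_doc_id(nodes, "facts")
print([node.source_text for node in facts_only])
```

Note that this only narrows down which sources you inspect after the fact. The answer itself was still synthesized from every node that was retrieved.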

Another approach would be to index each file individually first, and then combine all the files into a composable index. When you want to force a search over an individual doc, you can home in on the index structure for that file.

With that said, this is still not very intelligent and doesn’t allow you to prioritize multiple files.

Something for future release @Logan M @jerryjliu0

If I made an index for a document, then I think the index_structure_id refers to both the index and the document.

So you mean that if I have 3 docs, I will have 4 indices, [index_doc1, index_doc2, index_doc3, composite_index], and use each index for specific cases?

Exactly!

It’s still not possible to prioritize but at least you can focus on 1 at a time. For multiple docs you would use the composite index.
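The "one index per doc plus a composite" layout could be wired up roughly like this. It is only a sketch: the strings are placeholders for real index objects, and pick_index is a hypothetical helper, not something the library provides.

```python
from typing import Dict, Optional


def pick_index(indices: Dict[str, object], doc_name: Optional[str] = None) -> object:
    """Return the per-document index when doc_name is given, else the composite index."""
    if doc_name is None:
        return indices["composite_index"]
    return indices[f"index_{doc_name}"]


# Placeholders: in real code each value would be an index object you built.
indices = {
    "index_doc1": "<index over doc1>",
    "index_doc2": "<index over doc2>",
    "index_doc3": "<index over doc3>",
    "composite_index": "<composite over all three>",
}

print(pick_index(indices, "doc2"))  # focus on a single document
print(pick_index(indices))          # search across everything
```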

Is it possible to give some kind of keyword to a node (index) and then use those keywords when querying?

Absolutely.

You can use a keyword index for this and get the LLM to find those keywords and assign them to each node.

You could also use the new hybrid search, which combines vector and keyword search and lets you prioritize one over the other by modifying the alpha value.
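To get a feel for what the alpha value does, here is a simplified sketch. It assumes alpha = 1 means pure vector search and alpha = 0 means pure keyword search, and it blends two already-normalized scores linearly. Real engines normalize and fuse scores in their own ways, so treat hybrid_score and the numbers below as illustrative only.

```python
from typing import Dict, List, Tuple


def hybrid_score(vector_score: float, keyword_score: float, alpha: float) -> float:
    """Linear blend: alpha weights the vector score, (1 - alpha) the keyword score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * vector_score + (1.0 - alpha) * keyword_score


def rank(candidates: Dict[str, Tuple[float, float]], alpha: float) -> List[str]:
    """Order candidate names by their blended score, best first."""
    return sorted(
        candidates,
        key=lambda name: hybrid_score(candidates[name][0], candidates[name][1], alpha),
        reverse=True,
    )


# Made-up (vector_score, keyword_score) pairs, both assumed to lie in [0, 1].
candidates = {
    "node_a": (0.9, 0.2),  # semantically close, few keyword hits
    "node_b": (0.4, 0.8),  # many keyword hits, semantically weaker
}

print(rank(candidates, alpha=0.75))  # vector-heavy weighting
print(rank(candidates, alpha=0.25))  # keyword-heavy weighting
```

With the vector-heavy setting node_a comes first, and with the keyword-heavy setting node_b does, which is the kind of prioritization alpha gives you.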

I can specify the keywords, but during querying, is it possible to refer only to the nodes (indices) that have those keywords (when I have several indices in a composite index)?

How can I assign a keyword to the node by the way?

And hybrid search is only possible when using KGindex, right?

I believe this should be possible but I have no idea how. Will let you know if I figure it out. In the meantime ask that question again and tag Logan and jerry.

Hybrid search is possible with Pinecone and Weaviate.

Thanks, I'll try to find it too.

Let me know if you do find it 🙂

https://weaviate.io/blog/hybrid-search-explained

You can find the info you needed in the BM25F section.

cc @sunwoong @BioHacker yeah there's a few strategies here. One is to define a keyword parent index in a composed graph. this will find keywords in the query and match them to keywords in the source text - and you can then search over those documents in subindexes

Thanks for the answer! Can you explain a little bit more?

we do have some initial keyword filtering support, but that's on the node text itself - we don't have a separate "metadata" field for keywords atm. could be interesting to add

Yeah like from this doc: https://gpt-index.readthedocs.io/en/latest/reference/composability.html - if you define an index over each document, you can define a parent index that can "route" a query to the relevant documents. one way of doing this is using the keyword table. another is using the vector index on the subindexes
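The keyword-table flavor of routing can be sketched in plain Python as below. This is not the library's implementation: extract_keywords, build_keyword_table and route are hypothetical helpers, the keyword extraction is a naive word split (the real thing can use an LLM for that), and the document texts are made up.

```python
import re
from typing import Dict, List, Set


def extract_keywords(text: str) -> Set[str]:
    """Naive keyword extraction: lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_keyword_table(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each keyword to the ids of the documents that contain it."""
    table: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for keyword in extract_keywords(text):
            table.setdefault(keyword, set()).add(doc_id)
    return table


def route(query: str, table: Dict[str, Set[str]]) -> List[str]:
    """Return doc ids sharing keywords with the query, most matches first."""
    counts: Dict[str, int] = {}
    for keyword in extract_keywords(query):
        for doc_id in table.get(keyword, ()):
            counts[doc_id] = counts.get(doc_id, 0) + 1
    return sorted(counts, key=lambda doc_id: (-counts[doc_id], doc_id))


# Made-up stand-ins for the text (or summary) of each sub-index.
docs = {
    "facts": "opening hours revenue president founded",
    "tips": "writing introduction tone style adjectives",
}
table = build_keyword_table(docs)

print(route("What are the opening hours?", table))
print(route("How should the introduction be written?", table))
```

The doc ids that come back are the sub-indexes you would then actually query.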

Thanks for the answer! But my problem is a little more difficult.

For example, I want to proofread a review of a restaurant, and I have 2 documents: reviews of the restaurant, and factual data about the restaurant (such as years in business, start date, revenue, president, etc.)

In this case, I sometimes have to proofread based on other review documents, and sometimes based on the factual data. I want to distinguish between these two cases while querying.

Is it still possible with composition of indices?

@sunwoong do you have criteria for deciding when to proofread based on another review doc, or based on factual data? if you compose a tree index or a vector index on top of two indexes for instance (your review doc corpus and factual data corpus), then the graph can dynamically choose to "route" a query to the right review doc and/or factual data
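For the vector-index-on-top case, one way to picture the "routing" is as a similarity comparison between the query embedding and an embedding of each sub-index's summary. The sketch below assumes exactly that, uses toy 2-d vectors instead of real embeddings, and route_by_similarity is a hypothetical helper, so the library's actual logic may differ in the details.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route_by_similarity(
    query_vec: Sequence[float],
    summary_vecs: Dict[str, Sequence[float]],
    top_k: int = 1,
) -> List[str]:
    """Return the names of the top_k sub-indexes most similar to the query."""
    ranked = sorted(
        summary_vecs,
        key=lambda name: cosine_similarity(query_vec, summary_vecs[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy embeddings; real ones would come from an embedding model.
summary_vecs = {
    "factual_data": [1.0, 0.0],
    "review_docs": [0.0, 1.0],
}
query_vec = [0.9, 0.1]  # pretend this query is mostly about facts

print(route_by_similarity(query_vec, summary_vecs, top_k=1))
print(route_by_similarity(query_vec, summary_vecs, top_k=2))
```

Raising top_k is how a query would end up going to both the review docs and the factual data.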

First, we don't have fixed criteria. I'll give you an example of my problem:

Consider that I want to proofread "an introduction of a restaurant"

{query_str}: The restaurant is open through 7 to 12, we have some delicious food.
document1: The factual data about the restaurant, such as the opening time.
document2: Tips on how to write an effective introduction
desired answer: The restaurant is open through 9 to 12, we have some delicious British seafood dishes.
- Here, the opening time should be fixed based on document1, and the food description should be rewritten based on document2.
So I want to first fix the text based on document1, and then fix it based on document2, maybe sequentially...? (I think in this case, maybe I can use ListIndex)

Regarding "then the graph can dynamically choose to "route" a query to the right review doc and/or factual data": I want to know the logic of the "routing" too. Is it based on some similarity metric or something, or is it maybe related to response synthesis?

From my answer #1, is it possible to apply the criteria if we have one?
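The sequential idea from the example (fix the facts using document1 first, then improve the wording using document2) could be structured as a chain of passes, sketched below. The two pass functions are hypothetical stand-ins that just do string replacement. In practice each one would query the index built over the corresponding document with the current draft.

```python
from typing import Callable, List


def proofread_sequentially(draft: str, passes: List[Callable[[str], str]]) -> str:
    """Run the draft through each pass in order, feeding each output to the next."""
    for apply_pass in passes:
        draft = apply_pass(draft)
    return draft


def fix_facts(text: str) -> str:
    """Stand-in for a query against the index over document1 (factual data)."""
    return text.replace("7 to 12", "9 to 12")


def apply_writing_tips(text: str) -> str:
    """Stand-in for a query against the index over document2 (writing tips)."""
    return text.replace("delicious food", "delicious British seafood dishes")


draft = "The restaurant is open through 7 to 12, we have some delicious food."
result = proofread_sequentially(draft, [fix_facts, apply_writing_tips])
print(result)
```

Keeping the passes in an explicit list also gives you a place to apply criteria, if you have them: you decide per query which passes to include and in which order.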