
Updated last year

Hi, I'm looking to label nodes with

Hi, I'm looking to label nodes with keywords out of a specific set (up to 5). The keywords indicate whether a particular node contains a piece of information, like so:

  • contains_x
  • contains_y
Currently I'm doing this with a CustomExtractor:

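# Import path as of llama_index 0.8.x; it may differ in other versions.
from llama_index.node_parser.extractors import MetadataFeatureExtractor


class CustomExtractor(MetadataFeatureExtractor):
    @classmethod
    def class_name(cls):
        return "CustomExtractor"

    def extract(self, nodes):
        # Build one metadata dict per node, joining the three flag values
        # (assumed to be strings) into a single newline-separated field.
        metadata_list = [
            {
                "custom": "\n".join(
                    node.metadata[key]
                    for key in ("contains_x", "contains_y", "contains_z")
                )
            }
            for node in nodes
        ]
        return metadata_list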


This yields good results, but I don't know how to combine this with querying. I'd like to convert (x, y, z) into keywords and then filter out all nodes that don't match. So that's either:

  • using a KeywordTableIndex; however, I can't supply a custom list of keywords, so I'd probably need to write a fully custom index implementation
  • using a VectorIndex and filtering by metadata
What would be the best way forward to achieve this? Thanks a lot for your help.
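
For illustration, here is a minimal, library-free sketch of the labelling and filtering idea described in the question. It is not LlamaIndex code: `Node`, `ALLOWED_KEYWORDS`, `keywords_for`, and `filter_nodes` are hypothetical names, and it assumes each flag is stored in the node metadata as the string "true" or "false".

```python
from dataclasses import dataclass, field

# Hypothetical fixed keyword set; the thread only names
# contains_x, contains_y and contains_z.
ALLOWED_KEYWORDS = ("contains_x", "contains_y", "contains_z")


@dataclass
class Node:
    """Stand-in for an index node: text plus a metadata dict."""
    text: str
    metadata: dict = field(default_factory=dict)


def keywords_for(node):
    """Return the allowed keywords whose flag is set on the node.

    Assumption: each flag is stored as the string "true" or "false".
    """
    return {k for k in ALLOWED_KEYWORDS if node.metadata.get(k) == "true"}


def filter_nodes(nodes, required):
    """Keep only the nodes that carry every keyword in `required`."""
    required = set(required)
    unknown = required - set(ALLOWED_KEYWORDS)
    if unknown:
        raise ValueError(f"unknown keywords: {sorted(unknown)}")
    return [n for n in nodes if required <= keywords_for(n)]


nodes = [
    Node("a", {"contains_x": "true", "contains_y": "false", "contains_z": "true"}),
    Node("b", {"contains_x": "true", "contains_y": "true", "contains_z": "false"}),
    Node("c", {"contains_x": "false", "contains_y": "true", "contains_z": "false"}),
]
kept = filter_nodes(nodes, ["contains_x"])
```

Keeping one metadata key per keyword (rather than a single joined string) is what makes an exact-match filter on each key possible later on.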
5 comments
You can do metadata filtering with the VectorStoreIndex, have you looked at that yet? https://gpt-index.readthedocs.io/en/latest/core_modules/data_modules/index/vector_store_guide.html
@Łukasz you can use KeywordNodePostprocessor to filter out nodes which contain specific keywords - https://docs.llamaindex.ai/en/stable/core_modules/query_modules/node_postprocessors/modules.html#keywordnodepostprocessor
@ravitheja This seems an even better solution, thanks a lot
Damn, node postprocessors won't work since they're applied after the fetch from the index. So we'd still retrieve by top-k similarity and only then filter, which can drop matching nodes that never made the top-k. Gotta go with vector store metadata filtering.
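
To make the difference concrete, here is a small toy sketch in plain Python (no LlamaIndex involved). It compares filtering after a top-k fetch with filtering before it. The corpus, the 2-d embeddings, and the helper names (`top_k`, `post_filter`, `pre_filter`) are all made up for illustration, and flags are again assumed to be the strings "true" / "false".

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy corpus of (id, embedding, metadata) rows; values are invented.
CORPUS = [
    ("n1", (1.0, 0.0), {"contains_x": "false"}),
    ("n2", (0.9, 0.1), {"contains_x": "false"}),
    ("n3", (0.5, 0.5), {"contains_x": "true"}),
    ("n4", (0.0, 1.0), {"contains_x": "true"}),
]


def top_k(query, rows, k):
    """Return the k rows most similar to the query."""
    return sorted(rows, key=lambda r: cosine(query, r[1]), reverse=True)[:k]


def has_flag(row, key):
    return row[2].get(key) == "true"


def post_filter(query, key, k):
    """Fetch top-k first, then drop rows without the flag (postprocessor style)."""
    return [r[0] for r in top_k(query, CORPUS, k) if has_flag(r, key)]


def pre_filter(query, key, k):
    """Drop rows without the flag first, then take top-k (metadata-filter style)."""
    candidates = [r for r in CORPUS if has_flag(r, key)]
    return [r[0] for r in top_k(query, candidates, k)]


query = (1.0, 0.0)
post = post_filter(query, "contains_x", k=2)  # the two nearest rows lack the flag
pre = pre_filter(query, "contains_x", k=2)    # flagged rows are ranked on their own
```

With this data the post-filter variant returns nothing, because both of the nearest rows lack the flag, while the pre-filter variant still returns the two flagged rows.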