
Updated 3 months ago

For Notion the problem was the Reader

For Notion, the problem was that the Reader would pull in the "body" of a page (what the user has written on the page, relevant info, etc.) but none of the metadata (the title of the page, when it was created, etc.).

I looped through the Documents once they were created and called a Notion API endpoint that returned the metadata (but couldn't retrieve the body for some reason), then attached that metadata to each Document as a dict.
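A minimal sketch of that loop, in case it helps. Everything here is a stand-in: `PageDocument`, `attach_metadata` and `fake_fetch_metadata` are hypothetical names, and the fake fetcher just returns made-up values where the real code would call the Notion API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class PageDocument:
    """Hypothetical stand-in for a reader Document: body text plus a metadata dict."""
    page_id: str
    text: str
    extra_info: Dict[str, Any] = field(default_factory=dict)


def attach_metadata(
    documents: List[PageDocument],
    fetch_metadata: Callable[[str], Dict[str, Any]],
) -> List[PageDocument]:
    """Look up metadata for each document and merge it into its extra_info dict."""
    for doc in documents:
        doc.extra_info.update(fetch_metadata(doc.page_id))
    return documents


def fake_fetch_metadata(page_id: str) -> Dict[str, Any]:
    """Placeholder for the real Notion API call (assumed to return title and creation time)."""
    return {"title": f"Page {page_id}", "created_time": "2023-01-01T00:00:00Z"}


docs = attach_metadata(
    [PageDocument(page_id="abc", text="Body text from the Reader")],
    fake_fetch_metadata,
)
print(docs[0].extra_info)
```

Passing the fetcher in as a function keeps the loop the same whether the metadata comes from Notion or anywhere else.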

That way, when a query comes in, the metadata is part of what gets compared in the vector similarity search, which made retrieval more accurate (and gave the language model more context to work from when generating a response). As detailed in the docs, a language model could even create the metadata itself to allow for better querying.
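Roughly, the idea is that the metadata ends up in the text that gets embedded. The sketch below is only an illustration of that idea, not the library's actual code: `text_for_embedding` is a hypothetical helper, and the `key: value` header format is an assumption.

```python
from typing import Any, Dict


def text_for_embedding(body: str, metadata: Dict[str, Any]) -> str:
    """Prefix the body with 'key: value' lines so the metadata is embedded with it."""
    header = "\n".join(f"{key}: {value}" for key, value in metadata.items())
    # With no metadata, embed the body unchanged.
    return f"{header}\n\n{body}" if header else body


chunk = text_for_embedding(
    "Notes from the planning meeting.",
    {"title": "Q3 Roadmap", "created": "2023-01-01"},
)
print(chunk)
```

A query that mentions the page title can then match the chunk even if the body never repeats the title.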
1 comment
I wrote the code below as a way to pass metadata. For now it's a wrapper function for myself. I'm only passing the title as metadata, but more keys can be added to the dictionary if needed.

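# Imports go at the top of the module (legacy llama_index API)
import os
from typing import Any, Dict

from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

    def createSimpleVectorIndex(self, directory: str = 'sample'):
        # Map each file name to the metadata dict attached to its Document
        def filename_to_metadata(filename: str) -> Dict[str, Any]:
            return {"episode-title": filename}

        # Read documents from disk, attaching the metadata to each one
        documents = SimpleDirectoryReader(directory, file_metadata=filename_to_metadata).load_data()

        # Create index, including the metadata alongside the document text
        index = GPTSimpleVectorIndex(documents, include_extra_info=True)

        # Save index to disk (make sure the output folder exists first)
        os.makedirs('index', exist_ok=True)
        index.save_to_disk(f'index/vector-index-{directory}.json')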