Updated 3 months ago

hey, since llama_index 0.6.2 we are seeing that the Document's extra parameters also get added to the text when adding to the Pinecone vector store.
How can we remove them? (This pollutes the embedding search when using Pinecone.)
15 comments
The Documents don't have the extra parameters in their text field, but when they go through this code:
vector_store = PineconeVectorStore(
    pinecone_index=self.pinecone_index,
    metadata_filters=metadata_filters,
    namespace="org-" + payload["organisation"],
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
pc_index = GPTVectorStoreIndex.from_documents(
    documents,
    pinecone_index=self.pinecone_index,
    storage_context=storage_context,
    service_context=service_context,
)

the text stored in Pinecone has the extra parameters.
I've gone through the code but still can't figure out why.
Please help.
cc: @Logan M @ravitheja
I don't think there's a way to ignore it at the moment. But @disiok has been working on metadata changes
Facing the same issue with version 0.6.5.
@Logan M do you know what we are injecting automatically?
@Siddhant Saurabh do you have an example of the extra info you are seeing?
oh I see, I think what happens is:
  1. our reader automatically injects extra_info to document
  2. the node parser propagates that to the Nodes
  3. when we do Node.get_text() it gets injected automatically
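The three steps above can be sketched without the library. This is only an illustration of the mechanism as described in this thread: `SketchNode` and `parse` are hypothetical stand-ins, not llama_index classes, and the exact string format llama_index uses may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SketchNode:
    """Hypothetical stand-in for a node; not the llama_index class."""
    text: str
    extra_info: Dict[str, Any] = field(default_factory=dict)

    def get_text(self) -> str:
        # Step 3: metadata, when present, is prepended to the chunk text.
        if not self.extra_info:
            return self.text
        info = "\n".join(f"{k}: {v}" for k, v in self.extra_info.items())
        return f"{info}\n\n{self.text}"


def parse(text: str, extra_info: Dict[str, Any], chunk_size: int,
          include_extra_info: bool = True) -> List[SketchNode]:
    # Step 2: the parser copies the document's extra_info onto every node.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return [
        SketchNode(chunk, dict(extra_info) if include_extra_info else {})
        for chunk in chunks
    ]


# Step 1: a reader attaches extra_info to the document it loads.
doc_text = "alpha beta gamma"
doc_extra_info = {"source": "google", "chunk_size": 600}

nodes = parse(doc_text, doc_extra_info, chunk_size=10)
polluted = nodes[0].get_text()  # metadata lines end up in the text

# Turning propagation off gives clean text, but the metadata is gone too.
clean_nodes = parse(doc_text, doc_extra_info, chunk_size=10,
                    include_extra_info=False)
```

If whatever `get_text()` returns is what gets embedded and stored, the metadata lines land in the vector store text, which matches what is being reported here.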
Yea that sounds like what I was thinking of 👀
There's an include_extra_info flag in the SimpleNodeParser which we can toggle as a dumb fix
for this particular case,
but ya, we need to be more explicit about what is being injected where, and give the user control over it
ah okay so we can just switch this off for now
hey @disiok and @Logan M
when we use SimpleNodeParser(include_extra_info=False),
we no longer have "google\ntimestamp: 1684345677.945844\nchunk_size: 600\nchunk_overlap: 150\nnum_indexes: 74" in the text of the node,
but then extra_info becomes {} (empty).

How can we keep "google\ntimestamp: 1684345677.945844\nchunk_size: 600\nchunk_overlap: 150\nnum_indexes: 74" out of the text of the node, but still keep it in the node's extra_info?
It's solved in 0.6.8, thanks.
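For anyone stuck on an older version, here is a rough sketch of the behaviour being asked for: keep extra_info on the node, but build the text to embed from the raw chunk only. The thread doesn't say how 0.6.8 actually solved it, so this is not the library's implementation; `SketchNode` and `to_vector_record` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class SketchNode:
    """Hypothetical stand-in for a node; not the llama_index class."""
    text: str
    extra_info: Dict[str, Any] = field(default_factory=dict)


def to_vector_record(node: SketchNode) -> Tuple[str, Dict[str, Any]]:
    """Return (text to embed, metadata to store) as two separate values."""
    # Copy so the node's own extra_info is left untouched.
    metadata = dict(node.extra_info)
    # Store the raw chunk alongside the metadata, never merged into it.
    metadata["text"] = node.text
    return node.text, metadata


node = SketchNode("alpha beta", {"source": "google", "chunk_size": 600})
embed_text, metadata = to_vector_record(node)
```

The point of the split is that only `embed_text` would go to the embedding model, while `metadata` travels with the vector so it stays available for filtering.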