Hi, while working with llama-index==0.6.35 I faced the following issues with the metadata:
Plain Text
metadata = {
    "provider": provider,
    "admin_id": admin_id,
    "chunk_size": int(self.chunk_size),
    "chunk_overlap": int(self.chunk_overlap),
    "num_indexes": int(num_indexes),
    "category": tag,
    "page_label": page_no,
    "document_name": document_name,
    "organisation_name": organisation_name,
    "uploaded_at": get_current_date(),
}
document = Document(
    text=clean_text(text),
    doc_id=f"{document_id}",
    metadata=metadata,
    excluded_llm_metadata_keys=[
        "category",
        "page_label",
        "num_indexes",
        "chunk_overlap",
        "chunk_size",
        "admin_id",
    ],
    excluded_embed_metadata_keys=[
        "category",
        "page_label",
        "num_indexes",
        "chunk_overlap",
        "chunk_size",
        "admin_id",
    ],
    # "seperator" is the field's actual spelling in llama-index 0.6.x
    metadata_seperator=" | ",
    metadata_template="{key} = {value}",
    text_template="Metadata: {metadata_str}\n=====\nContent: {content}",
)

1st: while generating text for the LLM using the above, the "Metadata: ... Content:" template gets applied twice in the text
2nd: the embedding text had the same problem, and so did the evaluation prompt
3rd: we were not able to access doc_id from old nodes (see the sketch below)
4th: the quality of answers also went down with the new metadata version, though the new prompt made more sense (using gpt4) πŸ€”
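For issue 3: on the 0.6.x node schema the source document id is exposed through the ref_doc_id property, which resolves the node's SOURCE relationship. A minimal sketch (the node and id below are illustrative, not from the original code):

Plain Text
from llama_index.schema import TextNode, NodeRelationship, RelatedNodeInfo

# A node that points back at its source document via the SOURCE relationship.
node = TextNode(
    text="some chunk text",
    relationships={NodeRelationship.SOURCE: RelatedNodeInfo(node_id="123")},
)

# ref_doc_id resolves the SOURCE relationship; it returns None when the
# relationship is missing, e.g. on nodes deserialized from older versions.
print(node.ref_doc_id)  # -> "123"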
1. I'm unable to replicate this πŸ€” using document.get_content(metadata_mode="llm") or document.get_content(metadata_mode="embed")
2. Same as above
3. When you say old nodes, do you mean old nodes from a local index, or an old pinecone index?
4. I think part of the answer quality has to do with chunking. You have a lot of metadata, so this is probably affecting you (see the sketch after the PR link)
https://github.com/jerryjliu/llama_index/pull/6744
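Concretely, the formatted metadata string is counted against each chunk's token budget when splitting, so heavy metadata leaves less room for the actual text. A minimal sketch, assuming SimpleNodeParser.from_defaults accepts chunk_size/chunk_overlap in this version (document text and sizes are illustrative):

Plain Text
from llama_index.node_parser import SimpleNodeParser
from llama_index.schema import Document

doc = Document(
    text="some long document text " * 200,
    metadata={"provider": "val", "document_name": "val", "organisation_name": "val"},
)

# Space is reserved for the metadata string in every chunk, so the more
# metadata a document carries, the smaller its effective text chunks.
parser = SimpleNodeParser.from_defaults(chunk_size=512, chunk_overlap=20)
nodes = parser.get_nodes_from_documents([doc])
print(len(nodes), [len(n.text) for n in nodes])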
For 1 and 2, this was my test code

Plain Text
from llama_index.schema import Document, MetadataMode

metadata = {
    "provider": "val",
    "admin_id": "val",
    "chunk_size": "val",
    "chunk_overlap": "val",
    "num_indexes": "val",
    "category": "val",
    "page_label": "val",
    "provider": "val",
    "document_name": "val",
    "organisation_name": "val",
    "uploaded_at": "val"
}

doc = Document(
    text="text",
    doc_id="123",
    metadata=metadata,
    excluded_llm_metadata_keys=[
        "category",
        "page_label",
        "num_indexes",
        "chunk_overlap",
        "chunk_size",
        "admin_id",
    ],
    excluded_embed_metadata_keys=[
        "category",
        "page_label",
        "num_indexes",
        "chunk_overlap",
        "chunk_size",
        "admin_id",
    ],
    metadata_seperator=" | ",
    metadata_template="{key} = {value}",
    text_template="Metadata: {metadata_str}\n=====\nContent: {content}",
)

print(doc.get_content(metadata_mode=MetadataMode.LLM))
print(doc.get_content(metadata_mode=MetadataMode.EMBED))
print(doc.get_content(metadata_mode=MetadataMode.ALL))
print(doc.get_content(metadata_mode=MetadataMode.NONE))
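For reference, this is the output I would expect from those four prints, assuming dict insertion order and the 0.6.x get_content behavior (MetadataMode.NONE returns the bare text), with the template applied exactly once:

Plain Text
Metadata: provider = val | document_name = val | organisation_name = val | uploaded_at = val
=====
Content: text
Metadata: provider = val | document_name = val | organisation_name = val | uploaded_at = val
=====
Content: text
Metadata: provider = val | admin_id = val | chunk_size = val | chunk_overlap = val | num_indexes = val | category = val | page_label = val | document_name = val | organisation_name = val | uploaded_at = val
=====
Content: text
text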
The two metadata modes (embed and llm) are what's used under the hood before giving text to the respective model.
One more thing: new nodes (llama-index==0.6.35) are not compatible with older versions of llama_index. We get the error below:
Plain Text
File "/app/src/chatbot/query_gpt.py", line 266, in get_slack_flag
   custom_index.query(final_query)
 File "/usr/local/lib/python3.10/site-packages/llama_index/indices/query/base.py", line 23, in query
   response = self._query(str_or_query_bundle)
 File "/usr/local/lib/python3.10/site-packages/llama_index/query_engine/retriever_query_engine.py", line 142, in _query
   nodes = self._retriever.retrieve(query_bundle)
 File "/usr/local/lib/python3.10/site-packages/llama_index/indices/base_retriever.py", line 21, in retrieve
   return self._retrieve(str_or_query_bundle)
 File "/app/src/chatbot/query_gpt.py", line 109, in _retrieve
   all_query_nodes = self._all_query_retriever.retrieve(self._all_query)
 File "/usr/local/lib/python3.10/site-packages/llama_index/indices/base_retriever.py", line 21, in retrieve
   return self._retrieve(str_or_query_bundle)
 File "/usr/local/lib/python3.10/site-packages/llama_index/token_counter/token_counter.py", line 78, in wrapped_llm_predict
   f_return_val = f(_self, *args, **kwargs)
 File "/usr/local/lib/python3.10/site-packages/llama_index/indices/vector_store/retrievers/retriever.py", line 84, in _retrieve
   query_result = self._vector_store.query(query, **self._kwargs)
 File "/usr/local/lib/python3.10/site-packages/llama_index/vector_stores/pinecone.py", line 304, in query
   text = match.metadata[self._text_key]
KeyError: 'text'
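For context, the failure looks like a key mismatch in the Pinecone metadata payload. A minimal sketch of the mechanism (the _node_content key is an assumption about the newer storage format, not confirmed here):

Plain Text
# The old reader pulls the chunk text out of the match metadata:
#     text = match.metadata[self._text_key]   # self._text_key == "text"
# Vectors written by the newer version store the serialized node under a
# different key (assumed "_node_content"), so the "text" lookup fails:
match_metadata = {"_node_content": "{ ...serialized node json... }"}
text = match_metadata["text"]  # raises KeyError: 'text'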
I could have sworn I specifically tested old -> new with pinecone. I can check again in the morning.
@Siddhant Saurabh the metadata issue should hopefully be fixed now; it was a small bug with legacy vector store data: https://github.com/jerryjliu/llama_index/pull/6867