
Updated 3 months ago

Doc id

Has there been a change in the way metadata can be assigned in the latest versions? The code below was working before with my doc_id, but now the doc_id in the embeddings metadata is random.

documents = SimpleDirectoryReader(dir).load_data()

for document in documents:
    document.metadata = {"user_id": user_id, "bot_id": bot_id, "filename": filename, "doc_id": doc_id}
    print(f"metadata = {document.metadata}")

index = VectorStoreIndex.from_documents(documents, storage_context=vector_storage_context)

index.storage_context.persist()
7 comments
Nothing has changed here.

That looks correct to me.

Each document will be broken down into nodes, but each node should inherit the metadata

A quick way to check

Plain Text
nodes = index.as_retriever().retrieve("test")
print(nodes[0].metadata)
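
To illustrate what "each node should inherit the metadata" means, here is a toy model in plain Python. It is only a sketch: `split_into_nodes` and the sample values are made up for illustration and are not LlamaIndex APIs, and the real node parser splits text very differently.

```python
# Toy model only: split_into_nodes is a made-up helper, not a LlamaIndex API.
def split_into_nodes(text, metadata, chunk_size=20):
    """Cut text into fixed-size chunks, giving every chunk a copy of the metadata."""
    return [
        {"text": text[i:i + chunk_size], "metadata": dict(metadata)}
        for i in range(0, len(text), chunk_size)
    ]

# Sample values invented for this sketch.
doc_metadata = {"user_id": "u-1", "bot_id": 44, "doc_id": "my-doc"}
nodes = split_into_nodes("x" * 50, doc_metadata)

print(len(nodes))
print(all(n["metadata"] == doc_metadata for n in nodes))
```

Every chunk carries its own copy of the document's metadata, so whichever node comes back from a retrieval should show the values set on its parent document.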
OK thanks, I will try debugging again...
I get a strange result: bot_id and doc_id are changed even though there is no logic to do so:

2024-02-25 20:01:28 metadata = {'user_id': '461ec0a3-833b-43ee-84a9-c1850d434e9d', 'bot_id': 44, 'filename': 'amara mou', 'doc_id': '42b82031-d1f9-42dd-9cb9-8c1f82e2c124'}
2024-02-25 20:01:28 2024-02-25 20:01:28,179:INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-02-25 20:01:28 2024-02-25 20:01:28,455:INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
2024-02-25 20:01:28 2024-02-25 20:01:28,513:ERROR - /home/helloservicedev/.virtualenvs/helloenv/lib/python3.10/site-packages/vecs/collection.py:502: UserWarning: Query does not have a covering index for cosine_distance. See Collection.create_index
2024-02-25 20:01:28 2024-02-25 20:01:28,513:ERROR - warnings.warn(
2024-02-25 20:01:28 {'user_id': '461ec0a3-833b-43ee-84a9-c1850d434e9d', 'bot_id': '45', 'filename': '1708297428003000.docx', 'doc_id': '4e17d4f1-c69b-4c72-b77a-9123e594d25c'}
The filename change can be ignored, as it's expected.
If you are ingesting more than one document, the retriever could return a node from another document.
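
A minimal sketch of why that can happen, using a toy in-memory store, not the real vector store. The vectors, `STORE`, `retrieve`, and the `doc-a`/`doc-b` ids are all invented for illustration. The point is only that a top-1 similarity lookup returns whichever record is closest to the query, so its metadata may belong to a different document unless the search is restricted first.

```python
import math

# Toy stand-in for a vector store: each record is (embedding, metadata).
# The vectors and metadata values are invented for this sketch.
STORE = [
    ([1.0, 0.0], {"bot_id": 44, "doc_id": "doc-a"}),
    ([0.6, 0.8], {"bot_id": 45, "doc_id": "doc-b"}),
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def retrieve(query_vec, doc_id=None):
    """Return metadata of the closest record, optionally restricted to one doc_id."""
    candidates = [r for r in STORE if doc_id is None or r[1]["doc_id"] == doc_id]
    best = max(candidates, key=lambda r: cosine(query_vec, r[0]))
    return best[1]

query = [0.5, 0.9]  # a query that happens to sit closer to doc-b

unfiltered = retrieve(query)
filtered = retrieve(query, doc_id="doc-a")

print(unfiltered)  # metadata of doc-b, even though doc-a was just ingested
print(filtered)    # metadata of doc-a once the search is restricted
```

In a check like the one above, comparing the returned doc_id and filename against the values you just wrote is a quick way to tell whether the node belongs to the document you ingested.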
Changing the metadata key to "did": doc_id worked... not sure what's going on with doc_id, but it's giving me very strange results in the Python output, even though the data in the vecs.embeddings table is actually correct.
Also can't explain why bot_id is incremented by 1... adding metadata bot_id: 44 returns bot_id: 45 on a query with metadata... it's possible a query is not the right way to test it.
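
One possible explanation for the rename helping, which is an assumption and not confirmed anywhere in this thread, is that the framework writes its own bookkeeping value under a key named doc_id when it converts nodes for the vector store. The sketch below models that idea in plain Python. `RESERVED_KEYS` and `to_store_metadata` are hypothetical names, not LlamaIndex APIs.

```python
# Hypothetical sketch: RESERVED_KEYS and to_store_metadata are made-up names,
# not LlamaIndex APIs. They model a framework that writes its own bookkeeping
# keys into the same dict as the user's metadata.
RESERVED_KEYS = ("doc_id",)

def to_store_metadata(user_metadata, internal_doc_id):
    """Merge user metadata with framework bookkeeping; reserved keys win."""
    merged = dict(user_metadata)
    for key in RESERVED_KEYS:
        merged[key] = internal_doc_id
    return merged

colliding = to_store_metadata({"bot_id": 44, "doc_id": "my-doc"}, "auto-generated-id")
renamed = to_store_metadata({"bot_id": 44, "did": "my-doc"}, "auto-generated-id")

print(colliding["doc_id"])  # the user's value has been replaced
print(renamed["did"])       # the user's value survives under its own key
```

If something like this is going on, using a key that cannot clash with the framework's own names (such as "did" above) keeps the custom id intact.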