I specify a unique `index_name` whenever I am working with a different index. This is required to prevent Weaviate from storing all Nodes under the same index name, which would cause each index to use the documents assigned to every other index, since they would all be stored under the same `index_name`. This works as expected with Weaviate, with little to no issue besides a bit of quirky code.

I took the same `index_name` approach with MongoDB Atlas. However, neither in the debug logs nor in the MongoDB Atlas collection viewer online can I see any trace of the unique `index_name` that I assigned to this vector store. Instead, it simply inserts a JSON representation of the Node, with seemingly no reference to the specified `index_name`. At query time, however, I do see a reference to my specified `index_name` in the debug logs, where it is apparently used to build a query pipeline:

```
DEBUG:llama_index.vector_stores.mongodb:Running query pipeline: [{'$search': {'index': 'QApp_2820b774_5218_4e20_b389_0ebdb2fc4765', 'knnBeta': {'vector': [<vector>], 'path': 'embedding', 'k': 2}}}, {'$project': {'score': {'$meta': 'searchScore'}, 'embedding': 0}}]
DEBUG:llama_index.vector_stores.mongodb:Inserting data into MongoDB: [{'id': '8e7c7e88-25d5-4f2e-ba01-de373c0c0516', 'embedding': [<vector>], 'text': <document text>, 'metadata': {<metadata>} etc. etc.
```
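For what it's worth, the pipeline in that log line can be reproduced as a plain aggregation-pipeline structure, which makes the behaviour easier to see: the `index_name` appears only in the `$search` stage at query time, never as a field of the inserted documents. This is a sketch of my understanding, not the library internals:

```python
def build_knn_pipeline(index_name, query_vector, k=2):
    """Rebuild the aggregation pipeline seen in the debug log.

    index_name refers to an Atlas Search *index definition*, which lives
    outside the collection's documents -- which would explain why the
    inserted Nodes carry no trace of it.
    """
    return [
        {
            "$search": {
                "index": index_name,
                "knnBeta": {"vector": query_vector, "path": "embedding", "k": k},
            }
        },
        # Exclude the raw embedding from results, surface the search score.
        {"$project": {"score": {"$meta": "searchScore"}, "embedding": 0}},
    ]

pipeline = build_knn_pipeline("QApp_2820b774_5218_4e20_b389_0ebdb2fc4765", [0.1, 0.2])
```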
I am using the `SummaryExtractor`. I want to create `"prev"` and `"self"` summaries for each node, to make sure that the local context of the `Document` is provided to the `Node`. However, I do not want the `"prev"` summary to be generated at the beginning of a new `Document` (that is, for the first `Node` generated from a new `Document`), because that summary would refer to the last node of the previous `Document` (if I understand the functionality correctly), providing irrelevant context. I tried using `include_prev_next_rel`, but that does not seem to resolve my issue. Should I write a custom metadata extractor for this functionality?

I am using `get_nodes_from_documents` from `SimpleNodeParser`. How can I check which nodes come from which document? In the source code it looks like all nodes generated from the documents are extended into one list. Is there any way to check which nodes came from which `Document` originally?

```python
new_pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=25, chunk_overlap=0),
        TitleExtractor(),
    ],
    cache=new_cache,
)

# will run instantly due to the cache
nodes = pipeline.run(documents=[Document.example()])
```
Shouldn't the last line be `nodes = new_pipeline.run(...` instead of `pipeline.run(...`?
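For intuition, the caching behaviour in question can be illustrated with a toy cache that keys each transformation's output on a hash of its input text plus the transformation's name, so a second run with the same cache skips recomputation. This is only a sketch of the idea, not the actual `IngestionPipeline` internals:

```python
import hashlib

class ToyIngestionCache:
    """Maps (transformation, input) hashes to previously computed outputs."""
    def __init__(self):
        self._store = {}

    def key(self, text, transform_name):
        return hashlib.sha256(f"{transform_name}:{text}".encode()).hexdigest()

    def get(self, key):
        return self._store.get(key)

    def put(self, key, value):
        self._store[key] = value

def run_pipeline(texts, transforms, cache):
    """Apply each (name, fn) transform, consulting the cache first."""
    calls = 0
    for name, fn in transforms:
        out = []
        for text in texts:
            k = cache.key(text, name)
            cached = cache.get(k)
            if cached is None:
                calls += 1          # cache miss: actually run the transform
                cached = fn(text)
                cache.put(k, cached)
            out.append(cached)
        texts = out
    return texts, calls

cache = ToyIngestionCache()
transforms = [("upper", str.upper)]
_, first_calls = run_pipeline(["hello"], transforms, cache)   # cold: runs transform
_, second_calls = run_pipeline(["hello"], transforms, cache)  # warm: pure cache hits
```

Under this model, a pipeline only "runs instantly" if it is constructed with (and run against) the cache that was populated earlier, which is why the `pipeline` vs `new_pipeline` distinction in the snippet matters.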
When I call `index.delete_ref_doc(document_id, delete_from_docstore=True)`, it does not fully remove said document from the docstore. It seems like the `docstore/metadata` collection still contains an arbitrary (?) `_id`, as well as a `doc_hash` property. I checked the `mongo_docstore`, `mongodb_kvstore` and `keyval_docstore` files but cannot find out why this behaviour occurs. Any advice?

I am also running into the following error:

```
pymongo.errors.OperationFailure: Error connecting to localhost:28000 (127.0.0.1:28000) :: caused by :: Connection refused, full error: {'ok': 0.0, 'errmsg': 'Error connecting to localhost:28000 (127.0.0.1:28000) :: caused by :: Connection refused', 'code': 6, 'codeName': 'HostUnreachable', <timestamps and metadata>}
```

I am using the `default_collection`, which I think is the right collection to index (as this one contains the properties: ID, embedding and text), but then I get:

```
pymongo.errors.OperationFailure: embedding is not indexed as kNN, full error: {'ok': 0.0, 'errmsg': 'embedding is not indexed as kNN', 'code': 8, 'codeName': 'UnknownError' <timestamps and metadata>}
```
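Regarding the `embedding is not indexed as kNN` error: my understanding is that the collection needs an Atlas Search index whose definition explicitly maps the `embedding` field as `knnVector`; inserting documents alone does not create one. A definition along these lines should match the `knnBeta` query in the logs (the `dimensions` value of 1536 is an assumption for OpenAI's `text-embedding-ada-002` and must match your embedding model, and the index must be created under the same name you pass as `index_name`):

```json
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "embedding": {
        "type": "knnVector",
        "dimensions": 1536,
        "similarity": "cosine"
      }
    }
  }
}
```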
I used a `NodeParser` to call `get_nodes_from_documents`. Afterwards I used this code to check what my LLM is seeing:

```python
from llama_index.schema import MetadataMode

document = tax_nodes[12]  # random sample from the node parser
print("The LLM sees this: \n", document.get_content(metadata_mode=MetadataMode.LLM))
```

This prints:

```
The LLM sees this:
[Excerpt from document]
Chapter: chapter II.
Article: Article 12
Paragraph: Paragraph 1
document_title: <lorem ipsum>
prev_section_summary: <lorem ipsum>
Excerpt:
----
Metadata:
----
Content: <content>
----
```

The `[Excerpt from document]` section clearly shows my metadata, but the actual heading `Metadata:` remains empty, while `Content` does contain all the text as expected.

I created a `storage_context`
with all 'simple' stores: `SimpleDocumentStore`, `SimpleVectorStore`, `SimpleIndexStore` and `SimpleGraphStore`. I then created a `VectorStoreIndex.from_documents()` with some sample documents from my `SimpleDirectoryReader` and assigned the `storage_context`. I was then able to query it as expected and retrieved normal answers. However, I then created another `VectorStoreIndex`, this time not providing any documents, just an empty array `[]` and a reference to the `StorageContext` (the same one used for the first vector store). When I query the second `VectorStoreIndex`, instead of getting `None` as a response, I get a `KeyError` on one of the doc IDs of my original `VectorStoreIndex`. In another instance of playing around with it, I created an empty `VectorStoreIndex`, queried it, and all of a sudden I was actually getting results from the documents assigned to the other `VectorStoreIndex`.
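One way to frame the symptom: both indices share one `StorageContext`, so they read from the same underlying stores, and an "empty" index can still see everything the first index inserted. A toy sketch of that failure mode in plain Python (illustrative names only, not the llama_index API):

```python
class SharedVectorStore:
    """Toy stand-in for a vector store held inside a shared storage context."""
    def __init__(self):
        self.embeddings = {}  # doc_id -> vector

class ToyIndex:
    """Toy index that inserts into, and queries over, the shared store."""
    def __init__(self, storage, documents):
        self.storage = storage
        for doc_id, vector in documents:
            storage.embeddings[doc_id] = vector

    def query(self):
        # Queries run over *everything* in the shared store, not just the
        # documents this particular index inserted -- mirroring the leak
        # observed between the two VectorStoreIndex instances.
        return sorted(self.storage.embeddings)

shared = SharedVectorStore()
first = ToyIndex(shared, [("doc-1", [0.1]), ("doc-2", [0.2])])
second = ToyIndex(shared, [])   # "empty" index, same storage context
print(second.query())           # -> ['doc-1', 'doc-2']
```

Under this framing, the `KeyError` on an original doc ID would occur when the second index finds the shared vectors but lacks its own bookkeeping for them.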