OverclockedClock
Offline, last seen 2 months ago
Joined September 25, 2024
Hey all, we're using llama-index version 0.9.5 for an older project, and we were wondering whether this version has functionality to query two different VectorStoreIndex instances and ensure that nodes from both are used to synthesize an answer. I can't find the QueryFusionRetriever in this version of llama-index, and the closest thing I can find is building a custom retriever. Any ideas / approaches to tackle this issue?
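In case it helps, this is roughly the custom-retriever approach I'm experimenting with (a minimal sketch written against 0.9.x as I understand it; index_a and index_b stand for the two VectorStoreIndex objects, and module paths may differ between versions):
Plain Text
from typing import List

from llama_index import QueryBundle
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.retrievers import BaseRetriever
from llama_index.schema import NodeWithScore


class TwoIndexRetriever(BaseRetriever):
    """Retrieve from two indices and merge the results."""

    def __init__(self, retriever_a, retriever_b):
        self._retriever_a = retriever_a
        self._retriever_b = retriever_b
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        nodes_a = self._retriever_a.retrieve(query_bundle)
        nodes_b = self._retriever_b.retrieve(query_bundle)
        # Deduplicate by node id so overlapping hits aren't synthesized twice
        combined = {n.node.node_id: n for n in nodes_a + nodes_b}
        return list(combined.values())


retriever = TwoIndexRetriever(index_a.as_retriever(), index_b.as_retriever())
query_engine = RetrieverQueryEngine.from_args(retriever)
response = query_engine.query("my question")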
5 comments
Hey all, I've been working with the NebulaGraphStore, and for some reason my code seems to append a rogue comma (,) to my edge parameter.

I have a fully default NebulaGraph setup, for which I've followed the LlamaIndex docs. I have the following at the moment
8 comments
Might be a niche question, but maybe someone can give some insights / ideas. We would like ingested documents to be stored in a 'staging' environment, where they are not instantly linked to an index.

Basically, I'd like to persist Document objects before they are added to an index. The use case is that we want to upload a large volume of documents to our application so they are ready for use, but they do not need to be added to an index immediately, as this should happen 'on the fly'. Does anyone know of a way to realize this with LlamaIndex functionality? I haven't been that up to date with the last few months of developments, so it's possible I missed something. Thanks in advance!
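One idea I've been sketching (hedged: it assumes the standalone docstore API works the way I remember, and ./staging_docs is a placeholder path): park the Documents in a docstore first, and only build an index from them later, on demand.
Plain Text
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.storage.docstore import SimpleDocumentStore

# Staging: persist Documents without linking them to any index
docstore = SimpleDocumentStore()
documents = SimpleDirectoryReader("./staging_docs").load_data()
docstore.add_documents(documents)
docstore.persist(persist_path="./staging/docstore.json")

# Later, 'on the fly': reload the staged Documents and index them
docstore = SimpleDocumentStore.from_persist_path("./staging/docstore.json")
staged_docs = list(docstore.docs.values())
index = VectorStoreIndex.from_documents(staged_docs)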
18 comments
Heyhey, another MongoDBAtlasVectorSearch question. I am currently using a WeaviateVectorStore for local development, and we want to use MongoDB for production.

In order to work with multiple indices in Weaviate, I need to create a new WeaviateVectorStore and redefine the index_name whenever I am working with a different index. This is required to prevent Weaviate from storing all Nodes under the same index name, which would cause each index to use the documents assigned to all other indices, since they would all be stored under the same index_name. This works as expected with Weaviate, with little to no issue beyond a bit of quirky code.

When working with MongoDB Atlas I wanted to do the same, to prevent my indices from using documents assigned to other indices. So I am once again creating a MongoDBAtlasVectorSearch object with a unique index_name. However, neither in the debug logs nor in the MongoDB Atlas collection viewer online can I see any trace of the unique index_name I assigned to this vector store. Instead, it essentially inserts a JSON representation of the Node, with seemingly no reference to the specified index_name. During query time, however, I do see a reference to my specified index_name in the debug logs, where it is apparently used to create a query pipeline.
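For reference, this is roughly how I construct the store (simplified; the connection string is omitted and the db/collection names are from my setup):
Plain Text
import pymongo
from llama_index.vector_stores import MongoDBAtlasVectorSearch

client = pymongo.MongoClient("<atlas connection string>")
vector_store = MongoDBAtlasVectorSearch(
    mongodb_client=client,
    db_name="default_db",
    collection_name="default_collection",
    index_name="QApp_2820b774_5218_4e20_b389_0ebdb2fc4765",  # unique per logical index
)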
Query debug:
Plain Text
DEBUG:llama_index.vector_stores.mongodb:Running query pipeline: [{'$search': {'index': 'QApp_2820b774_5218_4e20_b389_0ebdb2fc4765', 'knnBeta': {'vector': [<vector>], 'path': 'embedding', 'k': 2}}}, {'$project': {'score': {'$meta': 'searchScore'}, 'embedding': 0}}]

Document insert debug:
Plain Text
DEBUG:llama_index.vector_stores.mongodb:Inserting data into MongoDB: [{'id': '8e7c7e88-25d5-4f2e-ba01-de373c0c0516', 'embedding': [<vector>], 'text': <document text>, 'metadata': {<metadata>} etc. etc. 
14 comments
Hi all, I have a list of Documents that I want to parse into nodes, generating metadata for each node. Right now I am using the SimpleNodeParser, paired with some pre-built metadata extractors.

The question I have is regarding the SummaryExtractor. I want to create "prev" and "self" summaries for each node, to make sure that the local context of the Document is provided to the Node. However, I do not want a "prev" summary to be generated for the first Node of a new Document, as this summary would refer to the last node of the previous Document (if I understand the functionality correctly), providing irrelevant context. I tried using include_prev_next_rel, but that does not seem to resolve my issue. Should I write a custom metadata extractor for this, or post-process the nodes as sketched below?
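The post-processing workaround I'm currently considering (a rough sketch; it assumes the extractor writes a prev_section_summary key into node.metadata, which matches what I see in my output, and that the parser returns nodes in document order):
Plain Text
from collections import defaultdict

# Group parsed nodes by the Document they came from
nodes_by_doc = defaultdict(list)
for node in nodes:
    nodes_by_doc[node.ref_doc_id].append(node)

# Drop the 'prev' summary on the first node of each Document, since it
# would describe the tail of an unrelated previous Document
for doc_nodes in nodes_by_doc.values():
    doc_nodes[0].metadata.pop("prev_section_summary", None)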
3 comments
Heyhey, I'm creating some custom documents right now, and I'm parsing them into nodes using get_nodes_from_documents on SimpleNodeParser. How can I check which nodes come from which document? In the source code it looks like the nodes generated from all documents are extended into one list. Is there any way to check which nodes came from which Document originally?
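For what it's worth, this is what I'd expect to work (a small sketch; I believe each node keeps a reference back to its source Document via ref_doc_id, but I haven't verified this on my version):
Plain Text
from llama_index.node_parser import SimpleNodeParser

parser = SimpleNodeParser.from_defaults()
nodes = parser.get_nodes_from_documents(documents)

# ref_doc_id should be the doc_id of the originating Document
for node in nodes:
    print(node.node_id, "<-", node.ref_doc_id)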
1 comment
Is there a way to remove already-persisted indices through the StorageContext? I can't seem to find anything about it in the documentation.
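The closest lead I've found so far is below (unverified; I pulled the method name from the KVIndexStore source, so treat it as an assumption):
Plain Text
# Remove a persisted index struct from the index store by its index_id
storage_context.index_store.delete_index_struct(key=index.index_id)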
8 comments
Hi all, I have this odd piece of behaviour that just doesn't make sense to me: I get an ImportError when attempting to use a SimpleDirectoryReader.
28 comments
Very minor comment about the LlamaIndex blog post for v0.9. I think there's a small typo in a piece of the sample code for saving and loading ingestion pipelines from a local cache. At the bottom it says:
Plain Text
new_pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=25, chunk_overlap=0),
        TitleExtractor(),
    ],
    cache=new_cache,
)
# will run instantly due to the cache
nodes = pipeline.run(documents=[Document.example()])

While I'm guessing it should be nodes = new_pipeline.run(...) instead of nodes = pipeline.run(...).
3 comments
Hi all, whenever I attempt to remove a document from my index using index.delete_ref_doc(document_id, delete_from_docstore=True), it does not fully remove said document from the docstore. The docstore/metadata collection still seems to contain an arbitrary (?) _id, as well as a doc_hash property. I checked the mongo_docstore, mongodb_kvstore and keyval_docstore files but cannot figure out why this behaviour occurs. Any advice?
For context, I'm using a MongoDB index store / docstore and a Weaviate vector store.

The document is properly deleted from all other places as well
57 comments
Hi all, I'm attempting to use the MongoDBAtlasVectorSearch, but am running into some problems. Whenever I query an index that uses this vector store, I get the following pymongo error:

Plain Text
pymongo.errors.OperationFailure: Error connecting to localhost:28000 (127.0.0.1:28000) :: caused by :: Connection refused, full error: {'ok': 0.0, 'errmsg': 'Error connecting to localhost:28000 (127.0.0.1:28000) :: caused by :: Connection refused', 'code': 6, 'codeName': 'HostUnreachable', <timestamps and metadata>}


Apparently this is because my collection in MongoDB Atlas cloud does not have a search index created. I created a search index on the default_collection, which I think is the right collection to index (as this one contains the properties id, embedding and text).

After creating a basic search index, I now get the following error from pymongo when querying:

Plain Text
pymongo.errors.OperationFailure: embedding is not indexed as kNN, full error: {'ok': 0.0, 'errmsg': 'embedding is not indexed as kNN', 'code': 8, 'codeName': 'UnknownError' <timestamps and metadata>}


Did I miss a setup step or a wiki page regarding the configuration of MongoDB Atlas, or am I supposed to manually create a search index and map it to the appropriate fields myself? Thanks in advance!
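From what I can gather, the Atlas search index needs an explicit knnVector mapping on the embedding field, something like the definition below (my assumption; 1536 dimensions matches OpenAI ada-002 embeddings, so adjust for your model):
Plain Text
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "embedding": {
        "type": "knnVector",
        "dimensions": 1536,
        "similarity": "cosine"
      }
    }
  }
}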
9 comments
Hey, I'm slightly confused about Documents and Nodes, and the way that my LLM 'sees' those nodes. I am creating a custom list of Documents, which I then put into a NodeParser by calling get_nodes_from_documents. Afterwards I used this code to check what my LLM is seeing:

Plain Text
from llama_index.schema import MetadataMode
node = tax_nodes[12]  # random sample from the node parser output
print("The LLM sees this: \n", node.get_content(metadata_mode=MetadataMode.LLM))


The output confuses me at the moment (shortened for convenience):
Plain Text
The LLM sees this: 
[Excerpt from document]
Chapter: chapter II.
Article: Article 12
Paragraph: Paragraph 1
document_title: <lorem ipsum>
prev_section_summary: <lorem ipsum>
Excerpt:
----
Metadata:
----
Content: <content>
----

I don't really understand how to interpret this. The top part of the print ([Excerpt from document]) clearly shows my metadata, but the actual heading labelled Metadata: remains empty. Content does contain all the text, as expected.
59 comments
Hi :) I ran into a weird issue today and I'm not sure how to handle it. I created a storage_context with all 'simple' stores: SimpleDocumentStore, -VectorStore, -IndexStore and -GraphStore. I then created a VectorStoreIndex.from_documents() with some sample documents from my SimpleDirectoryReader and assigned the storage_context. I was then able to query it as expected and retrieved normal answers.

However, I then created another VectorStoreIndex, this time not providing any documents, just an empty array [] and a reference to the same StorageContext used for the first index. When I query the second index, instead of getting None as a response, I get a KeyError on one of the DocIDs of my original index. In another instance of playing around with it, I created an empty VectorStoreIndex, queried it, and all of a sudden I was actually getting results from the documents assigned to the other VectorStoreIndex.
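A rough repro of what I did (simplified; the data path and query string are placeholders):
Plain Text
from llama_index import SimpleDirectoryReader, StorageContext, VectorStoreIndex

storage_context = StorageContext.from_defaults()  # all 'simple' in-memory stores
documents = SimpleDirectoryReader("./data").load_data()

index_a = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
print(index_a.as_query_engine().query("sample question"))  # works as expected

index_b = VectorStoreIndex.from_documents([], storage_context=storage_context)
print(index_b.as_query_engine().query("sample question"))  # KeyError on a doc id from index_a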
56 comments
Maybe I should rephrase, as I'm not even sure if this is a bug right now 😅 I'm using the WeaviateVectorStore, from which I create a StorageContext. Then I create a GPTVectorStoreIndex from some default txt documents and attempt to query it. I assumed that using this vector store would store the created nodes with their embeddings in Weaviate. But when I check my objects in Weaviate, the nodes are stored, yet the vectorWeights in Weaviate remain null. Do these need to be re-embedded every time at query time? And if so, why?
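Roughly what I'm doing (simplified; the Weaviate URL and data path are placeholders):
Plain Text
import weaviate
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores import WeaviateVectorStore

client = weaviate.Client("http://localhost:8080")
vector_store = WeaviateVectorStore(weaviate_client=client)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("./data").load_data()
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)
# Nodes show up in Weaviate afterwards, but vectorWeights stays null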
38 comments
Definitely not an expert, but I have almost identical code right next to me, and it seems to work so far without having to provide an OpenAI API key. I'd say the code looks good. What does the rest of your code look like?
53 comments
I am using the GPTWeaviateIndex combined with a custom LLM which I have defined in my service_context. When I attempt to build the Weaviate index, it still errors on the fact that I have to provide an OpenAI API key. Am I misunderstanding how the Weaviate index works? I assumed that the data having been embedded by Weaviate would be enough for an index to be created, but it turns out OpenAI is still required for something (?)
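For context, this is roughly my setup (from memory, so the exact signatures may be off; my_llm_predictor wraps the custom LLM and documents comes from my loader):
Plain Text
import weaviate
from llama_index import GPTWeaviateIndex, ServiceContext

client = weaviate.Client("http://localhost:8080")
service_context = ServiceContext.from_defaults(llm_predictor=my_llm_predictor)

# This still raises about a missing OpenAI API key
index = GPTWeaviateIndex.from_documents(
    documents, weaviate_client=client, service_context=service_context
)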
3 comments