I specify a unique `index_name` whenever I am working with a different index. This is required to prevent Weaviate from storing all Nodes under the same index name, which would cause each index to use the documents assigned to every other index, since they would all be stored under the same `index_name`. This works as expected with Weaviate, with little to no issue besides a bit of quirky code.

I took the same `index_name` approach with MongoDB Atlas. However, neither in the debug logs nor in the MongoDB Atlas collection viewer online can I see any trace of the unique `index_name` that I assigned to this vector store. Instead, it simply inserts a JSON representation of the Node, with seemingly no reference to the specified `index_name`. At query time, however, I do see a reference to my specified `index_name` in the debug logs, where it is apparently used to build a query pipeline:

```
DEBUG:llama_index.vector_stores.mongodb:Running query pipeline: [{'$search': {'index': 'QApp_2820b774_5218_4e20_b389_0ebdb2fc4765', 'knnBeta': {'vector': [<vector>], 'path': 'embedding', 'k': 2}}}, {'$project': {'score': {'$meta': 'searchScore'}, 'embedding': 0}}]
DEBUG:llama_index.vector_stores.mongodb:Inserting data into MongoDB: [{'id': '8e7c7e88-25d5-4f2e-ba01-de373c0c0516', 'embedding': [<vector>], 'text': <document text>, 'metadata': {<metadata>} etc. etc.
```
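For what it's worth, the pipeline in that log line can be reproduced as a plain aggregation-pipeline structure, which makes the behaviour easier to see: the `index_name` appears only in the `$search` stage at query time, never as a field of the inserted documents. This is a sketch of my understanding, not the library internals:

```python
def build_knn_pipeline(index_name, query_vector, k=2):
    """Rebuild the aggregation pipeline seen in the debug log.

    index_name refers to an Atlas Search *index definition*, which lives
    outside the collection's documents -- which would explain why the
    inserted Nodes carry no trace of it.
    """
    return [
        {
            "$search": {
                "index": index_name,
                "knnBeta": {"vector": query_vector, "path": "embedding", "k": k},
            }
        },
        # Exclude the raw embedding from results, surface the search score.
        {"$project": {"score": {"$meta": "searchScore"}, "embedding": 0}},
    ]

pipeline = build_knn_pipeline("QApp_2820b774_5218_4e20_b389_0ebdb2fc4765", [0.1, 0.2])
```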
I am using the `SummaryExtractor`. I want to create `"prev"` and `"self"` summaries for each node, to make sure that the local context of the `Document` is provided to the `Node`. However, I do not want the `"prev"` summary to be generated at the beginning of a new `Document` (that is, for the first `Node` generated from a new `Document`), because that summary would refer to the last node of the previous `Document` (if I understand the functionality correctly), providing irrelevant context. I tried using `include_prev_next_rel`, but that does not seem to resolve my issue. Should I write a custom metadata extractor for this functionality?

I am using `get_nodes_from_documents` from `SimpleNodeParser`. How can I check which nodes come from which document? In the source code it looks like all nodes generated from the documents are extended into one list. Is there any way to check which nodes came from which `Document` originally?

```python
new_pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=25, chunk_overlap=0),
        TitleExtractor(),
    ],
    cache=new_cache,
)

# will run instantly due to the cache
nodes = pipeline.run(documents=[Document.example()])
```
Shouldn't the last line be `nodes = new_pipeline.run(...` instead of `pipeline.run(...`?
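For intuition, the caching behaviour in question can be illustrated with a toy cache that keys each transformation's output on a hash of its input text plus the transformation's name, so a second run with the same cache skips recomputation. This is only a sketch of the idea, not the actual `IngestionPipeline` internals:

```python
import hashlib

class ToyIngestionCache:
    """Maps (transformation, input) hashes to previously computed outputs."""
    def __init__(self):
        self._store = {}

    def key(self, text, transform_name):
        return hashlib.sha256(f"{transform_name}:{text}".encode()).hexdigest()

    def get(self, key):
        return self._store.get(key)

    def put(self, key, value):
        self._store[key] = value

def run_pipeline(texts, transforms, cache):
    """Apply each (name, fn) transform, consulting the cache first."""
    calls = 0
    for name, fn in transforms:
        out = []
        for text in texts:
            k = cache.key(text, name)
            cached = cache.get(k)
            if cached is None:
                calls += 1          # cache miss: actually run the transform
                cached = fn(text)
                cache.put(k, cached)
            out.append(cached)
        texts = out
    return texts, calls

cache = ToyIngestionCache()
transforms = [("upper", str.upper)]
_, first_calls = run_pipeline(["hello"], transforms, cache)   # cold: runs transform
_, second_calls = run_pipeline(["hello"], transforms, cache)  # warm: pure cache hits
```

Under this model, a pipeline only "runs instantly" if it is constructed with (and run against) the cache that was populated earlier, which is why the `pipeline` vs `new_pipeline` distinction in the snippet matters.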
When I call `index.delete_ref_doc(document_id, delete_from_docstore=True)`, it does not fully remove said document from the docstore. It seems like the `docstore/metadata` collection still contains an arbitrary (?) `_id`, as well as a `doc_hash` property. I checked the `mongo_docstore`, `mongodb_kvstore` and `keyval_docstore` files but cannot find out why this behaviour occurs. Any advice?

I am also running into the following error:

```
pymongo.errors.OperationFailure: Error connecting to localhost:28000 (127.0.0.1:28000) :: caused by :: Connection refused, full error: {'ok': 0.0, 'errmsg': 'Error connecting to localhost:28000 (127.0.0.1:28000) :: caused by :: Connection refused', 'code': 6, 'codeName': 'HostUnreachable', <timestamps and metadata>}
```

I am using the `default_collection`, which I think is the right collection to index (as this one contains the properties: ID, embedding and text), but then I get:

```
pymongo.errors.OperationFailure: embedding is not indexed as kNN, full error: {'ok': 0.0, 'errmsg': 'embedding is not indexed as kNN', 'code': 8, 'codeName': 'UnknownError' <timestamps and metadata>}
```
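Regarding the `embedding is not indexed as kNN` error: my understanding is that the collection needs an Atlas Search index whose definition explicitly maps the `embedding` field as `knnVector`; inserting documents alone does not create one. A definition along these lines should match the `knnBeta` query in the logs (the `dimensions` value of 1536 is an assumption for OpenAI's `text-embedding-ada-002` and must match your embedding model, and the index must be created under the same name you pass as `index_name`):

```json
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "embedding": {
        "type": "knnVector",
        "dimensions": 1536,
        "similarity": "cosine"
      }
    }
  }
}
```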
I used a `NodeParser` to call `get_nodes_from_documents`. Afterwards I used this code to check what my LLM is seeing:

```python
from llama_index.schema import MetadataMode

document = tax_nodes[12]  # random sample from the node parser
print("The LLM sees this: \n", document.get_content(metadata_mode=MetadataMode.LLM))
```

This prints:

```
The LLM sees this:
[Excerpt from document]
Chapter: chapter II.
Article: Article 12
Paragraph: Paragraph 1
document_title: <lorem ipsum>
prev_section_summary: <lorem ipsum>
Excerpt:
----
Metadata:
----
Content: <content>
----
```

The `[Excerpt from document]` section clearly shows my metadata, but the actual heading `Metadata:` remains empty, while `Content` does contain all the text as expected.

I created a `storage_context`
with all 'simple' stores: `SimpleDocumentStore`, `SimpleVectorStore`, `SimpleIndexStore` and `SimpleGraphStore`. I then created a `VectorStoreIndex.from_documents()` with some sample documents from my `SimpleDirectoryReader` and assigned the `storage_context`. I was then able to query it as expected and retrieved normal answers. However, I then created another `VectorStoreIndex`, this time not providing any documents, just an empty array `[]` and a reference to the `StorageContext` (the same one used for the first vector store). When I query the second `VectorStoreIndex`, instead of getting `None` as a response, I get a `KeyError` on one of the doc IDs of my original `VectorStoreIndex`. In another instance of playing around with it, I created an empty `VectorStoreIndex`, queried it, and all of a sudden I was actually getting results from the documents assigned to the other `VectorStoreIndex`.
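One way to frame the symptom: both indices share one `StorageContext`, so they read from the same underlying stores, and an "empty" index can still see everything the first index inserted. A toy sketch of that failure mode in plain Python (illustrative names only, not the llama_index API):

```python
class SharedVectorStore:
    """Toy stand-in for a vector store held inside a shared storage context."""
    def __init__(self):
        self.embeddings = {}  # doc_id -> vector

class ToyIndex:
    """Toy index that inserts into, and queries over, the shared store."""
    def __init__(self, storage, documents):
        self.storage = storage
        for doc_id, vector in documents:
            storage.embeddings[doc_id] = vector

    def query(self):
        # Queries run over *everything* in the shared store, not just the
        # documents this particular index inserted -- mirroring the leak
        # observed between the two VectorStoreIndex instances.
        return sorted(self.storage.embeddings)

shared = SharedVectorStore()
first = ToyIndex(shared, [("doc-1", [0.1]), ("doc-2", [0.2])])
second = ToyIndex(shared, [])   # "empty" index, same storage context
print(second.query())           # -> ['doc-1', 'doc-2']
```

Under this framing, the `KeyError` on an original doc ID would occur when the second index finds the shared vectors but lacks its own bookkeeping for them.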