Updated last year

I wanted to initialize an empty

At a glance

The community member is trying to initialize an empty VectorStoreIndex by providing just a storage_context, but is getting an error that one of nodes, objects, or index_struct must be provided. The community member is considering inserting the nodes later on and wants to know the correct way to initialize and reload the indices.

The comments suggest that the community member should use load_index_from_storage(storage_context) to load the indexes already present in the Redis storage context. However, the community member's code seems to be adding a new index every time it is run, and the indices are not working as expected after the service is started.

The community members suggest that the community member should only run self.base_index = VectorStoreIndex(...) once, and then use load_index_from_storage() to reload the index. They also suggest explicitly setting the index ID using index.set_index_id("my_index").

The community members also note that the community member is not saving the vector store anywhere, so on reload, there are no vectors to retrieve. They suggest that the

I wanted to initialize an empty VectorStoreIndex by providing just a storage_context.

Plain Text
self.base_index = VectorStoreIndex(
    nodes=None, storage_context=self.storage_context
)


But I'm getting an error:

Plain Text
An error occurred: One of nodes, objects, or index_struct must be provided.


Considering I will be inserting the nodes later on, what would be the correct way to initialise and reload the indices?
20 comments
My docstore and the storage context look something like:

Plain Text
self.docstore = RedisDocumentStore.from_redis_client(
    redis_client=self.redis_client,
    namespace=self.namespace
)
self.storage_context = StorageContext.from_defaults(
    docstore=self.docstore,
    index_store=RedisIndexStore.from_redis_client(
        redis_client=self.redis_client,
        namespace=self.namespace
    ),
)
Plain Text
self.base_index = VectorStoreIndex(
    nodes=[], storage_context=self.storage_context
)
I think that will work?
@Logan M thank you, I've been struggling with this for some time. This works!! 🙌
Nice! 👍
This doesn't seem to load the indexes already present in the Redis storage context.
Is there a way around this?
If you have stuff in redis, you should use load_index_from_storage(storage_context)
My code seems to work fine the first time but fails on every subsequent run:

Plain Text
self.redis_client = redis.Redis(
    host=self.config.get("UPSTASH_REDIS_HOST"),
    port=self.config.get("UPSTASH_REDIS_PORT"),
    password=self.config.get("UPSTASH_REDIS_PASSWORD"),
    ssl=True,
)
self.docstore = RedisDocumentStore.from_redis_client(
    redis_client=self.redis_client,
    namespace=self.namespace
)
self.storage_context = StorageContext.from_defaults(
    docstore=self.docstore,
    index_store=RedisIndexStore.from_redis_client(
        redis_client=self.redis_client,
        namespace=self.namespace
    ),
)
self.base_index = VectorStoreIndex(
    nodes=[],
    storage_context=self.storage_context,
)
self.base_retriever = self.base_index.as_retriever(
    similarity_top_k=self.similarity_top_k
)

try:
    # Load all indices
    indices = load_indices_from_storage(self.storage_context)

    # Print out the index_ids of all loaded indices
    for index in indices:
        print(index.index_id)
    self.base_index = load_index_from_storage(self.storage_context)
    print("[INFO] Index found at storage")
except ValueError as e:
    print("[INFO] No index found at storage")
It seems to add a new index every time I run it.
Every time you run this sequence of code, it will indeed create another index.

You need to run self.base_index = VectorStoreIndex(...) only once. After it's created, use load_index_from_storage() to reload it.
You can also explicitly set the index ID:

index.set_index_id("my_index")
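
A minimal sketch of the create-once, load-afterwards pattern described above. The helper name `load_or_create_index` and its parameters are hypothetical (they are not part of LlamaIndex); the actual LlamaIndex calls are passed in as callables so the control flow stays visible. It assumes the load call raises `ValueError` when the index is missing, which is what the `except ValueError` in the snippet earlier in the thread relies on.

```python
from typing import Any, Callable, Tuple


def load_or_create_index(
    load_fn: Callable[[], Any],
    create_fn: Callable[[], Any],
    index_id: str,
) -> Tuple[Any, bool]:
    """Return (index, created). Hypothetical helper, not a LlamaIndex API.

    load_fn   -- loads an existing index; assumed to raise ValueError if none exists
    create_fn -- builds a new, empty index (should only ever run once per store)
    index_id  -- stable ID given to a newly created index so later loads can find it
    """
    try:
        # Normal path on every run after the first: reuse what is in storage.
        return load_fn(), False
    except ValueError:
        # First run only: create the index and pin its ID.
        index = create_fn()
        index.set_index_id(index_id)
        return index, True
```

With LlamaIndex, `load_fn` could be `lambda: load_index_from_storage(storage_context, index_id="my_index")` and `create_fn` could be `lambda: VectorStoreIndex(nodes=[], storage_context=storage_context)`. Passing an explicit `index_id` matters once the index store already holds more than one index, as it does after several runs of the code above.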
I've also noticed something else: my indices seem to be loaded, but for some reason, as soon as I start my service, it does not seem to work anymore:

Plain Text
def process_fetch_query_results(
    self, query="", similarity_top_k_reranker=3
):
    try:
        print(self.base_index)
        self.base_retriever = self.base_index.as_retriever(
            similarity_top_k=self.similarity_top_k
        )
        self.retriever = AutoMergingRetriever(
            self.base_retriever, self.storage_context, verbose=True
        )
        self.postprocessor = SentenceTransformerRerank(
            model="cross-encoder/ms-marco-MiniLM-L-2-v2",
            top_n=similarity_top_k_reranker,
        )
        query_bundle = QueryBundle(query_str=query)
        print("******Query***********", query)
        retrived_nodes = self.retriever.retrieve(query_bundle)
        print("******base Retriever***********", self.base_retriever.retrieve(query_bundle))
        print("******Retrieved Nodes*******", retrived_nodes)
        rerank_nodes = self.postprocessor.postprocess_nodes(
            nodes=retrived_nodes, query_bundle=query_bundle
        )
        return rerank_nodes
    except Exception as e:
        raise Exception(f"An error occurred retrieving: {e}")
the output is something like:

Plain Text
[INFO] Loading LLamaIndex pre-reqs..
37e25b6d-ce32-4c94-b8ff-106e05a31128
[INFO] Index found at storage

and the function output is: 

******Query*********** Rahul
******base Retriever*********** []
******Retrieved Nodes******* []
You did not save the vector store anywhere 👀
So on reload, there are no vectors to retrieve.
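
To illustrate why retrieval comes back empty, here is a toy model, not LlamaIndex code. Plain dicts stand in for Redis and for the vector store, and `ToyService` and its methods are hypothetical names. It assumes what the reply above describes: the docstore and index store are durable, while a vector store that is never configured lives only in process memory.

```python
from typing import Dict, List, Optional


class ToyService:
    """Toy stand-in for the service in this thread (hypothetical, illustration only)."""

    def __init__(self, durable: Dict[str, dict], vector_store: Optional[dict] = None):
        # `durable` plays the role of Redis: it outlives any one ToyService.
        self.durable = durable
        # If no vector store is supplied, vectors live only in this object.
        self.vectors = {} if vector_store is None else vector_store

    def insert(self, node_id: str, text: str, embedding: List[float]) -> None:
        self.durable["docstore"][node_id] = text
        self.durable["index_store"].setdefault("my_index", []).append(node_id)
        self.vectors[node_id] = embedding

    def retrieve(self) -> List[str]:
        # A node can only be returned if its vector is still available.
        known = self.durable["index_store"].get("my_index", [])
        return [node_id for node_id in known if node_id in self.vectors]


def new_durable_store() -> Dict[str, dict]:
    """Create an empty toy 'Redis' with a docstore and an index store."""
    return {"docstore": {}, "index_store": {}}
```

In this model, a second `ToyService` built on the same `durable` dict still sees the index entry and the node text, but `retrieve()` returns nothing unless the same `vector_store` is passed in again. In the real setup, the analogous fix would be to pass a persistent vector store to `StorageContext.from_defaults(...)` via its `vector_store` argument, alongside the existing `docstore` and `index_store`.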
Plain Text
        try:
            returned_filename, detected_text = self.return_pdf_text(
                file=uploaded_file,
                use_unstructured=True,
                from_path=False,
                filename=filename,
                strategy=strategy,
            )
            documents = self.create_document(text=detected_text, filename=filename)
            nodes = self.pipeline.run(
                documents=documents, show_progress=True
            )
            self.add_nodes_to_doc_store(all_nodes=nodes)
            leaf_nodes = get_leaf_nodes(nodes)
            self.base_index.insert_nodes(leaf_nodes)

        except Exception as e:
            raise Exception(f"An error occurred when running ingestion pipeline: {e}")

This was the ingestion pipeline.
Does this not push the index to the vector store?
And the storage context was:

Plain Text
self.storage_context = StorageContext.from_defaults(
    docstore=self.docstore,
    index_store=RedisIndexStore.from_redis_client(
        redis_client=self.redis_client,
        namespace=self.namespace
    ),
)
Plain Text
self.base_index.insert_nodes(leaf_nodes)


This should insert the nodes, and then I should be able to reload the index, correct?