I wanted to initialize an empty

At a glance

The community member is trying to initialize an empty VectorStoreIndex by providing just a storage_context, but is getting an error that one of nodes, objects, or index_struct must be provided. The community member is considering inserting the nodes later on and wants to know the correct way to initialize and reload the indices.

The comments suggest that the community member should use load_index_from_storage(storage_context) to load the indexes already present in the Redis storage context. However, the community member's code seems to be adding a new index every time it is run, and the indices are not working as expected after the service is started.

The comments suggest running self.base_index = VectorStoreIndex(...) only once and, on later runs, reloading the index with load_index_from_storage(). They also suggest explicitly setting the index ID using index.set_index_id("my_index").

The community members also note that the community member is not saving the vector store anywhere, so on reload, there are no vectors to retrieve. They suggest that the

bbeaverTango

I wanted to initialize an empty VectorStoreIndex by providing just a storage_context.

Plain Text

self.base_index = VectorStoreIndex(
    nodes=None, storage_context=self.storage_context
)

But I'm getting an error:

Plain Text

An error occurred: One of nodes, objects, or index_struct must be provided.

Since I will be inserting the nodes later on, what is the correct way to initialize and reload the indices?

20 comments

bbeaverTango

My docstore and the storage context look something like:

Plain Text

self.docstore = RedisDocumentStore.from_redis_client(
    redis_client=self.redis_client,
    namespace=self.namespace
)
self.storage_context = StorageContext.from_defaults(
    docstore=self.docstore,
    index_store=RedisIndexStore.from_redis_client(
        redis_client=self.redis_client,
        namespace=self.namespace
    ),
)

Logan M

Plain Text

self.base_index = VectorStoreIndex(
    nodes=[], storage_context=self.storage_context
)

Logan M

I think that will work?

bbeaverTango

@Logan M thank you, I've been struggling with this for some time. This works!! 🙌

Logan M

Nice! 👍

bbeaverTango

This doesn't seem to load the indexes already present in the Redis storage context.

bbeaverTango

is there a way around this?

Logan M

If you have stuff in redis, you should use load_index_from_storage(storage_context)

bbeaverTango

My code seems to work fine the first time but fails on every subsequent run:

Plain Text

self.redis_client = redis.Redis(
    host=self.config.get("UPSTASH_REDIS_HOST"),
    port=self.config.get("UPSTASH_REDIS_PORT"),
    password=self.config.get("UPSTASH_REDIS_PASSWORD"),
    ssl=True,
)
self.docstore = RedisDocumentStore.from_redis_client(
    redis_client=self.redis_client,
    namespace=self.namespace
)
self.storage_context = StorageContext.from_defaults(
    docstore=self.docstore,
    index_store=RedisIndexStore.from_redis_client(
        redis_client=self.redis_client,
        namespace=self.namespace
    ),
)
self.base_index = VectorStoreIndex(
    nodes=[],
    storage_context=self.storage_context,
)
self.base_retriever = self.base_index.as_retriever(
    similarity_top_k=self.similarity_top_k
)

try:
    # Load all indices
    indices = load_indices_from_storage(self.storage_context)

    # Print out the index_ids of all loaded indices
    for index in indices:
        print(index.index_id)
    self.base_index = load_index_from_storage(self.storage_context)
    print("[INFO] Index found at storage")
except ValueError as e:
    print("[INFO] No index found at storage")

bbeaverTango

It seems to add a new index every time I run it.

Logan M

every time you run this sequence of code, it will indeed create another index.

You need to run self.base_index = VectorStoreIndex(...) only once. After it's created, use load_index_from_storage() to reload it.

Logan M

You can also explicitly set the index id

index.set_index_id("my_index")
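
A minimal sketch of that load-once pattern, assuming the loader raises ValueError when nothing is stored yet (which is what the try/except in the code above already expects). The helper name load_or_create_index and its two callables are hypothetical and exist only for illustration. The commented wiring reuses load_index_from_storage, VectorStoreIndex, and set_index_id from the thread; passing index_id to load_index_from_storage is an assumption to check against the installed version.

```python
from typing import Any, Callable


def load_or_create_index(
    load_index: Callable[[], Any],
    create_index: Callable[[], Any],
    index_id: str = "my_index",
) -> Any:
    """Reload a stored index, creating it only when nothing is stored yet.

    `load_index` is expected to raise ValueError when no matching index exists.
    """
    try:
        # Normal path on every run after the first: reuse what is stored.
        return load_index()
    except ValueError:
        # First run only: build the empty index and pin a stable ID so that
        # later runs can ask for exactly this index.
        index = create_index()
        index.set_index_id(index_id)
        return index


# Hypothetical wiring with the objects from the thread (not executed here):
#
#   self.base_index = load_or_create_index(
#       load_index=lambda: load_index_from_storage(
#           self.storage_context, index_id="my_index"
#       ),
#       create_index=lambda: VectorStoreIndex(
#           nodes=[], storage_context=self.storage_context
#       ),
#   )
```

With this shape, VectorStoreIndex(...) is only reached when loading fails, so restarting the service should not keep adding new index entries to the index store.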

bbeaverTango

I've also noticed something else: my indices seem to be loaded, but for some reason, as soon as I start my service, it does not seem to work again:

Plain Text

def process_fetch_query_results(
        self, query="", similarity_top_k_reranker=3
    ):
        try:
            print(self.base_index)
            self.base_retriever = self.base_index.as_retriever(
                similarity_top_k=self.similarity_top_k
            )
            self.retriever = AutoMergingRetriever(
                self.base_retriever, self.storage_context, verbose=True
            )
            self.postprocessor = SentenceTransformerRerank(
                model="cross-encoder/ms-marco-MiniLM-L-2-v2",
                top_n=similarity_top_k_reranker,
            )
            query_bundle = QueryBundle(query_str=query)
            print("******Query***********",query)
            retrived_nodes = self.retriever.retrieve(query_bundle)
            print("******base Retriever***********",self.base_retriever.retrieve(query_bundle))
            print("******Retrieved Nodes*******", retrived_nodes)
            rerank_nodes = self.postprocessor.postprocess_nodes(
                nodes=retrived_nodes, query_bundle=query_bundle
            )
            return rerank_nodes
        except Exception as e:
            raise Exception(f"An error occurred retrieving: {e}")

bbeaverTango

the output is something like:

Plain Text

[INFO] Loading LLamaIndex pre-reqs..
37e25b6d-ce32-4c94-b8ff-106e05a31128
[INFO] Index found at storage

and the function output is: 

******Query*********** Rahul
******base Retriever*********** []
******Retrieved Nodes******* []

Logan M

you did not save the vector store anywhere 👀

Logan M

so on reload, there are no vectors to retrieve
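
One possible way to address this, sketched under the assumption that the storage context falls back to LlamaIndex's default in-memory SimpleVectorStore because no vector_store is passed to StorageContext.from_defaults. The helper names save_vectors and restore_vectors and the file path are hypothetical. The commented wiring assumes SimpleVectorStore.from_persist_path and the vector store's persist(persist_path=...) method behave as documented for the installed version.

```python
import os
from typing import Any, Callable, Optional

VECTOR_STORE_PATH = "./storage/vector_store.json"  # hypothetical location


def save_vectors(storage_context: Any, path: str = VECTOR_STORE_PATH) -> None:
    """Write the in-memory vector store to disk, e.g. right after inserting nodes."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    storage_context.vector_store.persist(persist_path=path)


def restore_vectors(
    load_from_path: Callable[[str], Any], path: str = VECTOR_STORE_PATH
) -> Optional[Any]:
    """Return the previously saved vector store, or None if nothing was saved yet."""
    if not os.path.exists(path):
        return None
    return load_from_path(path)


# Hypothetical wiring with the objects from the thread (not executed here):
#
#   vector_store = restore_vectors(SimpleVectorStore.from_persist_path)
#   self.storage_context = StorageContext.from_defaults(
#       docstore=self.docstore,
#       index_store=RedisIndexStore.from_redis_client(
#           redis_client=self.redis_client, namespace=self.namespace
#       ),
#       vector_store=vector_store,  # None means a fresh in-memory store
#   )
#   ...
#   self.base_index.insert_nodes(leaf_nodes)
#   save_vectors(self.storage_context)
```

An alternative with the same effect would be to pass a hosted vector store to StorageContext.from_defaults, so that embeddings are written remotely at insert time instead of being saved to a local file.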

bbeaverTango

Plain Text

        try:
            returned_filename, detected_text = self.return_pdf_text(
                file=uploaded_file,
                use_unstructured=True,
                from_path=False,
                filename=filename,
                strategy=strategy,
            )
            documents = self.create_document(text=detected_text, filename=filename)
            nodes = self.pipeline.run(
                documents=documents, show_progress=True
            )
            self.add_nodes_to_doc_store(all_nodes=nodes)
            leaf_nodes = get_leaf_nodes(nodes)
            self.base_index.insert_nodes(leaf_nodes)

        except Exception as e:
            raise Exception(f"An error occurred when running ingestion pipeline: {e}")

this was the ingestion pipeline

bbeaverTango

does this not push the index to vector store?

bbeaverTango

and the storage context was:

Plain Text

self.storage_context = StorageContext.from_defaults(
    docstore=self.docstore,
    index_store=RedisIndexStore.from_redis_client(
        redis_client=self.redis_client,
        namespace=self.namespace
    ),
)

bbeaverTango

Plain Text

self.base_index.insert_nodes(leaf_nodes)

This should insert the nodes, and then I should be able to reload the index, correct?
