
Updated 9 months ago

I wanted to initialize an empty

I wanted to initialize an empty VectorStoreIndex by providing just a storage_context.

Plain Text
self.base_index = VectorStoreIndex(
    nodes=None, storage_context=self.storage_context
)


But I'm getting an error:

Plain Text
An error occurred: One of nodes, objects, or index_struct must be provided.


Considering that I will be inserting the nodes later on, what is the correct way to initialise and reload the indices?
20 comments
My docstore and the storage context look something like:

Plain Text
self.docstore = RedisDocumentStore.from_redis_client(
    redis_client=self.redis_client,
    namespace=self.namespace
)
self.storage_context = StorageContext.from_defaults(
    docstore=self.docstore,
    index_store=RedisIndexStore.from_redis_client(
        redis_client=self.redis_client,
        namespace=self.namespace
    ),
)
Plain Text
self.base_index = VectorStoreIndex(
    nodes=[], storage_context=self.storage_context
)
I think that will work?
@Logan M thank you, I've been struggling with this for some time. This works!! 🙌
Nice! 👍
This doesn't seem to load the indexes already present in the redis storage context
is there a way around this?
If you have stuff in redis, you should use load_index_from_storage(storage_context)
My code seems to work fine the first time, but fails on every subsequent run:

Plain Text
self.redis_client = redis.Redis(
    host=self.config.get("UPSTASH_REDIS_HOST"),
    port=self.config.get("UPSTASH_REDIS_PORT"),
    password=self.config.get("UPSTASH_REDIS_PASSWORD"),
    ssl=True,
)
self.docstore = RedisDocumentStore.from_redis_client(
    redis_client=self.redis_client,
    namespace=self.namespace
)
self.storage_context = StorageContext.from_defaults(
    docstore=self.docstore,
    index_store=RedisIndexStore.from_redis_client(
        redis_client=self.redis_client,
        namespace=self.namespace
    ),
)
self.base_index = VectorStoreIndex(
    nodes=[],
    storage_context=self.storage_context,
)
self.base_retriever = self.base_index.as_retriever(
    similarity_top_k=self.similarity_top_k
)

try:
    # Load all indices
    indices = load_indices_from_storage(self.storage_context)

    # Print out the index_ids of all loaded indices
    for index in indices:
        print(index.index_id)
    self.base_index = load_index_from_storage(self.storage_context)
    print("[INFO] Index found at storage")
except ValueError as e:
    print("[INFO] No index found at storage")
It seems to add a new index every time I run it.
Every time you run this sequence of code, it will indeed create another index.

You need to run self.base_index = VectorStoreIndex(...) only once. After it's created, use load_index_from_storage() instead.
You can also explicitly set the index id

index.set_index_id("my_index")
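A minimal sketch of that load-or-create flow, assuming llama_index 0.10+ (where these names live in `llama_index.core`). The helper name `load_or_create_index` and the default id are made up for illustration:

```python
def load_or_create_index(storage_context, index_id="my_index"):
    """Return the stored index if it exists, otherwise create it once.

    Hypothetical helper, not part of llama_index.
    """
    # Imported lazily so this module can be imported without llama_index.
    from llama_index.core import VectorStoreIndex, load_index_from_storage

    try:
        # Raises ValueError when no index with this id is in the index store.
        return load_index_from_storage(storage_context, index_id=index_id)
    except ValueError:
        # First run: create an empty index and give it a stable id
        # so later runs can find it again.
        index = VectorStoreIndex(nodes=[], storage_context=storage_context)
        index.set_index_id(index_id)
        return index
```

Calling something like this at startup, instead of constructing VectorStoreIndex unconditionally, should mean the index is created on the first run and reloaded on later ones.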
I've also noticed something else: my indices seem to be loaded, but for some reason, as soon as I start my service, it no longer seems to work:

Plain Text
def process_fetch_query_results(
    self, query="", similarity_top_k_reranker=3
):
    try:
        print(self.base_index)
        self.base_retriever = self.base_index.as_retriever(
            similarity_top_k=self.similarity_top_k
        )
        self.retriever = AutoMergingRetriever(
            self.base_retriever, self.storage_context, verbose=True
        )
        self.postprocessor = SentenceTransformerRerank(
            model="cross-encoder/ms-marco-MiniLM-L-2-v2",
            top_n=similarity_top_k_reranker,
        )
        query_bundle = QueryBundle(query_str=query)
        print("******Query***********", query)
        retrived_nodes = self.retriever.retrieve(query_bundle)
        print("******base Retriever***********", self.base_retriever.retrieve(query_bundle))
        print("******Retrieved Nodes*******", retrived_nodes)
        rerank_nodes = self.postprocessor.postprocess_nodes(
            nodes=retrived_nodes, query_bundle=query_bundle
        )
        return rerank_nodes
    except Exception as e:
        raise Exception(f"An error occurred retrieving: {e}")
the output is something like:

Plain Text
[INFO] Loading LLamaIndex pre-reqs..
37e25b6d-ce32-4c94-b8ff-106e05a31128
[INFO] Index found at storage

and the function output is: 

******Query*********** Rahul
******base Retriever*********** []
******Retrieved Nodes******* []
you did not save the vector store anywhere 👀
so on reload, there are no vectors to retrieve
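One way to keep the vectors around, sketched under the assumption that the default in-memory SimpleVectorStore is in use and that writing it to a local JSON file is acceptable. The helper names and `persist_path` are hypothetical; a hosted vector store integration passed as `vector_store=` would avoid the file step:

```python
import os


def open_vector_store(persist_path):
    """Reload a previously saved SimpleVectorStore, or start an empty one."""
    # Imported lazily so this module can be imported without llama_index.
    from llama_index.core.vector_stores import SimpleVectorStore

    if os.path.exists(persist_path):
        return SimpleVectorStore.from_persist_path(persist_path)
    return SimpleVectorStore()


def save_vector_store(storage_context, persist_path):
    """Write the storage context's default vector store to disk."""
    storage_context.vector_store.persist(persist_path=persist_path)
```

The idea would be to pass `vector_store=open_vector_store(path)` to `StorageContext.from_defaults(...)` alongside the docstore and index_store, and to call `save_vector_store(...)` after each `insert_nodes(...)`.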
Plain Text
try:
    returned_filename, detected_text = self.return_pdf_text(
        file=uploaded_file,
        use_unstructured=True,
        from_path=False,
        filename=filename,
        strategy=strategy,
    )
    documents = self.create_document(text=detected_text, filename=filename)
    nodes = self.pipeline.run(
        documents=documents, show_progress=True
    )
    self.add_nodes_to_doc_store(all_nodes=nodes)
    leaf_nodes = get_leaf_nodes(nodes)
    self.base_index.insert_nodes(leaf_nodes)

except Exception as e:
    raise Exception(f"An error occurred when running ingestion pipeline: {e}")

this was the ingestion pipeline
does this not push the index to the vector store?
and the storage context was:

Plain Text
self.storage_context = StorageContext.from_defaults(
    docstore=self.docstore,
    index_store=RedisIndexStore.from_redis_client(
        redis_client=self.redis_client,
        namespace=self.namespace
    ),
)
Plain Text
self.base_index.insert_nodes(leaf_nodes)


That should insert the nodes, and then I should be able to reload the index, correct?