I have my prototype set up for a product

I have my prototype set up for a product, and I want to start scaling it up in Google Cloud. Are there any tutorials/notebooks/anything to help me? My current setup is simple:

loader = SimpleDirectoryReader(self.LOCAL_TEMP_DIR, recursive=True, exclude_hidden=True)
documents = loader.load_data()
vector_index = VectorStoreIndex.from_documents(documents)

I'd like to make everything here parallel by using Google Cloud Functions or some alternative. I'd also like to start using a vector database (any recommendations in GCP?).
Building the VectorStoreIndex sometimes takes hours and gets timed out by Cloud Functions (since I process all the documents in one thread). That is the main reason I want to parallelize it (currently my prototype stores everything on disk). Also, I have one vector index per company, and I have lots of companies, so the vector store must handle that use case (a query engine must not be able to use another company's data).
@kapa.ai can you help with this?
Let me check the knowledge base for you. One moment! :books:
----------------------
I'm sorry, but your question is quite vague. Could you please provide more details or context? I'm here to assist with information related to LlamaIndex.
----------------------
@kapa.ai Are there tutorials which cover how to use Google Cloud Functions to create a vector index?
Let me check the knowledge base for you. One moment! :books:
----------------------
I'm sorry, but based on the information provided in the extracts, there are no specific tutorials or examples that cover how to use Google Cloud Functions to create a vector index with LlamaIndex. The extracts provide information on various modules, vector stores, and examples of using LlamaIndex with different databases and stores such as Elasticsearch, Tencent Cloud VectorDB, and others, but Google Cloud Functions is not mentioned.

For more detailed information and tutorials, you may want to check the official LlamaIndex documentation.
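That said, the general shape of such a setup is fairly small. Below is a rough, untested sketch (not from the extracts) of an HTTP Cloud Function that builds the index for one company per invocation, using the Python functions-framework and a remote vector store such as Pinecone (which comes up later in this thread). The company_id payload, the /tmp path, and the index name are hypothetical, and the LlamaIndex/Pinecone import paths depend on the installed versions:
Plain Text
   # Hypothetical entry point: one invocation builds the index for one company.
   import os

   import functions_framework
   import pinecone
   from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
   from llama_index.vector_stores.pinecone import PineconeVectorStore  # import path varies by version

   pinecone.init(
       api_key=os.environ["PINECONE_API_KEY"],
       environment=os.environ["PINECONE_ENVIRONMENT"],
   )
   pinecone_index = pinecone.Index("company-docs")  # hypothetical index name

   @functions_framework.http
   def build_company_index(request):
       company_id = request.get_json()["company_id"]  # hypothetical payload
       # Assumes the company's files are already staged under /tmp
       # (e.g. downloaded from a Cloud Storage bucket, omitted here).
       documents = SimpleDirectoryReader(
           f"/tmp/{company_id}", recursive=True, exclude_hidden=True
       ).load_data()
       # One Pinecone namespace per company keeps tenants isolated from each other.
       vector_store = PineconeVectorStore(
           pinecone_index=pinecone_index, namespace=company_id
       )
       storage_context = StorageContext.from_defaults(vector_store=vector_store)
       VectorStoreIndex.from_documents(documents, storage_context=storage_context)
       return {"indexed_documents": len(documents)}

Note that Cloud Functions still enforces a per-invocation timeout (ranging from minutes up to about an hour depending on generation and trigger type), so the usual way to parallelize is to fan out one invocation per company, for example via Pub/Sub or Cloud Tasks, rather than indexing everything in a single call.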
----------------------
@kapa.ai How about tutorials on how to parallelize vector index building and embedding retrieval, and on storing them in Pinecone?
Let me check the knowledge base for you. One moment! :books:
----------------------
Based on the provided extracts, there are several tutorials and examples that guide you through the process of building a vector index, retrieving embeddings, and storing them in Pinecone using LlamaIndex. However, explicit information on parallelizing these processes is not provided in the extracts. Here's a general guide on how to proceed:

  1. Initialize Pinecone and create an index:
    You need to initialize Pinecone with your API key and environment, then create an index with the desired parameters. Note that you should not save your API keys in the code or add pinecone_env to your repo for security reasons. Here's an example of how to do this:
Plain Text
   import pinecone
   import os

   api_key = os.environ["PINECONE_API_KEY"]
   environment = os.environ["PINECONE_ENVIRONMENT"]
   pinecone.init(api_key=api_key, environment=environment)

   index_name = "llamaindex-rag-fs"
   pinecone.create_index(index_name, dimension=1536, metric="euclidean", pod_type="p1")
   pinecone_index = pinecone.Index(index_name)
   


Source: LlamaIndex > En > Latest > Examples > Low_level > OpenAI > Setup

  2. Create a PineconeVectorStore:
    This is a simple wrapper abstraction to use in LlamaIndex. You can create it as follows:
Plain Text
   from llama_index.vector_stores import PineconeVectorStore
   vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
   


Source: LlamaIndex > En > Latest > Examples > Low_level > OpenAI > Setup

  3. Load documents into the PineconeVectorStore:
    You can load documents into the PineconeVectorStore using high-level ingestion abstractions with VectorStoreIndex.from_documents. Here's an example:
Plain Text
   from llama_index.core import VectorStoreIndex
   from llama_index.core.node_parser import SentenceSplitter
   from llama_index.core import StorageContext

   splitter = SentenceSplitter(chunk_size=1024)
   storage_context = StorageContext.from_defaults(vector_store=vector_store)
   index = VectorStoreIndex.from_documents(
       documents, transformations=[splitter], storage_context=storage_context
   )
   


Source: Jerryjliu > Llama_index > Blob > Main > Docs > Examples > Low_level > Building Data Ingestion from Scratch > Setup

  4. Retrieve embeddings:
    You can retrieve embeddings by creating a retriever from the index as follows:
Plain Text
   retriever = index.as_retriever()
   


Source: Jerryjliu > Llama_index > Blob > Main > Docs > Examples > Low_level > Building Response Synthesis from Scratch > Setup > Build Pinecone Index, Get Retriever
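As a small usage example (the query string is only an illustration), the retriever returns the best-matching nodes together with their similarity scores:
Plain Text
   retriever = index.as_retriever(similarity_top_k=3)
   nodes_with_scores = retriever.retrieve("What does the report say about revenue?")
   for n in nodes_with_scores:
       # Each result carries the similarity score and the retrieved chunk of text.
       print(n.score, n.node.get_content()[:100])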

Remember, these are just general steps. For more detailed information and tutorials, you may want to check the official LlamaIndex documentation.
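On the parallelization question specifically (not covered by the extracts above): recent LlamaIndex versions provide an IngestionPipeline whose run() accepts a num_workers argument, which fans the parsing and embedding transformations out across processes and writes the resulting nodes straight into the vector store. A rough sketch, reusing the vector_store and documents from the steps above (the chunk size, worker count, and import paths are assumptions that depend on your installed version):
Plain Text
   from llama_index.core import VectorStoreIndex
   from llama_index.core.ingestion import IngestionPipeline
   from llama_index.core.node_parser import SentenceSplitter
   from llama_index.embeddings.openai import OpenAIEmbedding  # import path varies by version

   pipeline = IngestionPipeline(
       transformations=[
           SentenceSplitter(chunk_size=1024),
           OpenAIEmbedding(),  # embeds each node as part of the pipeline
       ],
       vector_store=vector_store,  # the PineconeVectorStore created in step 2
   )

   # num_workers runs the transformations in parallel worker processes.
   pipeline.run(documents=documents, num_workers=4)

   # Query later by attaching an index to the already-populated vector store.
   index = VectorStoreIndex.from_vector_store(vector_store)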
----------------------
@kapa.ai I load my documents in with code:
loader = SimpleDirectoryReader(inst.LOCAL_TEMP_DIR, recursive=True, exclude_hidden=True)
documents = loader.load_data()

It shows that I have 337 documents.

When I run the index building code now:

vector_index = VectorStoreIndex.from_documents(documents)

Then it only does 5 requests. Shouldn't it make 337 requests instead of 5 requests? What could be wrong?
Let me check the knowledge base for you. One moment! :books:
----------------------
Based on the information provided in the extracts, it's important to understand how VectorStoreIndex.from_documents functions and what might influence the number of requests made during the indexing process. The extracts do not directly address the specific scenario of loading documents and the expected number of requests during the index building process. However, we can infer some possibilities based on the general behavior of indexing and document processing in systems like LlamaIndex.

When you use VectorStoreIndex.from_documents(documents), the process involves analyzing and possibly transforming the documents before they are indexed. The number of requests made to the underlying service (like an LLM or a vector store) can depend on several factors:

  1. Batch Processing: If the indexing process supports batch processing, multiple documents might be processed in a single request. This is common in systems designed to handle large volumes of data efficiently (see the sketch below).
  2. Document Preprocessing: Depending on the preprocessing steps (e.g., tokenization, embedding extraction), not every document might result in a separate request. Preprocessing could aggregate or filter data in a way that reduces the number of necessary requests.
  3. Configuration Parameters: Certain configuration parameters might influence the indexing behavior, such as similarity_top_k or chunk_size, as mentioned in other contexts within the extracts. These parameters could affect how data is batched or processed, thus influencing the number of requests.
  4. Error or Limitation: There could be an error in how documents are being processed or a limitation in the system that prevents all documents from being indexed as expected. This might require reviewing the system's logs or documentation for troubleshooting steps.
Without specific details on the implementation of VectorStoreIndex.from_documents and how it handles document processing and request batching, it's challenging to pinpoint the exact reason for the discrepancy. It's also possible that the system optimizes request handling in a way that doesn't directly correlate the number of documents with the number of requests.
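On the batch-processing point in particular: LlamaIndex sends node chunks to the embedding API in batches (for OpenAIEmbedding the default batch size is on the order of 100 chunks per request in recent versions), so 337 documents can easily collapse into a handful of embedding calls, which by itself would explain seeing about 5 requests. A hedged sketch of how to make the batching visible (defaults and import paths may differ in your version):
Plain Text
   from llama_index.core import Settings, VectorStoreIndex
   from llama_index.embeddings.openai import OpenAIEmbedding  # import path varies by version

   # Smaller batches mean more (but smaller) embedding requests.
   Settings.embed_model = OpenAIEmbedding(embed_batch_size=10)

   # show_progress prints a progress bar as nodes are parsed and embedded.
   vector_index = VectorStoreIndex.from_documents(documents, show_progress=True)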

For a more detailed understanding and troubleshooting, reviewing the official LlamaIndex documentation and seeking support from the community or the development team would be advisable.
----------------------
@kapa.ai Can I find out the shape and size of the vector index created by
vector_index = VectorStoreIndex.from_documents(documents)? Are there any size/shape/len commands to give me embedding sizes and general info about the vector index?
Let me check the knowledge base for you. One moment! :books:
----------------------
Based on the extracts provided, there isn't a direct mention of specific commands or methods within the LlamaIndex API to retrieve the shape, size, or general information about the vector index created by VectorStoreIndex.from_documents(documents). The documentation and examples focus on the creation, querying, and management of vector stores and indices without detailing introspection or metadata retrieval functionalities.

However, in similar systems or libraries, it's common to have methods or properties that allow you to inspect the dimensions of the embeddings, the number of documents indexed, or other metadata about the index. These might include properties or methods like .shape, .size(), or .info().

For LlamaIndex, you would typically expect to find such functionality documented in the API reference or user guide. Since this information is not available in the provided extracts, I recommend consulting the official LlamaIndex documentation for more detailed information on index introspection and management capabilities. If LlamaIndex supports such features, the documentation is the most likely place to find how to use them.

If direct introspection methods are not available, an alternative approach might involve tracking the size and shape of your embeddings and documents externally as you build the index. This could involve keeping count of the documents and noting the dimensions of the embeddings you're using, though this approach requires manual management and might not reflect the internal state of the index accurately if any transformations or filtering occur during indexing.
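As a rough, version-dependent sketch of what that kind of inspection can look like for the default in-memory index: the docstore and ref_doc_info properties expose the stored nodes and their source documents, and embedding a throwaway string with the active embed model reveals the embedding dimensionality (attribute availability may vary across LlamaIndex versions):
Plain Text
   from llama_index.core import Settings

   num_nodes = len(vector_index.docstore.docs)        # chunks actually stored in the index
   num_source_docs = len(vector_index.ref_doc_info)   # original documents they came from

   # Embedding dimensionality: embed a dummy string with the same model the index used.
   dim = len(Settings.embed_model.get_text_embedding("dimension probe"))

   print(f"{num_source_docs} docs -> {num_nodes} nodes, {dim}-dimensional embeddings")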
----------------------