Persisted Vector Index

I have a locally persisted Vectorstore (created with VoyageAI embeddings). But every time I query it, my server makes external API calls to the VoyageAI embeddings API. It's very fast, but can someone explain why it needs to do this? I thought once I have the vectorstore kept locally there would be no need for ongoing calls to an external embeddings service.
It depends on your code. The examples on LI docs typically only show the simplest path, which will always run the embed logic. There is another example that shows how to separate and do them conditionally.

Personally, I break them down into separate scripts or services
You need to embed the query text, so that is the embedding call being made
Then, that query embedding is being used against the existing saved vectors, to find relevant text
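A minimal sketch of that flow against an already-persisted local index (the persist directory and the query string are placeholders):

Plain Text
from llama_index import StorageContext, load_index_from_storage

# Reload the persisted index; nothing here re-embeds the documents.
index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage"))

# Retrieval makes the two steps explicit: the query string is embedded (one API call),
# then that vector is compared against the stored vectors to find the closest chunks.
retriever = index.as_retriever(similarity_top_k=3)
nodes = retriever.retrieve("What does my data say about X?")
for node in nodes:
    print(node.score, node.node.get_content()[:80])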
yeah, there is that aspect as well. I guess it depends on what @webwrx means by "calls"
is it one or N (for each doc)
Embedding/creating your index runs embeddings for each text chunk.

But once those are saved, you only need to embed each query as it comes in
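To tie that back to the original question: the VoyageAI model still has to be attached when the persisted index is loaded, because it is what embeds each incoming query. A rough sketch, assuming the separate VoyageAI embeddings integration for llama-index is installed (the VoyageEmbedding class and its model_name / voyage_api_key arguments come from that package and may differ between versions):

Plain Text
from llama_index.core import Settings, StorageContext, load_index_from_storage
from llama_index.embeddings.voyageai import VoyageEmbedding  # assumes llama-index-embeddings-voyageai

# Use the same embedding model that built the index, otherwise query vectors won't match.
Settings.embed_model = VoyageEmbedding(model_name="voyage-2", voyage_api_key="...")

index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage"))

# Only the query text is sent to the VoyageAI API; the document vectors are read from disk.
response = index.as_query_engine().query("What does my data say about X?")
print(response)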
right, but many of the examples don't show how to load a persisted index; the code they show will always run the document embedding process
we really need to see the code in question to understand what is happening
yea fair. Loading is fairly easy

If you are using the default vector db
Plain Text
index.storage_context.persist(persist_dir="./storage")

from llama_index import StorageContext, load_index_from_storage
index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage"))


If you are using an external db, like qdrant, pinecone, weaviate, etc.

Plain Text
vector_store = ...
index = VectorStoreIndex.from_vector_store(vector_store)


It's a little simpler in the second case, since all the nodes are stored in the vector db, which simplifies storage
There is an example that uses an if statement to show a local index and the two modes in one script
somewhere on the docs
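Roughly, that if-statement pattern looks like this (a minimal sketch; the directory names are placeholders):

Plain Text
import os
from llama_index import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

PERSIST_DIR = "./storage"

if os.path.exists(PERSIST_DIR):
    # Reload the saved index; no document embeddings are recomputed.
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
else:
    # First run: embed every chunk once, then persist for next time.
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)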
yea that gets used in a few notebooks
copy-pasta style
Need to GAR examples together maybe :]
https://blog.luk.sh/rag-vs-gar
let the user check some boxes to craft custom examples
or perhaps from natural language of the example you desire... that feels more appropriate
Ahh yes this makes sense. It is only making one very quick connection with each query from what I can see. Makes sense it is embedding the query itself. Just didn't understand what's going on under the hood. Thanks for your help!
Just one I believe. As Logan M suggested, I now understand it's the query text itself. 👍
Here's my code that creates it for reference:

Plain Text
import logging

from llama_index import (
  SimpleDirectoryReader,
  StorageContext,
  VectorStoreIndex,
  load_index_from_storage,
)

def create_vectorstore(app):
  ### DATA and VECTORSTORE locations
  data_dir = 'data'
  vectorStore_dir = 'index'

  try:
    ### TRY LOADING PERSISTED INDEX ###
    storage_context = StorageContext.from_defaults(persist_dir=vectorStore_dir)
    vectorStore = load_index_from_storage(storage_context)
    logging.info("Loaded Vector Store OK.")
    return vectorStore

  except Exception as e:
    logging.info(f"Error loading Vector Store: {e}")

    try:
      ### READ and INDEX DOCS ###
      logging.info("Creating Embeddings and Vector Store...")
      documents = SimpleDirectoryReader(data_dir).load_data()
      vectorStore = VectorStoreIndex.from_documents(documents)
      logging.info("Vector Store created OK.")

      ### PERSIST INDEX TO STORAGE ###
      vectorStore.storage_context.persist(persist_dir=vectorStore_dir)
      logging.info("Vector Store persisted OK.")

      return vectorStore

    except Exception as e:
      logging.error(f"Error creating or storing Vector Store: {e}")
I am using Pinecone as my vectorstore - snippet below. Would appreciate any guidance on how I can add new documents to the index and update existing ones?

Plain Text
def get_response(user_query, user):
    api_key = os.environ.get('PINECONE_API_KEY')
    pc = Pinecone(api_key=api_key)

    # Create a new index if it does not exist
    '''
    pc.create_index(
        name="quickstart",
        dimension=1536,
        metric="euclidean",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
    '''
    pinecone_index = pc.Index("quickstart")

    # Load documents from the specified directory
    documents = SimpleDirectoryReader("./knowledge").load_data()

    # Connect to the existing Pinecone index as the vector store
    vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

    # Reload the existing index from the vector store
    index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

    # Create a query engine from the index
    model = "gpt-4o"
    Settings.llm = llamaOpenAI(temperature=0, model=model)
    query_engine = index.as_query_engine()
    ...
@Logan M could you pls guide - would really appreciate it.
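For reference, the index classes expose insert and delete helpers that write through to the attached vector store. A minimal sketch, assuming the same "quickstart" Pinecone index as in the snippet above and the newer llama-index package layout (the document text and the kb-article-42 id are just placeholders):

Plain Text
import os
from pinecone import Pinecone
from llama_index.core import Document, VectorStoreIndex
from llama_index.vector_stores.pinecone import PineconeVectorStore

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
vector_store = PineconeVectorStore(pinecone_index=pc.Index("quickstart"))

# Reconnect to the existing index; nothing gets re-embedded at this point.
# Assumes the embedding model is configured (default OpenAI unless Settings.embed_model is set).
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

# Add a new document: it is chunked, embedded, and upserted into Pinecone.
index.insert(Document(text="Some new content.", id_="kb-article-42"))

# Update an existing document: remove the chunks stored under its id,
# then re-insert the revised version under the same id.
index.delete_ref_doc("kb-article-42", delete_from_docstore=True)
index.insert(Document(text="Revised content.", id_="kb-article-42"))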