I'd say don't rely on LlamaIndex. Do this:
1) Make a chromadb (preferred) or qdrant vector collection locally.
2) Pick an embedding model and a chunk size suited to it, and use them to build the vector records for the db (a sketch covering steps 1 and 2 follows the loading snippet below).
3) Load the vector store in your app with something like this:
import qdrant_client
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

def get_vector_store(client, collection_name):
    # wrap the existing Qdrant collection for LlamaIndex
    return QdrantVectorStore(client=client, collection_name=collection_name)

client = qdrant_client.QdrantClient(path="./qdrant_data")  # local on-disk mode
vector_store = get_vector_store(client, "your_collection_name")
embed_model = ...  # the same embedding model you used to create the vector db
# from_vector_store builds the StorageContext internally, no need to pass one
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, embed_model=embed_model)
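
For steps 1 and 2, here's a minimal sketch of building the local collection, assuming sentence-transformers for the embeddings (the model name, collection name, path, and 500-character chunk size are all placeholders, not requirements; the payload layout just has to line up with whatever you read back in step 3):

import qdrant_client
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

text = "Your document text goes here ..."  # load your real docs instead
chunks = [text[i:i + 500] for i in range(0, len(text), 500)]  # naive fixed-size chunking

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works
client = qdrant_client.QdrantClient(path="./qdrant_data")
client.create_collection(
    collection_name="your_collection_name",
    vectors_config=VectorParams(
        size=embedder.get_sentence_embedding_dimension(),
        distance=Distance.COSINE,
    ),
)
client.upsert(
    collection_name="your_collection_name",
    points=[
        PointStruct(id=i, vector=embedder.encode(chunk).tolist(), payload={"text": chunk})
        for i, chunk in enumerate(chunks)
    ],
)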
You can then query the index with a chat_engine using OpenAI. Otherwise I'd recommend standalone code: retrieve the documents yourself and pass them to the OpenAI API. Both are sketched below.
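
The chat_engine route is a one-liner once the index above is loaded:

chat_engine = index.as_chat_engine()
print(chat_engine.chat("your question here").response)

And here's a rough sketch of the standalone route, assuming the collection from the earlier sketch and the official openai client (the model name and top_k are placeholders; OPENAI_API_KEY is read from the environment):

import qdrant_client
from openai import OpenAI
from sentence_transformers import SentenceTransformer

qdrant = qdrant_client.QdrantClient(path="./qdrant_data")
embedder = SentenceTransformer("all-MiniLM-L6-v2")
oai = OpenAI()

def answer(question, top_k=3):
    # embed the question and pull the nearest chunks from the collection
    hits = qdrant.search(
        collection_name="your_collection_name",
        query_vector=embedder.encode(question).tolist(),
        limit=top_k,
    )
    context = "\n\n".join(h.payload["text"] for h in hits)
    # stuff the retrieved chunks into the prompt and let the model answer
    resp = oai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

This keeps you in full control of the prompt, which is the main reason to skip the framework layer here.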