Why does the RAG pipeline presented in most documentation examples seem wrong?
I've been studying various frameworks like llama-index and langchain recently, and I've noticed that the examples they give all follow the same pattern:
read documents from a folder/source -> create a "vector store" from the documents -> query the LLM using context retrieved from that vector store
In most cases, we have to generate embeddings for the documents (or nodes) when ingesting them into the vector store. Isn't this approach suboptimal? Every time the application restarts, I'd have to generate embeddings for the same documents all over again, which wastes resources.
Note that I haven't posted any of my own code here; I'm asking about the general workflow.
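For illustration only (not my code, just the shape of the typical LlamaIndex quickstart; it assumes a ./data folder and the default OpenAI embedding/LLM settings):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Every run re-reads the documents and re-computes all their embeddings
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)  # embeddings live only in memory

query_engine = index.as_query_engine()
print(query_engine.query("What do these documents say?"))
```

Restart the app and everything above runs again from scratch.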
I've found a way to do the ingestion phase and the QA phase separately in langchain, and I'm now porting that approach to LlamaIndex, which I think will be a better fit because I really like how LlamaParse handles document parsing. Ingestion runs separately from QA, and both are asynchronous processes.
Is the process presented in most of the documentation really wrong, or am I missing something? I'm still working through the docs, but slowly, since this isn't my priority right now.
But when I create an index with, say, the from_documents method, I'd either have to generate the embeddings again or check, on the next call, whether the documents were already ingested.
This way I can separate ingestion from querying and avoid creating a vector index via the from_documents method. I can even do the question answering with langchain, as long as I point it at the same collection and embed the query with the same embedding model.
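Roughly, my ingestion step looks like this (a sketch of the idea, not my actual async pipeline; the ./chroma_db path and my_docs collection name are placeholders I picked for the example):

```python
# ingest.py — run only when documents are added or changed
import chromadb
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("my_docs")
vector_store = ChromaVectorStore(chroma_collection=collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Embeddings are computed once here and persisted inside the Chroma collection
documents = SimpleDirectoryReader("./data").load_data()
VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```

The QA side then only has to connect to that same collection; it never re-embeds the documents.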
Yeah, it's fine to do it this way if you want. If you've already populated the vector store, you can just do `index = VectorStoreIndex.from_vector_store(vector_store)`.
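e.g., with a Chroma collection you populated earlier (a sketch; assumes the same path/collection names from the ingestion step above):

```python
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("my_docs")
vector_store = ChromaVectorStore(chroma_collection=collection)

# No documents are loaded or re-embedded here; only the incoming query gets embedded
index = VectorStoreIndex.from_vector_store(vector_store)
response = index.as_query_engine().query("your question here")
```

Just make sure the embed model configured here matches the one used at ingestion.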
from_documents() does all the work you are doing above.
If you want, you can also use an ingestion pipeline with a vector store and docstore attached to get some document management: if a document with the same ID is ingested again, its content is checked against the stored hash and it's either upserted or skipped.
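A rough sketch (the splitter, embedding model, and filename_as_id choice are just examples; stable document IDs are what make the dedup work):

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.embeddings.openai import OpenAIEmbedding

pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(), OpenAIEmbedding()],
    docstore=SimpleDocumentStore(),
    vector_store=vector_store,  # same vector store as above
)

# filename_as_id gives each document a stable ID, so re-running the pipeline
# skips unchanged documents and upserts changed ones instead of re-embedding everything
documents = SimpleDirectoryReader("./data", filename_as_id=True).load_data()
pipeline.run(documents=documents)
```

In practice you'd also persist the docstore (or use a remote one) so the stored hashes survive restarts.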