Why does the RAG pipeline presented in most documentation examples seem wrong?


I've been studying various frameworks like llama-index and langchain recently. I've noticed that all the examples they give look like this:


read documents from a folder/source -> create a "vector store" from those documents -> query the LLM using context retrieved from that vector store
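Roughly, those examples boil down to a minimal sketch like the one below (the data folder and the question string are just placeholders):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# 1. read documents from a folder
documents = SimpleDirectoryReader("./data").load_data()

# 2. build an in-memory vector index (embeddings are generated here on every run)
index = VectorStoreIndex.from_documents(documents)

# 3. query the LLM with the retrieved context
query_engine = index.as_query_engine()
print(query_engine.query("What does the report say about X?"))
```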


In most cases, we have to generate embeddings of the documents (or nodes) when ingesting them into the vector store. Isn't this approach suboptimal? Every time the application restarts, I'd have to generate the embeddings of the same documents all over again, which is a waste of resources.

Note that I'm not sharing my own code here; I'm just talking about the general workflow.

I've found a way to do the ingestion phase and the QA phase separately in langchain. I'm now migrating that approach to Llamaindex, which I think will work out better because I really liked how llamaparse parses documents. The ingestion is done separately from the QA, and both are asynchronous processes.

Is the process presented in most documentation really wrong, or am I missing something? I'm still studying the docs, but only slowly, because this is not my priority right now.
"Every time the application restarts, I'd have to generate the embeddings of the same documents all over again" -- why?

You can generate embeddings for your text chunks, save them in some vector db, and off you go
curious what the confusion is 😅
Once it's saved in the vector db, you can just use that same db again without recalculating anything
But when I create an index, for example with the from_documents method, I'd have to generate the embeddings again or check, on the next call, whether it was already ingested
I didn't get the idea right, I guess
I'll show how I'm doing it
This is new to me
How I'm ingesting data:
get_embedded_nodes method:
This way I can separate ingestion from querying without creating a vector index with the from_documents method. I can even do the question-and-answer part with langchain, as long as I specify the same collection and create the query embedding with the same embedding model
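As an illustration, a separate ingestion step along these lines might look like the sketch below; the get_embedded_nodes helper, the Chroma collection, and the OpenAI embedding model are stand-ins, not the actual code from this thread:

```python
import chromadb
from llama_index.core.schema import TextNode
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

def get_embedded_nodes(chunks: list[str]) -> list[TextNode]:
    # wrap each text chunk in a node and attach its embedding
    embed_model = OpenAIEmbedding()
    nodes = []
    for chunk in chunks:
        node = TextNode(text=chunk)
        node.embedding = embed_model.get_text_embedding(chunk)
        nodes.append(node)
    return nodes

# write into a persistent collection that the QA process (LlamaIndex or langchain)
# can reuse later without re-embedding anything
client = chromadb.PersistentClient(path="./chroma_db")
vector_store = ChromaVectorStore(
    chroma_collection=client.get_or_create_collection("docs")
)
vector_store.add(get_embedded_nodes(["first chunk of text", "second chunk of text"]))
```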
Yeah, it's fine to do it this way if you want. If you already populated the vector store, you can just do index = VectorStoreIndex.from_vector_store(vector_store)
from_documents() does all the work you are doing above.
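A minimal sketch of that, reusing the hypothetical Chroma collection from the ingestion sketch above:

```python
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore

# reconnect to the already-populated store; no documents are re-embedded here
client = chromadb.PersistentClient(path="./chroma_db")
vector_store = ChromaVectorStore(
    chroma_collection=client.get_or_create_collection("docs")
)
index = VectorStoreIndex.from_vector_store(vector_store)

# only the query string gets embedded at question time
query_engine = index.as_query_engine()
print(query_engine.query("What does the report say about X?"))
```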

If you want, you can also use an ingestion pipeline with a vector store and docstore attached to perform some document management (if a document with the same ID is ingested, its content is compared against what's stored and the document is upserted or skipped)
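A minimal sketch of that pattern, again assuming a Chroma vector store and OpenAI embeddings:

```python
import chromadb
from llama_index.core import Document
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.storage.docstore import SimpleDocumentStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

client = chromadb.PersistentClient(path="./chroma_db")
vector_store = ChromaVectorStore(
    chroma_collection=client.get_or_create_collection("docs")
)

# the docstore keeps document ids and content hashes, so re-running the pipeline
# skips unchanged documents and upserts changed ones
# (persist and reload the docstore if this check should survive separate runs)
pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(chunk_size=512), OpenAIEmbedding()],
    docstore=SimpleDocumentStore(),
    vector_store=vector_store,
)

pipeline.run(documents=[Document(text="report contents ...", doc_id="report-2024")])
```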
I see. Didn't know that
One thing I'm missing here is where chunking goes
I already have my nodes and embedded them at that point, but I didn't do anything related to chunking, I guess
I'm using llamaparse
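For reference, chunking typically happens between parsing and embedding, e.g. by running a node parser over the documents llamaparse returns. A sketch, assuming a SentenceSplitter and placeholder file name and chunk sizes:

```python
from llama_index.core.node_parser import SentenceSplitter
from llama_parse import LlamaParse

# parse the file with llamaparse (needs a LlamaCloud API key in the environment),
# then split the returned documents into chunked nodes
documents = LlamaParse(result_type="markdown").load_data("report.pdf")
splitter = SentenceSplitter(chunk_size=512, chunk_overlap=50)
nodes = splitter.get_nodes_from_documents(documents)
# these chunked nodes are what you'd embed and add to the vector store
```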