Any suggestions for building multimodal RAG

Any suggestions for building a multimodal RAG with 1TB of data: JSONs, PDFs, DOCs, Excel files, PNGs, and JPGs?

I think you will need to use a managed (cloud) vector store; self-hosting a vector store may become more costly, since it will require a machine with that much capacity.
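Wiring a hosted cluster in looks roughly like this. Just a sketch assuming the llama-index Milvus integration; the URI, token, path, and dim are placeholders you'd swap for your own:

from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.vector_stores.milvus import MilvusVectorStore

# Placeholder path -- load your corpus however you already do
documents = SimpleDirectoryReader("/data/corpus").load_data()

vector_store = MilvusVectorStore(
    uri="https://your-cluster.example.com",  # hosted endpoint (placeholder)
    token="YOUR_API_KEY",                    # placeholder credential
    dim=1024,                                # must match your embedding model's output size
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)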

Second, open-source multimodal LLMs are not that good. You will have to check which one suits your case.

If the data is not confidential or private, I would suggest going with GPT-4o; it's good for multimodal stuff.
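For the multimodal side, a minimal GPT-4o call over an image looks like this (a sketch using the OpenAI Python SDK; the image URL and prompt are placeholders):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image so it can be indexed for retrieval."},
            {"type": "image_url", "image_url": {"url": "https://example.com/figure.png"}},  # placeholder
        ],
    }],
)
print(resp.choices[0].message.content)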
See, I have a server with a self-hosted Milvus vector store and 4×A100s for a local embedding model and Mistral.
Right now I'm running indexing with multiprocessing just to index the JSON, PDF, and DOC data, and using Mistral to summarize chunks to get better responses.
But this takes at least 3 days to index, and no media stuff yet. The setup looks roughly like this:
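(simplified sketch; the path, chunk size, and worker count here are placeholders, not my real values)

from llama_index.core import SimpleDirectoryReader
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader("/data/corpus", recursive=True).load_data()
pipeline = IngestionPipeline(transformations=[SentenceSplitter(chunk_size=1024)])
nodes = pipeline.run(documents=documents, num_workers=8)  # parallel ingestion workers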
Any GitHub notebook you'd suggest for building a better RAG?
What chunk size are you trying? Also, the embed model has an embed_batch_size setting that lets you define how much data goes through the embedding process at once.

Also, I think the embedding model picks the GPU by default if one is present, but make sure it is actually using the GPU.
1TB of data is huge, like really huge in terms of size. So it will take time for sure.
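Both knobs live in one place if you're on llama-index (a sketch; the numbers are just starting points to tune, not recommendations):

from llama_index.core import Settings

# Illustrative values only -- tune for your corpus and GPU memory
Settings.chunk_size = 1024    # tokens per chunk
Settings.chunk_overlap = 64   # overlap between adjacent chunks
# embed_batch_size is set on the embedding model itself (see the snippet below)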
4 × A100, man! Have you tried llama3.1-70b?
GPU 0: bge-en-icl (TEI)
GPU 1,2: Mistral to summarize chunks (TGI)
GPU 3: Milvus-GPU vector store with CAGRA
But I did try llama3.1 70B in 8-bit and it ran perfectly on 2 GPUs.
Any suggestions for multimodal? Any notebook?
yeah, using:

from llama_index.core import Settings
from llama_index.embeddings.text_embeddings_inference import TextEmbeddingsInference

Settings.embed_model = TextEmbeddingsInference(
    base_url="http://10.10.10.50:8083",
    model_name="BAAI/bge-en-icl",  # required for formatting inference text
    timeout=60,                    # timeout in seconds
    embed_batch_size=125,          # batch size for embedding
)

with this, running 228 processes
Not able to find one at the moment, will share as soon as I do.
Correct me if I'm wrong: you are summarising each of the chunks along with creating the indexes?
Yes, summarising each chunk.
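Roughly this pattern (a sketch; SummaryExtractor makes one LLM call per chunk through whatever Settings.llm is configured, Mistral via TGI in my case):

from llama_index.core.extractors import SummaryExtractor
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=1024),
        SummaryExtractor(summaries=["self"]),  # one LLM call per chunk -- the expensive step
    ]
)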
Is LlamaParse actually worth it in my use case?
This is the culprit in your case.
This is what's increasing your time.
Each chunk is passed to the LLM to create a summary. Now, considering you have 1TB of data, I don't really want to imagine how many chunks you have, tbh 😆
But summarizing each chunk would take approx. 4-5 secs at a guess. Once all the summarization is done, it goes for indexing.
The summarization is an extra step on top of that, so the time increases there as well.
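Back-of-envelope, with every number an assumption (the chunk count especially; it depends on chunk size and how much of the 1TB is actual text):

# All numbers below are illustrative guesses, not measurements
num_chunks = 10_000_000   # assumed; depends on chunk size and text share of the 1TB
secs_per_summary = 4.5    # the ~4-5 s estimate above
workers = 228             # the process count mentioned earlier

serial_days = num_chunks * secs_per_summary / 86_400
print(f"serial: ~{serial_days:,.0f} days; with {workers} workers: ~{serial_days / workers:,.1f} days")

With those made-up inputs it lands in the same ballpark as the ~3 days you're seeing.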
Hey, are you up for a voice channel?
If it is a one-time process, I would suggest going for it. LlamaParse will extract details from your documents better than any platform available out there, but you'll only get Document objects from it. Indexing you'll have to do on your side.

But the time can be minimised there if you try it.
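A minimal sketch of that flow (assumes a LLAMA_CLOUD_API_KEY in the environment; the filename is a placeholder):

from llama_parse import LlamaParse
from llama_index.core import VectorStoreIndex

parser = LlamaParse(result_type="markdown")  # "text" is also available
documents = parser.load_data("report.pdf")   # placeholder file; returns Document objects only

# Indexing stays on your side, as noted above
index = VectorStoreIndex.from_documents(documents)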
Not at the moment 🙏 We can do it some other time.
Cool. If you find some multimodal notebook that can help in my use case, that would be really helpful.
Thanks for all the advice!