Any suggestions to build multimodal RAG

Any suggestions for building a multimodal RAG over 1TB of data made up of JSONs, PDFs, DOCs, Excel files, PNGs, and JPGs?

I think you will need to use a cloud vector store; self-hosting a vector store may become more costly, as it will require a machine with that much storage.

Second, open-source multimodal LLMs are not that good. You will have to check which one suits your case.

I would suggest, if the data is not confidential or private, going with GPT-4o; it's good for multimodal stuff.
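
For illustration, a minimal sketch of a multimodal GPT-4o call with the OpenAI Python client; the file name and prompt are placeholders, not from this thread:

import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode a local image as a base64 data URL (path is a placeholder)
with open("chart.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image for retrieval."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)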
See, I have a server with a self-hosted Milvus vector store and 4×A100s for a local embedding model and Mistral.
Right now I am running indexing with multiprocessing just to index the JSON, PDF, and DOC data, and I use Mistral to summarize chunks to get better responses.
But this takes at least 3 days to index, and that's without any media files.
Any GitHub notebook you'd suggest for building a better RAG?
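
As a rough sketch of parallelizing that kind of indexing, LlamaIndex's IngestionPipeline can fan transformations out across worker processes; the directory path, chunk sizes, and worker count here are assumptions:

from llama_index.core import SimpleDirectoryReader
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

# Load the raw files (placeholder path)
documents = SimpleDirectoryReader("/data/corpus").load_data()

pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=64)],
)
# num_workers spreads the transformation work across processes
nodes = pipeline.run(documents=documents, num_workers=8)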
What is the chunk size you are trying? Also, the embed model has an embed_batch_size parameter which lets you define how much data goes through the embedding process at once.

Also, I think the embedding model picks the GPU by default if one is present, but make sure it is actually using the GPU.
1TB of data is huge, like really huge in terms of size, so it will take time for sure.
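
A minimal sketch of pinning a local embedding model to the GPU and tuning the batch and chunk sizes in LlamaIndex; the model name and numbers are illustrative, and with a TEI server the batching happens server-side instead:

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-en-icl",
    device="cuda",          # force the GPU instead of relying on auto-detection
    embed_batch_size=64,    # chunks embedded per forward pass
)
Settings.chunk_size = 1024  # global default chunk size for node parsing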
4 × A100, man! Have you tried llama3.1-70b?
GPU 0: bge-en-icl (TEI)
GPU 1,2: Mistral to summarize chunks (TGI)
GPU 3: Milvus-GPU vector store with CAGRA
But I did try llama3.1-70b at 8-bit and it ran perfectly on 2 GPUs.
Any suggestions for multimodal, any notebook?
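
For reference, a hedged sketch of creating a GPU_CAGRA index with pymilvus (requires a GPU build of Milvus); the URI, collection name, field name, and graph degrees are placeholders:

from pymilvus import MilvusClient

client = MilvusClient(uri="http://10.10.10.50:19530")

index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="GPU_CAGRA",
    metric_type="L2",
    params={
        "intermediate_graph_degree": 64,  # graph size during construction
        "graph_degree": 32,               # graph kept for search
    },
)
client.create_index(collection_name="docs", index_params=index_params)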
Yeah, using:

from llama_index.core import Settings
from llama_index.embeddings.text_embeddings_inference import TextEmbeddingsInference

Settings.embed_model = TextEmbeddingsInference(
    base_url="http://10.10.10.50:8083",
    model_name="BAAI/bge-en-icl",  # required for formatting the inference text
    timeout=60,  # timeout in seconds
    embed_batch_size=125,  # batch size for embedding
)

With this, I'm running 228 processes.
Not able to find one at the moment, will share as soon as I find one.
Correct me if I'm wrong: you are summarizing each of the chunks in addition to creating the indexes?
Yes, summarizing each chunk.
Is LlamaParse actually worth it in my use case?
This is the culprit in your case.
This is what is increasing your time.
Each chunk is passed to the LLM to create a summary. Now, considering you have 1TB of data, I don't really want to imagine how many chunks you have, tbh πŸ˜†
But summarizing each chunk would take approx 4-5 secs at my guess, and only once all the summarization is done does it go for indexing.
The summarization step is extra work, so the time goes up there as well.
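
One way to claw back that per-chunk latency is to send the summarization requests to the TGI server concurrently rather than one by one; a sketch assuming TGI's /generate endpoint, with the URL, prompt, and concurrency cap as placeholders:

import asyncio
import httpx

TGI_URL = "http://10.10.10.50:8080/generate"

async def summarize(client, sem, chunk):
    async with sem:  # respect the concurrency cap
        resp = await client.post(
            TGI_URL,
            json={
                "inputs": f"Summarize the following text:\n{chunk}",
                "parameters": {"max_new_tokens": 256},
            },
            timeout=120,
        )
        return resp.json()["generated_text"]

async def summarize_all(chunks, concurrency=32):
    sem = asyncio.Semaphore(concurrency)  # cap in-flight requests to TGI
    async with httpx.AsyncClient() as client:
        return await asyncio.gather(*(summarize(client, sem, c) for c in chunks))

summaries = asyncio.run(summarize_all(["chunk one...", "chunk two..."]))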
Hey, you up for a voice channel?
If it is a one-time process, I would suggest going for it. LlamaParse will extract details from your documents better than any other platform out there, but you'll only get Document objects from it; the indexing you'll have to do on your side.

But the time can be cut down there if you try it.
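
A minimal sketch of what that LlamaParse flow could look like; the file path is a placeholder and LLAMA_CLOUD_API_KEY is assumed to be set:

from llama_parse import LlamaParse
from llama_index.core import VectorStoreIndex

parser = LlamaParse(result_type="markdown")  # parsing happens on LlamaCloud
documents = parser.load_data("report.pdf")   # returns Document objects only

# Indexing the returned documents still happens on your side
index = VectorStoreIndex.from_documents(documents)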
Not at the moment πŸ™ We can do it some other time.
Cool. If you find some multimodal notebook that can help in my use case, that would be really helpful.
Thanks for all the advice!