Updated 3 months ago

Hey everyone, I have a general question:

I'd like to fine-tune a model on data from several sources. I'm using the llama-index loaders to load data from multiple data sources. What is the best way to save this data and fine-tune an LLM on it? Should I build a vector DB without embeddings and, once training starts, fetch the data from the DB? Or should I just save the data to GCS/S3 and load it from there? If the second option is the "correct" one, is there a built-in way to do that with llama-index?
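For the second option (save to GCS/S3 and load from there), a minimal sketch of serializing loaded documents to a JSONL file on disk might look like the following. The `{"text": ..., "metadata": ...}` record shape is a hypothetical assumption here; adapt it to whatever your loaders actually return.

```python
import json


def save_documents(docs, path):
    """Write loaded documents to a JSONL file, one JSON record per line.

    `docs` is assumed to be a list of dicts with "text" and "metadata"
    keys (hypothetical shape -- adjust to your loader's output).
    """
    with open(path, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc) + "\n")


def load_documents(path):
    """Read the JSONL file back, e.g. on the training server."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]
```

The same JSONL file can then be uploaded to GCS/S3 and streamed back down when training starts.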
5 comments
I think if you are fine-tuning it locally, then keeping the data on the same machine would be fastest.
If you are going to fine-tune on a different server, then you can put the training data in a DB and fetch it over there for fine-tuning.
I intend to use a different server. I was thinking of maybe creating an index without embeddings and simply retrieving the data from it. Is there a way to just save documents to disk without creating an index?
You can use the docstore, I guess: add data to it and then persist it, without creating an index.
But this requires you to convert your data into nodes.
Docstore sounds like a good idea. I'll try to check it out. Thanks!