Updated 3 weeks ago

Storage

For those who are creating datasets for evaluating their LLMs, where do you typically store those datasets?
6 comments
S3, Git LFS, Hugging Face Datasets
Could also be any SQL-ish DB, or NoSQL, depending on what's in the dataset
If S3 or LFS, are you typically storing it as JSON?
Yeah, it'd just be a JSON blob. You could compress it if it's really huge, but JSON is nice because it keeps things simple
(plus then you can just dump/load a pydantic model, for example)
Okay, thanks. For my use case the datasets are smaller, so I'll probably stick to JSON + git for now until I need something more robust.
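For anyone landing here later, a minimal sketch of the JSON approach discussed above, using only the standard library. It swaps in a plain dataclass where the thread mentions a pydantic model, and the `EvalExample` schema and the `save_examples` / `load_examples` helpers are hypothetical names made up for illustration, not anything from a library. The optional gzip step covers the "compress it if it's really huge" case.

```python
import gzip
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class EvalExample:
    # Hypothetical schema; adapt the fields to your own eval set.
    prompt: str
    expected: str


def save_examples(examples, path, compress=False):
    """Write the examples as one JSON array, optionally gzip-compressed."""
    payload = json.dumps(
        [asdict(e) for e in examples], indent=2, ensure_ascii=False
    )
    data = payload.encode("utf-8")
    Path(path).write_bytes(gzip.compress(data) if compress else data)


def load_examples(path, compress=False):
    """Read the file back and rebuild the dataclass instances."""
    raw = Path(path).read_bytes()
    if compress:
        raw = gzip.decompress(raw)
    return [EvalExample(**row) for row in json.loads(raw.decode("utf-8"))]
```

With small datasets kept in git, the uncompressed, indented form is probably the better default since diffs stay readable; the compressed form makes more sense for S3 or Git LFS.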