
Updated last month

Storage


For those of you creating datasets to evaluate your LLMs, where do you typically store those datasets?

6 comments

S3, Git LFS, Hugging Face Datasets

Could also be any SQL-ish DB, or NoSQL, depending on what's in the dataset

If S3 or LFS, are you typically storing it as JSON?

Yeah, it'd just be a JSON blob. You could compress it if it's really huge, but plain JSON is nice because it keeps things simple
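
For reference, a rough sketch of the "JSON blob, optionally compressed" idea using only the standard library. The function names `save_dataset` / `load_dataset` and the convention of picking gzip based on a `.gz` suffix are just assumptions for illustration, not anything standard:

```python
import gzip
import json
from pathlib import Path


def save_dataset(records, path):
    """Write records as one JSON blob; gzip it if the path ends in .gz."""
    path = Path(path)
    payload = json.dumps(records, ensure_ascii=False, indent=2)
    if path.suffix == ".gz":
        # Text-mode gzip so we can write the JSON string directly.
        with gzip.open(path, "wt", encoding="utf-8") as f:
            f.write(payload)
    else:
        path.write_text(payload, encoding="utf-8")


def load_dataset(path):
    """Read a dataset written by save_dataset (plain or gzipped)."""
    path = Path(path)
    if path.suffix == ".gz":
        with gzip.open(path, "rt", encoding="utf-8") as f:
            return json.load(f)
    return json.loads(path.read_text(encoding="utf-8"))
```

The same files can then be pushed to S3 or committed via Git LFS as-is; the uncompressed variant diffs better in plain git.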

(plus then you can just dump/load a pydantic model for example)
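
Something like this is what the pydantic round trip could look like. It assumes pydantic v2 (`model_dump_json` / `model_validate_json`), and the `EvalExample` / `EvalDataset` models and their fields are made up for the example, so adjust to whatever your eval records actually contain:

```python
from pydantic import BaseModel


class EvalExample(BaseModel):
    # Hypothetical fields for a single eval case.
    prompt: str
    expected: str
    tags: list[str] = []


class EvalDataset(BaseModel):
    name: str
    examples: list[EvalExample]


dataset = EvalDataset(
    name="smoke-test",
    examples=[EvalExample(prompt="What is 2 + 2?", expected="4")],
)

# Dump to a JSON string (write this to a file, S3 object, etc.).
blob = dataset.model_dump_json(indent=2)

# Load it back; this also validates the structure on the way in.
restored = EvalDataset.model_validate_json(blob)
```

The validation on load is a nice side effect: a malformed record fails loudly instead of silently skewing an eval run.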

Okay, thanks. For my use case the datasets are small, so I'll probably stick with JSON + git for now until I need something more robust.
