Updated 3 months ago

Hey how do I set a custom doc id when

Hey, how do I set a custom doc_id when loading documents from different data loaders like youtube_transcript, s3, etc.?
4 comments
you can set them after they've been loaded

documents = ....
documents[0].doc_id = "my_doc_id"
Thanks. Would the returned documents have metadata with the filename, so I can set the doc id according to the filename? Let's say I load data from YouTube URLs [url1, url2]: would the documents be returned in the same order, so I can set the first doc id to "url1"? Same question for s3 when I load documents from a folder.
Depends on the implementation of the data loader from the community tbh lol

I would read the source code for the respective loader to see what's going on under the hood

For the YouTube loader, for example, you would have to set the doc id manually, since the documents come back without metadata:
https://github.com/emptycrown/llama-hub/blob/a109e482407586e98b731bf557700b4cc4fc706a/llama_hub/youtube_transcript/base.py#L29
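
If you don't want to depend on the order a loader returns things in, one option is to call it once per source and tag whatever comes back. Rough sketch only: `Document` here is a stand-in dataclass so the snippet runs on its own, and `load_with_source_ids` and `fake_loader` are made-up names, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    """Stand-in for the library's Document class (assumption: only these fields matter here)."""
    text: str
    doc_id: str = ""


def load_with_source_ids(
    sources: List[str],
    load_one: Callable[[str], List[Document]],
) -> List[Document]:
    """Load each source separately and derive doc_id from the source string."""
    documents: List[Document] = []
    for source in sources:
        docs = load_one(source)
        for i, doc in enumerate(docs):
            # One document per source: use the source itself as the id.
            # Several documents per source: append an index to keep ids unique.
            doc.doc_id = source if len(docs) == 1 else f"{source}#{i}"
            documents.append(doc)
    return documents


def fake_loader(url: str) -> List[Document]:
    """Hypothetical loader used only to make the sketch runnable."""
    return [Document(text=f"transcript of {url}")]


docs = load_with_source_ids(["url1", "url2"], fake_loader)
```

With a real loader you would swap `fake_loader` for a small wrapper that calls that loader's `load_data` with a single URL or key. Check the loader's source for the exact argument name, since it differs between loaders.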

Would be easy PRs to make sure the metadata is set for the loaders you use 👌
Will do that. Thanks!