
Updated 3 months ago

Setting doc id

How would I set the doc_id if I'm loading in multiple files at once?
Either iterate over all of them (kinda lame)

Or use this example, which shows how to set it in the extra_info field automatically

Plain Text
# file_metadata is called with each file's path and returns that file's metadata dict
filename_fn = lambda filename: {'file_name': filename}
documents = SimpleDirectoryReader("./data", file_metadata=filename_fn).load_data()
That makes sense, and how would I return the doc_id in my response for each finding?
Is that something I place inside the prompt or outside?
I put it inside my prompt and it worked 🙌
Huh, I'm surprised it worked in the prompt 😅

You can also check something like response.source_nodes[0].node.node_info for the exact info
Interesting, when I do response.source_nodes[0].node.node_info it returns "None"
Also if I want to pass in more data like the "page number" and other types of identifiers, can I still use file_metadata?
Interesting, maybe I'm mistaken then haha
Yea you can still use it, but all it has access to is the filename string 😅 You could use an actual function instead of a lambda to do something more complicated
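For anyone reading later, here is a rough sketch of what "an actual function instead of a lambda" could look like. It assumes the function receives the file path as a string, as described above. The name file_metadata_fn and the file_stem / file_type keys are made up for illustration, and it stores the bare file name rather than the full path.

```python
from pathlib import Path


def file_metadata_fn(filename: str) -> dict:
    """Build a metadata dict from nothing but the file path string."""
    path = Path(filename)
    return {
        "file_name": path.name,                # bare name, e.g. "report.pdf"
        "file_stem": path.stem,                # name without the extension (hypothetical key)
        "file_type": path.suffix.lstrip("."),  # extension without the dot (hypothetical key)
    }


print(file_metadata_fn("./data/report.pdf"))
```

A function like this should be usable in place of filename_fn in the SimpleDirectoryReader call, since it takes the same single argument.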
Sorry I meant can I add my own custom identifier like:

Plain Text
filename_fn = lambda filename: {
    'file_name': filename,
    'my_id': "12345"
}
I basically want to use parameters that are being passed in through my function and add them as identifiers
Yea you could do that! But then the id will be the same for every file 😅
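One possible way to pass your own parameters in while still getting something different per file is a small factory function that returns the metadata function. This is only a sketch: make_metadata_fn is a hypothetical helper, and the "project_id:file name" scheme for my_id is an assumption, not anything the library prescribes.

```python
from pathlib import Path
from typing import Callable


def make_metadata_fn(project_id: str, another_id: int) -> Callable[[str], dict]:
    """Return a metadata function that remembers the caller's parameters."""

    def metadata_fn(filename: str) -> dict:
        return {
            "file_name": filename,
            "project_id": project_id,  # same for every file
            "another_id": another_id,  # same for every file
            # Hypothetical per-file id: combine the shared id with the file name
            "my_id": f"{project_id}:{Path(filename).name}",
        }

    return metadata_fn


filename_fn = make_metadata_fn(project_id="test", another_id=123)
print(filename_fn("./data/a.json"))
```

The returned filename_fn still takes only the filename, so it should fit wherever the lambda was used.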
Personally, I would parse all the documents myself and manually create the Document objects with the proper metadata I want, rather than using the directory loader 😅 but that's only easy if you are dealing with a single and consistent file format
I did this which worked:

file_doc = Document(data, extra_info={'file_name': 'transcript-WITHSpeakerType.json', 'project_id': 'test', 'another_id': 123})

However, how would I dynamically get the file_name, or would I just build the string myself and pass that in?
I think you would iterate over your json files, and build that dict yourself for each filename 🤞
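A minimal sketch of that loop, using only the standard library. The helper name load_json_with_metadata and the json.dumps flattening are placeholders (how you turn each JSON file into text depends on your data), and it assumes all the files sit in one directory.

```python
import json
from pathlib import Path


def load_json_with_metadata(data_dir: str, project_id: str) -> list:
    """Return (text, extra_info) pairs, one per JSON file in data_dir."""
    pairs = []
    for path in sorted(Path(data_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        text = json.dumps(data)  # placeholder: flatten the JSON however suits your data
        # The file name comes from the path itself, so nothing is hard-coded
        extra_info = {"file_name": path.name, "project_id": project_id}
        pairs.append((text, extra_info))
    return pairs
```

Each pair could then go into the same Document(data, extra_info=...) call shown earlier in the thread, one Document per file.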