Chris S.
Joined September 25, 2024
Why is the metadata length tied to the chunk size? I would expect the chunk size to apply only to the text of the chunk itself.
ValueError: Metadata length (407) is longer than chunk size (128). Consider increasing the chunk size or decreasing the size of your metadata to avoid this.
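One plausible reading of this error, consistent with its wording, is that the splitter reserves part of each chunk's token budget for the metadata string. That string is typically prepended to the chunk text when the chunk is embedded or sent to an LLM, so the text itself only gets `chunk_size - metadata_len` tokens. Once the metadata alone exceeds `chunk_size`, nothing is left for the text. The sketch below is a minimal, library-free illustration under that assumption. `whitespace_tokenizer`, `effective_chunk_size`, and `split_with_metadata` are hypothetical names, not LlamaIndex APIs, and a whitespace split stands in for a real tokenizer.

```python
# Minimal sketch of a metadata-aware token budget (NOT the LlamaIndex source).
# Assumption: the metadata string is counted against every chunk's budget,
# because it is prepended to each chunk when embedded or sent to an LLM.
from typing import Callable, List


def whitespace_tokenizer(text: str) -> List[str]:
    """Hypothetical tokenizer: one token per whitespace-separated word."""
    return text.split()


def effective_chunk_size(
    chunk_size: int,
    metadata_str: str,
    tokenizer: Callable[[str], List[str]] = whitespace_tokenizer,
) -> int:
    """Return how many tokens remain for the text once metadata is counted."""
    metadata_len = len(tokenizer(metadata_str))
    remaining = chunk_size - metadata_len
    if remaining <= 0:
        # Mirrors the shape of the error reported above.
        raise ValueError(
            f"Metadata length ({metadata_len}) is longer than chunk size "
            f"({chunk_size})."
        )
    return remaining


def split_with_metadata(text: str, metadata_str: str, chunk_size: int) -> List[str]:
    """Split text so that metadata + each chunk fits within chunk_size tokens."""
    budget = effective_chunk_size(chunk_size, metadata_str)
    words = whitespace_tokenizer(text)
    return [" ".join(words[i : i + budget]) for i in range(0, len(words), budget)]


if __name__ == "__main__":
    metadata = "file_name: report.pdf author: Chris"  # 4 whitespace tokens
    text = "one two three four five six seven eight"
    # Budget is 7 - 4 = 3 tokens of text per chunk.
    print(split_with_metadata(text, metadata, chunk_size=7))
    # → ['one two three', 'four five six', 'seven eight']
```

If the real splitter behaves like this sketch, the fixes are the ones the error message suggests: raise `chunk_size`, or shrink the metadata string. LlamaIndex `Document` objects have `excluded_embed_metadata_keys` and `excluded_llm_metadata_keys` attributes for keeping keys out of the metadata text; whether the splitter honors them when measuring metadata length in your installed version is worth checking.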
I'm trying to better understand the SentenceSplitter class. The chunk size argument appears to set the maximum number of tokens per chunk, but it isn't clear which tokenizer is used under the hood. I tried passing a HF tokenizer to the tokenizer arg, but the output simply returned the input text without chunking it at all. Using SentenceSplitter as is, without passing any tokenizer, works as expected.
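A likely explanation, hedged: the `tokenizer` parameter of SentenceSplitter seems to expect a plain callable that maps a string to a list of tokens (as far as I know, the default is the `encode` method of a tiktoken encoding, obtained through `llama_index.core.utils.get_tokenizer`). A Hugging Face tokenizer *object*, when called, returns a dict-like `BatchEncoding`, and `len()` of that is the number of keys (e.g. `input_ids`, `attention_mask`), not the number of tokens. Every text would then look only 2 or 3 tokens long and never be split. The sketch below illustrates this without any libraries; `FakeHFTokenizer`, `count_tokens`, and `greedy_chunks` are hypothetical stand-ins, not real LlamaIndex or Transformers APIs.

```python
# Minimal sketch of why a tokenizer *object* can silently disable chunking.
# Assumption: the splitter counts tokens as len(tokenizer(text)).
from typing import Callable, Dict, List


class FakeHFTokenizer:
    """Hypothetical stand-in mimicking HF behaviour."""

    def encode(self, text: str) -> List[int]:
        # One fake token id per word; enough to show the counting behaviour.
        return list(range(len(text.split())))

    def __call__(self, text: str) -> Dict[str, List[int]]:
        # Like HF, calling the object returns a dict-like encoding.
        ids = self.encode(text)
        return {"input_ids": ids, "attention_mask": [1] * len(ids)}


def count_tokens(text: str, tokenizer: Callable[[str], object]) -> int:
    """Token count as a splitter expecting Callable[[str], List] computes it."""
    return len(tokenizer(text))  # len of a dict = number of keys!


def greedy_chunks(text: str, chunk_size: int, tokenizer: Callable[[str], object]) -> List[str]:
    """Greedy word-level packing: grow a chunk until it would exceed chunk_size."""
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and count_tokens(candidate, tokenizer) > chunk_size:
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    hf = FakeHFTokenizer()
    text = " ".join(["word"] * 500)
    print(count_tokens(text, hf))                        # 2: number of dict keys
    print(count_tokens(text, hf.encode))                 # 500: one id per word
    print(len(greedy_chunks(text, 128, hf)))             # 1: never splits
    print(len(greedy_chunks(text, 128, hf.encode)))      # 4: 128+128+128+116
```

With a real HF tokenizer, passing the bound method (`tokenizer=hf_tokenizer.encode`, or `lambda t: hf_tokenizer.encode(t, add_special_tokens=False)` to avoid counting special tokens in every chunk) is probably the intended usage; it is worth confirming against the SentenceSplitter signature in your installed version.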