Inactive
Updated 2 months ago
Is sentence splitter still optimal for
dean
6 months ago
Is sentence splitter still optimal for embedding models like bge-m3 that can vectorize a whole article or paragraph?
Logan M
6 months ago
the sentence splitter isn't splitting into single sentences, it's splitting into chunks that respect sentence boundaries
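To illustrate the idea of chunks that respect sentence boundaries, here is a minimal sketch in plain Python. It is not the library's actual implementation: the function names, the naive punctuation-based sentence rule, and the character-based `max_chars` budget are all assumptions made for the example.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive rule (assumption): a sentence ends at ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk_by_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    chunks: list[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow the chunk, so start a new one.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(chunk_by_sentences("One. Two two. Three three three.", max_chars=14))
```

Each chunk ends at a sentence boundary, and several short sentences can share one chunk, which is why a larger chunk size still works with this style of splitter.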
dean
6 months ago
Ok but how does it factor in things such as titles, subsections and paragraphs under subsections? <h1> vs <h2> etc
dean
6 months ago
I also want to respect subsection boundaries!
dean
6 months ago
I also need to know the maximum size of an m3 chunk in terms of ASCII characters
Logan M
6 months ago
section boundaries are harder -- those should probably be split before applying the sentence splitter, using your own algorithm
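One possible shape for such a pre-splitting step, assuming the source is simple HTML with `<h1>` to `<h6>` headings, is sketched below. The `split_sections` helper and its regex-based parsing are hypothetical and only suit well-formed, non-nested headings; a real HTML parser would be more robust.

```python
import re

# Matches <h1>...</h1> through <h6>...</h6> and captures the level and title.
HEADING = re.compile(r"<h([1-6])[^>]*>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def split_sections(html: str) -> list[dict]:
    """Split HTML into one section per heading, keeping the heading level.

    Text that appears before the first heading is ignored in this sketch.
    """
    sections = []
    matches = list(HEADING.finditer(html))
    for i, match in enumerate(matches):
        # A section runs from the end of its heading to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html)
        body = TAG.sub(" ", html[match.end():end])
        sections.append({
            "level": int(match.group(1)),
            "title": TAG.sub("", match.group(2)).strip(),
            "text": " ".join(body.split()),  # collapse whitespace left by tags
        })
    return sections


page = "<h1>Guide</h1><p>Intro text.</p><h2>Setup</h2><p>Install it. Run it.</p>"
for section in split_sections(page):
    print(section["level"], section["title"], "->", section["text"])
```

Each section's `text` can then be passed to a sentence-aware splitter on its own, so no chunk crosses a section boundary, and the `title` and `level` can be stored as metadata alongside the chunk.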
dean
6 months ago
Even if I am using a bge-m3 embedding model?
dean
6 months ago
What is the maximum chunk size I can use with bge-m3?
Logan M
6 months ago
bge-m3 has an 8k context limit (8192 tokens)
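Because the limit is counted in tokens, there is no fixed maximum in ASCII characters: how many characters fit depends on how the model's tokenizer splits the text. A rough sketch of a budget check follows. The `fits_context` helper and the whitespace-based `approx_token_count` are hypothetical stand-ins; for a real answer, pass a counter built on the model's own tokenizer as `count_tokens`.

```python
from typing import Callable

MAX_TOKENS = 8192  # bge-m3 context limit (the "8k" mentioned above)


def approx_token_count(text: str) -> int:
    # Crude stand-in (assumption): one token per whitespace-separated word.
    # Real subword tokenizers usually produce more tokens than this.
    return len(text.split())


def fits_context(
    text: str,
    count_tokens: Callable[[str], int] = approx_token_count,
    max_tokens: int = MAX_TOKENS,
) -> bool:
    """Return True if the text's token count is within the model's limit."""
    return count_tokens(text) <= max_tokens


print(fits_context("a short chunk"))
print(fits_context("word " * 9000))
```

Since the approximate counter undercounts, it is safer to leave headroom below the limit when choosing a chunk size this way.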