Hi @Logan M, we did use backup separators in our text splitter, which has reduced the number of errors (thanks, it's a great feature to work with):
```python
self.chunk_size = 200
self.chunk_overlap = 50
self.backup_separators = [".", ",", "!"]

# Text splitter
self.text_splitter = TokenTextSplitter(
    chunk_size=self.chunk_size,
    chunk_overlap=self.chunk_overlap,
    backup_separators=self.backup_separators,
)
```
What we need to know is: can `TokenTextSplitter` forcefully chunk the text purely by token count? And what else can we do to improve the chunking, so that we reduce the chunk_size/chunk_overlap errors?
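To illustrate what we mean by a forced split: something like the sketch below, which cuts strictly every `chunk_size - chunk_overlap` tokens and ignores separators entirely. This is not the actual `TokenTextSplitter` API, and it uses naive whitespace tokenization as a stand-in for a real tokenizer (e.g. tiktoken):

```python
def hard_token_split(text, chunk_size=200, chunk_overlap=50):
    """Hypothetical forced splitter: cut strictly by token count,
    ignoring separators. Whitespace tokens stand in for real tokens."""
    tokens = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
        start += step
    return chunks
```

With this kind of fallback, no chunk can ever exceed `chunk_size` tokens, so the size errors would disappear by construction (at the cost of sometimes cutting mid-sentence).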
@jerryjliu0 Would it be possible to add an extra parameter to the text splitters that suppresses these errors (in case anyone wants to ignore the chunk errors)?