
Updated last year

Splitters

I exported a .docx from Google, and it has newlines like the ones in the attached screenshot. Is this text splitter a good solution?
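from llama_index.core.node_parser import TokenTextSplitter
# (older LlamaIndex releases: from llama_index.text_splitter import TokenTextSplitter)

text_splitter = TokenTextSplitter(
    separator=" ",      # primary split point: the spaces between words
    chunk_size=1024,    # maximum tokens per chunk
    chunk_overlap=20,   # tokens shared between neighbouring chunks
    # Fallbacks, tried in order when a piece is still larger than chunk_size
    backup_separators=["\n", "\n\n", "\n\n\n", "\n\n\n\n", "\n\n\n\n\n", "\n\n\n\n\n\n",
                       "\n\n\n\n\n\n\n", "\n\n\n\n\n\n\n\n", "\n\n\n\n\n\n\n\n\n", "\n\n\n\n\n\n\n\n\n\n"],
)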
[Attachment: image.png]
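One thing worth checking is the order of `backup_separators`. This sketch assumes (from my reading of LlamaIndex's `TokenTextSplitter`, not from this thread) that backup separators are tried in list order and the first one that splits a piece into more than one part wins. Under that assumption, any text containing `"\n\n"` also contains `"\n"`, so with `"\n"` listed first the longer newline runs are never selected. The helper `first_effective_separator` and the sample text are hypothetical and use no LlamaIndex code.

```python
from typing import List, Optional


def first_effective_separator(text: str, separators: List[str]) -> Optional[str]:
    """Return the first separator that splits `text` into more than one piece.

    Hypothetical model of separator fallback (assumed, not LlamaIndex's code).
    """
    for sep in separators:
        if len(text.split(sep)) > 1:
            return sep
    return None  # nothing applies; a real splitter would fall back further


# A space-free run, the only kind of piece that would reach the backups
# when the primary separator is " ".
sample = "Heading\n\n\nIntro\nBody"

# "\n", "\n\n", ..., ten newlines: the same list as in the question.
newline_runs = ["\n" * n for n in range(1, 11)]

if __name__ == "__main__":
    print(repr(first_effective_separator(sample, newline_runs)))
    print(repr(first_effective_separator(sample, newline_runs[::-1])))
```

With the list in the original order the single `"\n"` is chosen; reversed, the longest run actually present (`"\n\n\n"`) is chosen. If that assumption holds, the long list as written likely behaves the same as `["\n"]` alone, and listing longer runs first would be the way to prefer paragraph-sized breaks.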
I mean, that's probably fine? But also, you could pre-process the text to cut down on the number of newlines too 😅
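A minimal sketch of that pre-processing step, using only the standard library. The function name `normalize_newlines` and the choice to cap newline runs at two (one blank line, i.e. a paragraph break) are assumptions for illustration, not something from the thread.

```python
import re


def normalize_newlines(text: str, max_consecutive: int = 2) -> str:
    """Collapse long runs of newlines in text exported from a .docx.

    Hypothetical helper: converts \r\n and \r to \n, strips trailing spaces
    and tabs at line ends (so whitespace-only lines become empty), and caps
    any run of newlines at `max_consecutive` (expected to be >= 1).
    """
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = re.sub(r"[ \t]+\n", "\n", text)
    too_many = re.compile(r"\n{%d,}" % (max_consecutive + 1))
    text = too_many.sub("\n" * max_consecutive, text)
    return text.strip()


if __name__ == "__main__":
    raw = "Title\r\n\r\n\r\n\r\nFirst paragraph.  \n\n\n\n\nSecond paragraph.\n"
    print(repr(normalize_newlines(raw)))
```

The cleaned string could then be passed to `text_splitter.split_text(...)`; once the runs are capped, a plain `backup_separators=["\n"]` is probably enough.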
Thank you, Logan, your help is so much appreciated!
:peepohugback: