
Updated 2 years ago

Splitters

At a glance
The community member exported a .docx file from Google and is concerned about the kind of newlines in the text. They shared a TokenTextSplitter configuration and asked whether it is a good solution. In the comments, another community member says it is probably fine but suggests pre-processing the text to reduce the number of newlines. The remaining comments thank them for the help.
I exported a .docx from Google and it has this kind of newlines. Is this text splitter a good solution?
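```python
# Import path for llama-index 0.10+; older releases exposed TokenTextSplitter under a different module
from llama_index.core.node_parser import TokenTextSplitter

text_splitter = TokenTextSplitter(
  separator=" ",     # primary separator: split on spaces
  chunk_size=1024,   # maximum chunk size, in tokens
  chunk_overlap=20,  # tokens shared between consecutive chunks
  # fallback separators for pieces that are still too large: runs of one to ten newlines
  backup_separators=["\n", "\n\n", "\n\n\n", "\n\n\n\n", "\n\n\n\n\n", "\n\n\n\n\n\n", "\n\n\n\n\n\n\n", "\n\n\n\n\n\n\n\n", "\n\n\n\n\n\n\n\n\n", "\n\n\n\n\n\n\n\n\n\n"]
)
```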
[Attachment: image.png]
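If the splitter tries backup separators in list order and re-splits the resulting pieces (an assumption about TokenTextSplitter's internals, worth checking against the installed llama-index version), the longer newline runs in that list add nothing once "\n" comes first. The hypothetical `split_in_order` helper below models only that ordering in plain Python; it is not LlamaIndex code and does not count tokens or build chunks.

```python
from typing import List


def split_in_order(text: str, separators: List[str]) -> List[str]:
    """Hypothetical helper: split every piece on each separator, in list order.

    It models only the order in which separators are applied; it does not
    count tokens, enforce chunk_size, or merge pieces into chunks.
    """
    pieces = [text]
    for sep in separators:
        pieces = [part for piece in pieces for part in piece.split(sep)]
    return pieces


# The backup_separators list from the question: runs of one to ten newlines.
backup_separators = ["\n" * n for n in range(1, 11)]

sample = "Heading\n\n\n\nFirst paragraph.\n\n\n\n\n\n\nSecond paragraph."

# Once "\n" has been applied, no piece contains "\n\n" any more,
# so the longer separators cannot split anything further.
full = split_in_order(sample, backup_separators)
short = split_in_order(sample, ["\n"])
print(full == short)  # True
```

Under that assumption, `backup_separators=["\n"]` should behave the same as the ten-entry list.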
3 comments
I mean, that's probably fine? But also, you could pre-process the text to cut down on the number of newlines too 😅
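One possible way to do that pre-processing is a regular expression that collapses long runs of blank lines before the text reaches the splitter. This is a minimal sketch using Python's standard `re` module; the function name `collapse_newlines` and the choice to keep up to two newlines (so paragraph breaks survive) are assumptions for illustration, not something specified in the thread.

```python
import re


def collapse_newlines(text: str, max_newlines: int = 2) -> str:
    """Hypothetical helper: replace any run of more than max_newlines newlines
    (including blank lines containing only spaces or tabs) with exactly
    max_newlines newlines."""
    # Normalize Windows/old Mac line endings so each line break is a single "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Match max_newlines + 1 or more consecutive (optional spaces/tabs + newline) units.
    run = re.compile(r"(?:[ \t]*\n){%d,}" % (max_newlines + 1))
    return run.sub("\n" * max_newlines, text)


raw = "Title\n\n\n\n\nFirst paragraph.\n \n\n\nSecond paragraph."
print(repr(collapse_newlines(raw)))  # 'Title\n\nFirst paragraph.\n\nSecond paragraph.'
```

Passing `max_newlines=1` would instead remove blank lines entirely, which also drops the paragraph boundaries the splitter could otherwise use.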
Thank you Logan, your help is so much appreciated
:peepohugback: