The community member is looking for a way to split text into chunks that stop at the end of a paragraph and do not overlap. They have tried using a sentence splitter with "\n\n\n" but it does not seem to work. The other option of having hundreds of separate text files is not preferred.
The comments suggest that the community member could write their own chunking logic and manually create nodes. Some community members provide code examples for this approach, which involves splitting the text by paragraphs and creating TextNode objects for each chunk.
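The paragraph-based chunking logic described above might look like the following sketch. This is pure Python with no llama_index dependency; `split_paragraph_chunks` and its `max_chars` parameter are illustrative names, not part of any library API. Each resulting string could then be wrapped in a llama_index `TextNode` to build nodes manually:

```python
def split_paragraph_chunks(text: str, max_chars: int = 512) -> list[str]:
    """Split text into non-overlapping chunks that always end on a
    paragraph boundary (illustrative sketch, not a library function)."""
    # Treat blank lines as paragraph separators.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            # The next paragraph still fits; keep growing the chunk.
            current = candidate
        else:
            # Close the current chunk at the paragraph boundary.
            if current:
                chunks.append(current)
            # A single paragraph longer than max_chars is kept whole here;
            # a real implementation might fall back to sentence splitting.
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then become a node, e.g. `TextNode(text=chunk)` (adding metadata manually if needed, as discussed below).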
There is a discussion around the compatibility of this approach with service context and the need for metadata. Some community members suggest that the metadata can be added manually if needed.
The community member also tried using the SentenceSplitter from the llama_index library, but the code they provided does not seem to work as expected. Another community member suggests trying to use the text_splitter.split_text(text) method directly to debug the issue.
Finally, a community member mentions that they have forked the repository and modified the text splitter to have the desired behavior, and they ask if this could be a useful feature for others. Another community member suggests that if the modification would be broadly useful, it could be worth opening a pull request.
Hello, is there a way to determine when to stop a chunk? I want chunks that stop at the end of a paragraph and do not overlap. I've tried the sentence splitter with \n\n\n but it doesn't seem to do it. The other option would be to have hundreds of separate little txt files, but I'd rather not.
@Logan M hello, I've forked the repo and modified the SentenceSplitter class to get the behavior I want: chunks either of size x (256, 512…) or of a size smaller than x that stop at a new paragraph. Do you think this would be a useful feature for other people, like having an end_chunk_separator parameter, or would this PR be useless?