
Updated last year

Splitting

At a glance

The community member wants to build a custom splitter for an ingestion pipeline that splits text primarily on a regex separator rather than on chunk size. They ask whether the existing sentence splitter can do this or whether they need an alternative approach. A commenter suggests it probably isn't possible with the existing sentence splitter: the community member would need to subclass it and write their own splitter, using the sentence splitter's implementation as an example to follow.

Question about custom transformations with ingestion pipelines. I am looking to create a custom sentence-splitter-type thing where I split primarily on a regex separator rather than primarily on chunk size. Can I use the sentence splitter, or do I need an alternative? Any examples of something similar?
I don't think it's possible with the sentence splitter.

You'd have to subclass and make your own splitter.

Maybe check out the sentence splitter implementation as an example to follow:

https://github.com/run-llama/llama_index/blob/a4184f47626c6957f40f5b2732de9344e26d2a01/llama-index-core/llama_index/core/node_parser/text/sentence.py#L65
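A minimal sketch of the "regex first, chunk size second" logic from the question, in plain Python with no LlamaIndex dependency. The class name `RegexFirstSplitter` and its `separator` and `max_chars` parameters are hypothetical, chosen for illustration; they are not part of the LlamaIndex API. Text is split on the regex first, and only pieces still longer than `max_chars` are hard-wrapped as a fallback.

```python
import re
from typing import List


class RegexFirstSplitter:
    """Hypothetical splitter: regex separator first, size limit only as a fallback."""

    def __init__(self, separator: str, max_chars: int = 1000) -> None:
        if max_chars <= 0:
            raise ValueError("max_chars must be positive")
        # Use non-capturing groups (?:...) in the separator: re.split returns
        # the text of any capturing groups as extra items in its result.
        self.pattern = re.compile(separator)
        self.max_chars = max_chars

    def split_text(self, text: str) -> List[str]:
        chunks: List[str] = []
        # Primary split: on the regex separator (the separator text is dropped).
        for piece in self.pattern.split(text):
            # Fallback split: hard-wrap oversized pieces every max_chars
            # characters (this can cut mid-word; it is only a safety net).
            for start in range(0, len(piece), self.max_chars):
                sub = piece[start:start + self.max_chars].strip()
                if sub:  # skip empty or whitespace-only fragments
                    chunks.append(sub)
        return chunks


splitter = RegexFirstSplitter(r"\n---\n", max_chars=10)
print(splitter.split_text("alpha\n---\nbeta gamma delta\n---\n"))
# ['alpha', 'beta gamma', 'delta']
```

To use this in an ingestion pipeline, the commenter's suggestion would mean moving the `split_text` logic into a subclass of a LlamaIndex splitter base class. One likely candidate is `TextSplitter` in `llama_index.core.node_parser.interface`, which declares an abstract `split_text(text)` method; check the linked `sentence.py` for the exact base class and conventions in your version. Those base classes are Pydantic models, so settings such as the separator would probably need to be declared as fields rather than assigned in `__init__`.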