The community member is asking if there are any other text preprocessors apart from textsplitter and recursivetextsplitter, as they have found that sometimes one works better than the other for different documents. Another community member responds that any text splitter available in Langchain can be used, and suggests not worrying too much about how the text is being split, as there is typically a large amount of text overlap when ingesting documents, so the splits may not matter as much.
Do we have any other text preprocessors apart from textsplitter and recursivetextsplitter? For some documents textsplitter works well, for others recursivetextsplitter works well, and sometimes neither of them does.
Tbh I wouldn't worry too much about how the text is being split. Since there's a large amount of text overlap between chunks when ingesting documents (default 200 tokens), the exact split points end up not mattering as much.
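To make the overlap point concrete, here is a rough sketch of chunking with overlap. It is not the library's actual splitter: `split_with_overlap` is a hypothetical helper, it treats whitespace-separated words as "tokens", and the `chunk_size` default of 1024 is an assumption for illustration. Only the 200-token overlap comes from the reply above.

```python
def split_with_overlap(text, chunk_size=1024, chunk_overlap=200):
    """Split text into chunks of `chunk_size` tokens, where consecutive
    chunks share `chunk_overlap` tokens.

    Assumption: a "token" here is just a whitespace-separated word; a real
    splitter would use a proper tokenizer.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    tokens = text.split()
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for chunk in split_with_overlap(sample, chunk_size=4, chunk_overlap=2):
        print(chunk)
```

Because each chunk repeats the tail of the previous one, text that falls near a split point still appears whole in at least one chunk, as long as it is shorter than the overlap. That is why the exact split position tends to matter less.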