I'm sorry to hear that you're experiencing issues with the sentence splitter in LlamaIndex. Based on the error message you provided, it seems like you're encountering a `RecursionError`. This typically happens when a function calls itself (either directly or indirectly) too many times, exceeding Python's maximum recursion depth.
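For context, the limit itself is just a Python setting. The short, self-contained sketch below (no LlamaIndex involved) shows how the error is produced and how to inspect the current limit with the standard `sys` module:

import sys

print(sys.getrecursionlimit())  # usually 1000 by default

def count_down(n):
    # Each call adds a stack frame; past the limit Python raises RecursionError.
    if n == 0:
        return 0
    return 1 + count_down(n - 1)

try:
    count_down(10_000)  # deliberately deeper than the default limit
except RecursionError as exc:
    print("Hit the limit:", exc)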
However, without more specific information about your implementation and the documents you're trying to parse, it's difficult to pinpoint the exact cause of the error. The error could be due to a variety of reasons such as the structure of your documents, the configuration of your node parser, or the specific text splitter you're using.
In the LlamaIndex documentation, it's mentioned that you can customize the `text_splitter` used by the node parser. The default is `TokenTextSplitter`, but you can also use `SentenceSplitter` or `CodeSplitter`. Each of these splitters has different configuration options, including the default separator and additional config options, which might be useful if the language of your documents is significantly different from English. Here is an example of how to customize the `SentenceSplitter`:
import tiktoken
from llama_index.text_splitter import SentenceSplitter
from llama_index.node_parser import SimpleNodeParser

# Sentence-aware splitter; the secondary regex is a fallback for text without
# standard sentence boundaries (this example includes "。" for CJK punctuation).
text_splitter = SentenceSplitter(
    separator=" ",
    chunk_size=1024,       # maximum chunk size, in tokens
    chunk_overlap=20,      # tokens shared between adjacent chunks
    paragraph_separator="\n\n\n",
    secondary_chunking_regex="[^,.;。]+[,.;。]?",
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode,
)

node_parser = SimpleNodeParser.from_defaults(text_splitter=text_splitter)
(Source: LlamaIndex Documentation)
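If it helps to narrow things down, here is a rough diagnostic sketch of how the customized parser would typically be applied. The `SimpleDirectoryReader` loader and the "./data" path are placeholders for however you actually load your documents, and the imports assume the same pre-0.10 `llama_index` layout as the example above:

from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()  # "./data" is a placeholder

# Running the splitter document-by-document can help isolate which input,
# if any, is what pushes past the recursion limit.
for i, doc in enumerate(documents):
    try:
        text_splitter.split_text(doc.text)
    except RecursionError:
        print(f"Document {i} ({len(doc.text)} chars) triggers the recursion error")

# Once the offending document is identified (or cleaned up), parse as usual:
nodes = node_parser.get_nodes_from_documents(documents)

Checking documents one at a time like this usually makes it clear whether a particular file, rather than the splitter configuration, is the real cause.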