I'm having a very strange issue with the SentenceSplitter
At a glance
A community member is experiencing an issue with the SentenceSplitter node parser in their application. When they set a positive value for the chunk_overlap, they receive an error stating that the chunk_overlap size is greater than the node_chunk_size, even though this is not the case. The community member provided an example where a chunk_overlap of 8 is considered larger than a chunk_size of 160.
In the comments, another community member suggests that the issue may be caused by the values being converted to strings when retrieved from the environment variables using os.getenv(). They mention that they encountered a similar issue with the HierarchicalNodeParser and resolved it by removing the default values from os.getenv() and wrapping the chunk_size and chunk_overlap variables in int().
There is no explicitly marked answer, but the community members are collaborating to understand and resolve the issue.
I'm having a very strange issue with the SentenceSplitter node parser. When I use a node_chunk_overlap of 0 I have no issues, but with any positive value I always get an error saying the chunk_overlap is greater than the node_chunk_size, when it definitely is not. For example, a node_chunk_overlap of 8 is considered larger than a node_chunk_size of 160, as shown here:
2024-07-04 10:11:54 Traceback (most recent call last):
2024-07-04 10:11:54   File "/app/main.py", line 27, in <module>
2024-07-04 10:11:54     init_settings()
2024-07-04 10:11:54   File "/app/app/settings.py", line 41, in init_settings
2024-07-04 10:11:54     Settings.node_parser = SentenceSplitter(chunk_size=Settings.chunk_size, chunk_overlap=Settings.chunk_overlap)
2024-07-04 10:11:54                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-07-04 10:11:54   File "/usr/local/lib/python3.11/site-packages/llama_index/core/node_parser/text/sentence.py", line 81, in __init__
2024-07-04 10:11:54     raise ValueError(
2024-07-04 10:11:54 ValueError: Got a larger chunk overlap (8) than chunk size (160), should be smaller.
I don't understand how it thinks 8 is larger than 160.
I think the value was somehow coming in as a string when it was read with os.getenv(). The same thing happened with the HierarchicalNodeParser, so I removed the default values from os.getenv() and wrapped all the chunk_size and chunk_overlap variables in int(), and it seems to be working now.
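For anyone hitting the same error, here is a minimal sketch of why a string value would produce it and how the int() wrapping avoids it. os.getenv() always returns a str (or None), and Python compares strings character by character, so "8" sorts after "160" because "8" is greater than "1". The environment variable names CHUNK_SIZE and CHUNK_OVERLAP, the fallback values, and the env_int helper are assumptions made up for illustration; they are not taken from the original settings.py.

```python
import os


def env_int(name: str, default: int) -> int:
    """Hypothetical helper: read an environment variable as an int.

    Falls back to `default` when the variable is unset or empty.
    """
    raw = os.getenv(name)
    if raw is None or raw.strip() == "":
        return default
    return int(raw)


# Simulate the configuration from the post (variable names are assumed).
os.environ["CHUNK_SIZE"] = "160"
os.environ["CHUNK_OVERLAP"] = "8"

# Raw values are strings, so ">" compares them lexicographically.
raw_size = os.getenv("CHUNK_SIZE")
raw_overlap = os.getenv("CHUNK_OVERLAP")
strings_say_overlap_is_larger = raw_overlap > raw_size  # "8" vs "160"

# After conversion the comparison is numeric.
chunk_size = env_int("CHUNK_SIZE", 1024)
chunk_overlap = env_int("CHUNK_OVERLAP", 20)
ints_say_overlap_is_larger = chunk_overlap > chunk_size  # 8 vs 160

print(strings_say_overlap_is_larger, ints_say_overlap_is_larger)
```

The converted chunk_size and chunk_overlap can then be passed to SentenceSplitter(chunk_size=..., chunk_overlap=...) as in the traceback above. Converting once, at the point where the environment is read, keeps every later comparison numeric.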