The community member who wrote the original post is looking for best practices for choosing an appropriate node length when working with text data. Their concern is that a larger window may add too much noise, while a smaller window may lose context.
In the comments, other community members provide the following advice:
- Try to group things into logical groups, and use clear sections as document splits.
- Overlapping chunks can also work reasonably well.
- One community member suggests trying LangChain's text splitters as well as LlamaIndex's own to build nodes, as they offer different options.
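
To make the first two suggestions concrete, here is a minimal sketch in plain Python. It is not the LangChain or LlamaIndex API: the function names `split_into_sections` and `chunk_with_overlap` are hypothetical, sections are assumed to be separated by blank lines, and chunk sizes are counted in whitespace-separated words rather than model tokens.

```python
import re


def split_into_sections(text: str) -> list[str]:
    """Split text on blank lines, treating each paragraph as one logical section.

    Assumption: blank lines mark section boundaries in the source document.
    """
    return [part.strip() for part in re.split(r"\n\s*\n", text) if part.strip()]


def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Cut text into word windows of `chunk_size` that share `overlap` words.

    Sizes are in words here for simplicity; a real pipeline would more likely
    count tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    document = "Intro paragraph.\n\nA longer body paragraph with several words in it."
    for section in split_into_sections(document):
        for chunk in chunk_with_overlap(section, chunk_size=6, overlap=2):
            print(repr(chunk))
```

Splitting on sections first and only windowing inside each section keeps a chunk from straddling two unrelated topics, while the overlap lets neighbouring chunks share some context. Suitable values for `chunk_size` and `overlap` depend on the data, which is why the thread suggests experimenting.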
There is no explicitly marked answer in the comments.
Awesome! I was wondering if there are any best practices for choosing the appropriate node length? IMHO a larger window may add too much noise, yet a small window might lose context info.