Updated 2 years ago

Text splitting

At a glance

The community member who posted the original post is looking for best practices for choosing the appropriate node length when working with text data. They express concern that a larger window may add too much noise, while a smaller window might lose context information.

In the comments, other community members provide the following advice:

- Try to group things into logical groups, and use clear sections as document splits.

- Overlapping chunks can also work okay.

- The original poster plans to build nodes with LangChain's text splitters and then use LlamaIndex, since a few different splitters are available.

There is no explicitly marked answer in the comments.
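
To make the "logical groups" advice concrete, here is a minimal sketch in plain Python. It assumes the document marks its sections with Markdown-style `#` heading lines, which the thread does not specify, and `split_by_headings` is a hypothetical helper name, not a LangChain or LlamaIndex function.

```python
import re


def split_by_headings(text: str) -> list[str]:
    """Split text into sections, starting a new section at each heading line.

    Assumption: sections begin with Markdown-style headings ("# ", "## ", ...).
    """
    sections: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        # A heading closes the section collected so far (if any).
        if re.match(r"#{1,6}\s", line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    # Drop sections that ended up empty after stripping whitespace.
    return [s for s in sections if s]


doc = "# Intro\nHello.\n\n# Setup\nInstall it.\nRun it."
sections = split_by_headings(doc)
for section in sections:
    print(repr(section))
```

Each returned section could then become one node. If a document has no such markers, this approach does not apply and a fixed-size window is the usual fallback.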

Awesome! I was wondering if there are any best practices for choosing the appropriate node length? IMHO a larger window may add too much noise, yet a small window might lose context info.
5 comments
My best advice is to try and group things into logical groups
Like, if you have clear sections that you can parse out, those make good document splits
Otherwise, overlapping chunks usually work ok-ish
I see. I guess I can use LangChain's text splitters to build nodes and then use LlamaIndex, since they offer a few different ones.
Yea for sure!
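
For anyone wanting to see what overlapping chunks look like in practice, here is a rough sketch in plain Python. It is not LangChain's or LlamaIndex's splitter; `chunk_with_overlap`, `chunk_size`, and `overlap` are hypothetical names, and sizes are counted in whitespace-separated words rather than tokens to keep it simple.

```python
def chunk_with_overlap(words: list[str], chunk_size: int, overlap: int) -> list[list[str]]:
    """Slide a window of chunk_size words, sharing `overlap` words between neighbours."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[list[str]] = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        # Stop once the window has reached the end of the text,
        # so the tail is not emitted again as a smaller duplicate chunk.
        if start + chunk_size >= len(words):
            break
    return chunks


words = "a b c d e f g h i j".split()
chunks = chunk_with_overlap(words, chunk_size=4, overlap=2)
for chunk in chunks:
    print(" ".join(chunk))
```

A larger `overlap` keeps more shared context between neighbouring nodes at the cost of more nodes overall, which is the same noise-versus-context tradeoff raised in the question.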