Text splitting

At a glance

The original poster asks for best practices for choosing an appropriate node length when working with text data. Their concern is that a larger window may add too much noise, while a smaller window may lose context.

In the comments, other community members provide the following advice:

- Group content into logical units; where a document has clear sections that can be parsed out, use those as document splits.

- Otherwise, overlapping chunks usually work acceptably.

- The original poster then proposes using LangChain's text splitters to build nodes for LlamaIndex, since several splitters are available; a commenter agrees.

There is no explicitly marked answer in the comments.

Ray Li

awesome! I was wondering if there are any best practices for choosing the appropriate node length? IMHO a larger window may add too much noise, yet a small window might lose context info.

5 comments

Logan M

My best advice is to try and group things into logical groups

Logan M

Like, if you have clear sections that you can parse out, those make good document splits

Logan M

Otherwise, overlapping chunks usually work ok-ish

Ray Li

I see. I guess I can use LangChain's text splitter to build nodes and use LlamaIndex, since they offer a few different ones.

Logan M

Yea for sure!
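
For anyone who wants to see the advice in this thread as code, here is a rough plain-Python sketch: split on clear sections first, and fall back to overlapping chunks only when a section is too long. It is not LangChain's or LlamaIndex's splitter. The function names, the Markdown-style heading rule, and the `max_words` / `overlap` defaults are all assumptions made up for illustration, so tune them for your own data.

```python
import re

# Assumption: a "clear section" starts at a Markdown-style heading line ("# Title").
HEADING = re.compile(r"^#{1,6} .*$", re.MULTILINE)


def split_by_sections(text):
    """Return one chunk per heading-delimited section."""
    starts = [m.start() for m in HEADING.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text before the first heading
    bounds = starts + [len(text)]
    sections = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [s for s in sections if s]


def overlapping_chunks(words, size, overlap):
    """Slide a window of `size` words, stepping by `size - overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def build_chunks(text, max_words=200, overlap=40):
    """Prefer whole sections; window only the sections that are too long.

    Hypothetical defaults: 200-word chunks with a 40-word overlap.
    Note that windowed sections are rejoined with single spaces, so
    their original line breaks are not preserved.
    """
    chunks = []
    for section in split_by_sections(text):
        words = section.split()
        if len(words) <= max_words:
            chunks.append(section)
        else:
            chunks.extend(overlapping_chunks(words, max_words, overlap))
    return chunks
```

Calling `build_chunks(document_text)` returns a list of strings that could then be wrapped as nodes. A larger `overlap` keeps more shared context between neighbouring chunks at the cost of more duplicated text, which is the noise-versus-context trade-off raised in the question.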
