
Updated last year

hey is there any additional information

Hey, is there any additional information on https://docs.llamaindex.ai/en/stable/api/llama_index.node_parser.MetadataAwareTextSplitter.html ? What is it intended for? How does it work?
It's a base class that is meant to be extended. SentenceSplitter and TokenTextSplitter are both subclasses of it.
Since metadata is included when sending text to the LLM, the text needs to be split with the length of that metadata taken into account.
The class makes it a little easier to implement new text splitters.
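To make the "base class meant to be extended" idea concrete, here is a minimal sketch in plain Python. It does not import LlamaIndex: MetadataAwareSplitterSketch, split_document and FixedWidthSplitter are hypothetical names made up for illustration, and the hook name split_text_metadata_aware only mirrors what I believe the real class exposes, so treat that as an assumption and check the linked docs. Lengths are counted in characters here to keep it simple.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MetadataAwareSplitterSketch(ABC):
    """Hypothetical stand-in for the base class, not the real LlamaIndex code."""

    @abstractmethod
    def split_text_metadata_aware(self, text: str, metadata_str: str) -> List[str]:
        """Subclasses decide how to split, given the metadata that rides along."""

    def split_document(self, text: str, metadata: Dict[str, str]) -> List[str]:
        # The base class renders the metadata once and passes it to the hook.
        metadata_str = "\n".join(f"{k}: {v}" for k, v in metadata.items())
        return self.split_text_metadata_aware(text, metadata_str)


class FixedWidthSplitter(MetadataAwareSplitterSketch):
    """Toy subclass: fixed-size character windows, shrunk by the metadata length."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size

    def split_text_metadata_aware(self, text: str, metadata_str: str) -> List[str]:
        # Reserve room for the metadata before cutting the text.
        budget = self.chunk_size - len(metadata_str)
        if budget <= 0:
            raise ValueError("Metadata is longer than chunk_size; no room left for text.")
        return [text[i:i + budget] for i in range(0, len(text), budget)]


splitter = FixedWidthSplitter(chunk_size=40)
chunks = splitter.split_document("a" * 50, {"file": "notes.txt"})
```

The point of the pattern is that a new splitter only has to implement the one hook. The base class takes care of getting the metadata string to it.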
I am wondering in particular about SentenceSplitter. Is it used when sending text to the LLM?
I only know it from using it in IngestionPipeline
Ah ok, now I get it. When the initial text is being split, the "would-be" length of the metadata is counted against the chunk size. So when a chunk is sent to the LLM in the response synthesizer, len(chunk) + len(metadata) <= chunk_size.
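A quick way to sanity-check that inequality is the sketch below. It is an illustration, not the library's implementation: count_tokens and split_with_metadata_budget are hypothetical helpers, and whitespace-separated words stand in for whatever tokenizer the real splitter uses.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for real tokens.
    return len(text.split())


def split_with_metadata_budget(text: str, metadata_str: str, chunk_size: int) -> List[str]:
    # The metadata's "would-be" length is subtracted up front.
    effective_chunk_size = chunk_size - count_tokens(metadata_str)
    if effective_chunk_size <= 0:
        raise ValueError("Metadata alone fills the chunk; raise chunk_size or trim metadata.")
    words = text.split()
    return [
        " ".join(words[i:i + effective_chunk_size])
        for i in range(0, len(words), effective_chunk_size)
    ]


metadata_str = "title: Quarterly report author: Sam"  # 5 whitespace tokens
text = " ".join(f"w{i}" for i in range(20))            # 20 tokens of text
chunks = split_with_metadata_budget(text, metadata_str, chunk_size=12)

# Every chunk plus its metadata fits within chunk_size.
fits = all(count_tokens(c) + count_tokens(metadata_str) <= 12 for c in chunks)
```

With 5 tokens of metadata and chunk_size=12, each chunk gets at most 7 tokens of text, so long metadata directly shrinks how much text fits per chunk.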