hey is there any additional information

At a glance

The community members discuss the MetadataAwareTextSplitter class from the LlamaIndex library. It is a base class meant to be extended; SentenceSplitter and TokenTextSplitter are both subclasses. Its purpose is to split text while accounting for the associated metadata, since that metadata is included whenever text is sent to the language model: the "would-be" length of the metadata is reserved when the initial text is split, so that the length of a chunk plus the length of its metadata never exceeds the chunk size. The SentenceSplitter is also the splitter used in the IngestionPipeline.
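
As a rough illustration of why the metadata length matters, this sketch prints what a node's content looks like once it is prepared for the LLM (imports assume a llama-index >= 0.10 layout; older releases expose the same classes under llama_index.schema):

```python
# Illustrative only: shows that a node's metadata is prepended to its text
# when the content is rendered for the LLM.
from llama_index.core.schema import MetadataMode, TextNode

node = TextNode(
    text="LlamaIndex is a data framework for LLM applications.",
    metadata={"file_name": "intro.md", "category": "docs"},
)

# The LLM receives the metadata string plus the text, which is why the splitter
# has to leave room for the metadata when sizing chunks.
print(node.get_content(metadata_mode=MetadataMode.LLM))
```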

Useful resources
hey is there any additional information on https://docs.llamaindex.ai/en/stable/api/llama_index.node_parser.MetadataAwareTextSplitter.html? What is it intended for? How does it work?
7 comments
It's a base class that is meant to be extended. The SentenceSplitter and TokenTextSplitter are both subclasses of this
Since metadata is included when sending text to the LLM, the text needs to be split with that metadata considered
That class makes it a little easier when implementing new text splitters
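
For a sense of what extending the class looks like, here is a hypothetical sketch (the NewlineSplitter name and its naive logic are invented for illustration; the import path follows the linked pre-0.10 docs, while newer releases expose the class from llama_index.core.node_parser):

```python
from typing import List

from llama_index.node_parser import MetadataAwareTextSplitter


class NewlineSplitter(MetadataAwareTextSplitter):
    """Hypothetical toy splitter: one chunk per non-empty line."""

    def split_text(self, text: str) -> List[str]:
        return [line for line in text.splitlines() if line.strip()]

    def split_text_metadata_aware(self, text: str, metadata_str: str) -> List[str]:
        # A real implementation would shrink the per-chunk token budget by the
        # token length of metadata_str; this toy version ignores it.
        return self.split_text(text)
```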
I am wondering in particular about SentenceSplitter... is it used when sending text to the LLM?
I only know it from using it in IngestionPipeline
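
For reference, a minimal sketch of SentenceSplitter inside an IngestionPipeline (llama-index >= 0.10 imports assumed; the pipeline only chunks documents into nodes here, it does not call the LLM):

```python
from llama_index.core import Document
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(chunk_size=256, chunk_overlap=20)],
)

docs = [
    Document(
        text=(
            "LlamaIndex splits documents into nodes before indexing. "
            "Each node keeps the source document's metadata."
        ),
        metadata={"file_name": "notes.txt"},
    )
]

# Each resulting node carries the document's metadata alongside its chunk of text.
nodes = pipeline.run(documents=docs)
print(len(nodes), nodes[0].metadata)
```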
ah ok, now I get it. When the initial text is being split, the "would-be" length of the metadata is included. So when sending to the LLM in the response synthesizer, len(chunk) + len(metadata) <= chunk_size
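
A small sketch of that behaviour (llama-index >= 0.10 imports assumed; split_text_metadata_aware is the method MetadataAwareTextSplitter subclasses implement):

```python
from llama_index.core.node_parser import SentenceSplitter

text = " ".join(
    f"Sentence number {i} talks about metadata-aware splitting." for i in range(50)
)
metadata_str = "file_name: guide.md\ncategory: docs"

splitter = SentenceSplitter(chunk_size=64, chunk_overlap=0)

# The token length of metadata_str is subtracted from the chunk budget, so each
# chunk satisfies token_len(chunk) + token_len(metadata) <= chunk_size once the
# metadata is prepended on the way to the LLM.
chunks = splitter.split_text_metadata_aware(text, metadata_str)
print(f"{len(chunks)} chunks produced with the metadata budget reserved")
```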