
Updated 5 months ago

I have Markdown files to be vectorized


I have Markdown files to be vectorized. The current parser, MarkdownReader, splits the Markdown on structural elements (e.g. `#` headings and code blocks). I want to change the chunking strategy, because in my use case the resulting chunks are too small to carry enough context.

3 comments

Logan M

You can always use a normal text splitter, or write your own parsing strategy.

payload

Can you provide a reference?

Logan M


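from llama_index.core.node_parser import SentenceSplitter

# chunk_size and chunk_overlap are measured in tokens; raise chunk_size for more context per chunk
splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=128)

# `documents` is the list of Document objects you have already loaded
nodes = splitter(documents)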
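
For the "own parsing strategy" route, one option is to keep splitting on headings but merge neighbouring sections until a size budget is reached, so each chunk carries more context. The following is only a minimal sketch in plain Python, not a LlamaIndex API: the helper name `merge_markdown_sections` and the `max_chars` parameter are hypothetical, and the budget is counted in characters rather than tokens.

```python
import re


def merge_markdown_sections(text: str, max_chars: int = 2000) -> list[str]:
    """Split Markdown on headings, then merge adjacent sections into
    chunks of at most roughly `max_chars` characters (hypothetical helper)."""
    # Split just before every heading line; the lookahead keeps each
    # heading attached to the section it introduces.
    parts = re.split(r"^(?=#{1,6} )", text, flags=re.MULTILINE)
    sections = [p.strip() for p in parts if p.strip()]

    chunks: list[str] = []
    current = ""
    for section in sections:
        # +2 accounts for the blank line used to join sections.
        if current and len(current) + len(section) + 2 > max_chars:
            chunks.append(current)
            current = section
        else:
            current = f"{current}\n\n{section}" if current else section
    if current:
        chunks.append(current)
    return chunks
```

Two limitations of this sketch: a single section longer than `max_chars` is kept whole rather than split further, and a line starting with `# ` inside a fenced code block is treated as a heading. Assuming the usual `Document(text=...)` constructor from `llama_index.core`, each returned chunk could then be wrapped in a `Document` before indexing.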
