
Updated 2 months ago

I have Markdown files to be vectorized

I have Markdown files to be vectorized. The current parser, MarkdownReader, splits the Markdown on headings and code blocks (e.g. `#`). I want to change the chunking strategy, because in my use case the chunks are so small that the extracted text doesn't carry enough context.
3 comments
You can always use a normal text splitter, or write your own parsing strategy.
Can you provide a reference?
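from llama_index.core.node_parser import SentenceSplitter

# chunk_size and chunk_overlap are measured in tokens; raise chunk_size for more context per chunk
splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=128)
# `documents` is the list of Document objects you have already loaded
nodes = splitter(documents)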
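If you would rather keep heading boundaries but avoid tiny chunks, the "own parsing strategy" option could look roughly like the sketch below. It is plain Python and does not use any LlamaIndex API: it splits on headings (skipping `#` lines inside code fences) and then greedily merges neighbouring sections up to a size limit. The names `split_sections`, `merge_sections`, and `max_chars` are made up for this example, and the limit is counted in characters, not tokens.

```python
import re
from typing import List

FENCE = "`" * 3                      # Markdown code fence marker
HEADING = re.compile(r"^#{1,6}\s")   # ATX heading such as "# Title" or "## Sub"


def split_sections(markdown: str) -> List[str]:
    """Split Markdown at headings, ignoring '#' lines inside code fences."""
    sections: List[str] = []
    current: List[str] = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Start a new section only at a real heading, never inside a fence
        if not in_fence and HEADING.match(line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections


def merge_sections(sections: List[str], max_chars: int = 2000) -> List[str]:
    """Greedily merge adjacent sections until the next one would exceed max_chars."""
    chunks: List[str] = []
    buf = ""
    for sec in sections:
        candidate = f"{buf}\n\n{sec}" if buf else sec
        if buf and len(candidate) > max_chars:
            chunks.append(buf)   # flush the current chunk and start a new one
            buf = sec
        else:
            buf = candidate
    if buf:
        chunks.append(buf)
    return chunks


if __name__ == "__main__":
    sample = "# Intro\nSome text.\n## Details\nMore text.\n# Other\nLast part."
    for chunk in merge_sections(split_sections(sample), max_chars=40):
        print(repr(chunk))
```

A single section longer than `max_chars` is kept whole here rather than split further. Each resulting string could then be wrapped in a Document and passed through a splitter or straight into your index, depending on how much extra splitting you want.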