
Updated 6 months ago


At a glance

The community member asked whether there is a way to define the size of the nodes that documents are chunked into. Another community member shared a code example that sets the chunk size and overlap via SimpleNodeParser from the llama_index library, noting that adjusting the chunk_size parameter changes the node size. The asker replied "Ah thank you", indicating the answer was helpful.

Useful resources
is there a way to define the size of nodes that documents will be chunked into?
2 comments
You can do something like this; to change the size, just change the value of chunk_size in the node parser:
from llama_index import SimpleDirectoryReader, VectorStoreIndex, ServiceContext
from llama_index.node_parser import SimpleNodeParser

# Load documents from the ./data directory
documents = SimpleDirectoryReader("./data").load_data()

# chunk_size sets the node size; chunk_overlap sets how much adjacent nodes share
node_parser = SimpleNodeParser.from_defaults(chunk_size=1024, chunk_overlap=20)
service_context = ServiceContext.from_defaults(node_parser=node_parser)

index = VectorStoreIndex.from_documents(documents, service_context=service_context)


For more, see:
https://gpt-index.readthedocs.io/en/latest/core_modules/data_modules/node_parsers/usage_pattern.html
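To build intuition for what chunk_size and chunk_overlap control, here is a minimal, self-contained sketch of fixed-size chunking with overlap. This is only a conceptual illustration in plain Python, not llama_index's actual splitter (which works on tokens and respects sentence boundaries); the function name chunk_text is made up for this example.

```python
# Conceptual sketch only, NOT llama_index's real algorithm:
# fixed-size character chunking with overlap.
def chunk_text(text, chunk_size, chunk_overlap):
    # Each new chunk starts chunk_size - chunk_overlap characters
    # after the previous one, so consecutive chunks share
    # chunk_overlap characters.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("a" * 100, chunk_size=40, chunk_overlap=10)
# Every chunk is at most 40 characters, and consecutive chunks
# overlap by 10 characters.
```

A larger chunk_size means fewer, bigger nodes (more context per node, less precise retrieval); a larger chunk_overlap reduces the chance that a relevant passage is split across a node boundary.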
Ah thank you