Now that the ServiceContext is deprecated in favor of Settings, where do I specify the chunk_size?

At a glance

The community member asks where to specify the chunk size when testing the performance of different chunk sizes across multiple VectorStoreIndex instances, now that the ServiceContext is deprecated in favor of Settings. The comments provide two solutions:

1. Set the chunk size globally using Settings.chunk_size = 512.

2. Specify the node parser/text splitter and pass it into the index, using SentenceSplitter(chunk_size=512).

The community members also note that the index takes a transformations array, similar to the ingestion pipeline.

Now that the ServiceContext is deprecated in favor of Settings, I want to test the performance of different chunk sizes with multiple VectorStoreIndex instances. Where do I now specify the chunk_size for this to work?
3 comments
You can specify the chunk size as a global setting:

Plain Text
from llama_index.core import Settings

# Applies globally: indexes built afterwards use this chunk size by default
Settings.chunk_size = 512


Or specify the node parser/text splitter and pass it into the index:

Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Per-index splitter: only this index uses 512-token chunks
splitter = SentenceSplitter(chunk_size=512)
index = VectorStoreIndex.from_documents(..., transformations=[splitter])
You'll notice that the index takes a transformations array, similar to the ingestion pipeline.
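
For comparison, here is a minimal sketch of the analogous ingestion pipeline usage; the "data" directory and the loading step are assumptions, not from the thread:

Plain Text
from llama_index.core import SimpleDirectoryReader
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader("data").load_data()  # assumed input directory

# The same transformations list drives the pipeline
pipeline = IngestionPipeline(transformations=[SentenceSplitter(chunk_size=512)])
nodes = pipeline.run(documents=documents)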
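
To address the original goal of benchmarking several chunk sizes, one sketch is to build one index per splitter and query each in turn. The chunk sizes and the loading step below are illustrative assumptions:

Plain Text
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader("data").load_data()  # assumed input directory

# One index per chunk size; query each to compare retrieval quality and latency
indexes = {
    size: VectorStoreIndex.from_documents(
        documents,
        transformations=[SentenceSplitter(chunk_size=size)],
    )
    for size in (256, 512, 1024)  # example sizes, not from the thread
}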